MAX models can now run on Apple silicon GPUs
5 points by melodyogonna
Dumb question: what exactly is a MAX model? Is it yet another llama.cpp thing? In other words, does it just run models? Why is this special? There are a lot of things that run on Apple silicon. Wish the website would tell you what it was about!
not dumb, had the exact same question while clicking around confusedly before closing the tab.
What exactly is a MAX model?
These are your normal open source models served through the max stack: https://docs.modular.com/max/models/
Is it another llama.cpp thing?
In short, yes. It's a bit of llama.cpp and a bit of vLLM: small enough to spin up and run on your own machine, but able to scale up to datacenter-scale AI serving.
Why is this special?
Two things:
Something to note is that this stack is entirely self-contained. It is the same code base and the same kernels targeting different hardware: no CUDA kernels, MLX kernels, or ROCm kernels, all Mojo. https://github.com/modular/modular/tree/main/max/kernels
MAX is the inference engine from the AI company Modular, which is better known for the Mojo programming language.