omlx: LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

8 points by mpweiher


symgryph

I'm going to try this out. This looks cool!

msangi

MLX is definitely the model format you want on a Mac. I've been using oMLX for a while and I'm generally happy with it. Recently https://ddalcu.github.io/mlx-serve/ came onto my radar, and it has a couple of nice things: