omlx: LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
8 points by mpweiher
I'm going to try this out. This looks cool!
MLX is definitely the model format you want on a Mac. I’ve been using oMLX for a while and I’m generally happy with it. Recently https://ddalcu.github.io/mlx-serve/ came across my radar, and it has a couple of nice things: