omlx: LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
8 points by mpweiher
I'm going to try this out. This looks cool!
MLX is definitely the model format you want on a Mac. I’ve been using oMLX for a while and I’m generally happy with it. Recently https://ddalcu.github.io/mlx-serve/ came across my radar, and it has a couple of nice things: