How and Why Local LLMs Perform On Framework 13 AMD Strix Point

13 points by msf


woile

In case anyone is interested, I've been tracking my progress with NixOS + Strix Halo here:

https://discourse.nixos.org/t/how-to-ollama-on-amd-strix-halo/74363

I'm no expert, and if you have better benchmarks please share!
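
For anyone who wants to reproduce numbers, here is roughly how I check decode throughput, as a minimal sketch: it assumes a local Ollama instance on its default port (11434) and that the model tag below, which is only a placeholder, has already been pulled.

    # Minimal throughput check against a local Ollama instance.
    # Assumes Ollama is listening on its default port (11434) and that the
    # model tag below has already been pulled; the tag is just a placeholder.
    import json
    import urllib.request

    MODEL = "llama3.1:8b"  # placeholder tag, not a recommendation
    PROMPT = "Explain how speculative decoding works in two paragraphs."

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": MODEL, "prompt": PROMPT, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)

    # Ollama reports durations in nanoseconds in the final response object.
    prefill_tps = body["prompt_eval_count"] / (body["prompt_eval_duration"] / 1e9)
    decode_tps = body["eval_count"] / (body["eval_duration"] / 1e9)
    print(f"prompt: {prefill_tps:.1f} tok/s  generation: {decode_tps:.1f} tok/s")

Nothing fancy, it just reads the token counts and durations Ollama returns for a single non-streaming generation, so prompt-processing and generation speed come out separately.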

symgryph

I normally wouldn't post on these, but the statistics are wrong on every level. I regularly see well over 100 GB per second, and I get 40 to 50 tokens per second with an MoE model that has 80 billion parameters. Also note that this person only has 64 GB; having more memory makes a huge difference. If you quantize properly, local LLMs run at much higher rates and are almost as effective as the remote ones, at least for coding purposes. I'll preface that by saying that when I code this way, I do use a larger LLM, say Gemini Pro, to do some of the planning.

The only complaint I have is that many of the newer models do not work particularly well on AMD, and you will be very disappointed if you try to do anything in Python. But if you just want to do basic research, it's not a bad buy. If you want to do ML for a living, don't waste your time with this: AMD is still far, far behind Nvidia in terms of software, and I suspect it will be another several years before they reach the same level of software support.
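
To make the quantization comparison concrete, here is a rough sketch along the same lines as the snippet above: it assumes a local Ollama on its default port, and the model tags are only hypothetical examples, not the models I actually run.

    # Rough comparison of decode throughput across quantization variants
    # of the same model via Ollama's /api/generate endpoint.
    # The tags below are hypothetical placeholders; substitute whatever
    # quantizations you have actually pulled.
    import json
    import urllib.request

    TAGS = ["qwen2.5:7b-instruct-q4_K_M", "qwen2.5:7b-instruct-q8_0"]
    PROMPT = "Summarize the tradeoffs of mixture-of-experts models."

    def decode_tps(model: str) -> float:
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps({"model": model, "prompt": PROMPT, "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # eval_duration is reported in nanoseconds.
        return body["eval_count"] / (body["eval_duration"] / 1e9)

    for tag in TAGS:
        print(f"{tag}: {decode_tps(tag):.1f} tok/s")

Running the same prompt across two quant tags of one model makes the speed (and quality) tradeoff obvious in a minute or two.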