I Put a Datacenter GPU in My Gaming PC for £200
37 points by puhsu
Really cool approach. I'm curious about the GPU falling off PCIe, but there are so many things it could be.
The loud GPU fan reminds me of my time on the CUDA team at NVIDIA. My co-worker was adding the fan control feature to NVML and nvidia-smi. Over the cube wall I heard a fan spinning up and down then he popped up with a giant grin on his face. He said it was his favorite feature to work on since the moment he had the code working he could hear the results.
If anyone is interested in self-hosted LLMs, Dell OEM RTX 3090s are generally cheaper than the big-name brand variants, and I was able to get my hands on one for ~$800 CAD.
Now I need to read up more on how vLLM works, because the model sometimes starts spewing long lists of related names and adjectives. I've probably messed something up.
What kind of models are you running on a 3090? I was under the impression that most useful models need at least 48 to 64 gigs of VRAM to run properly, hence the popularity of Apple M-series chips in this space thanks to their unified memory design.
Qwen3.6-27B-MTP quantized at Q5_K_M, which comes in at about 19GB VRAM
and they observe 32 tokens/s inference rate on the V100. So by fitting a model with more bits per weight, the 3090 might even produce better-quality output (at ~10x the price of the aftermarket datacenter-grade stuff).
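For anyone wondering where the ~19GB figure and the "more bits per weight" headroom come from, here's a rough back-of-envelope sketch. It assumes ~27e9 parameters (taken from the "27B" in the model name), that Q5_K_M averages somewhere around 5.5 to 5.7 bits per weight (an assumption; real files vary by model), and a hypothetical 3 GB held back for KV cache and runtime overhead. It only counts the weights, so treat the results as a lower bound on VRAM use.

```python
# Back-of-envelope weight-size arithmetic for a quantized model.
# Only the weights are counted; KV cache, activations and runtime
# overhead are folded into a single assumed "headroom" figure.

def weight_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal GB (1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def max_bits_per_weight(params: float, vram_gb: float, headroom_gb: float) -> float:
    """Largest average bits/weight whose weights fit in (vram_gb - headroom_gb)."""
    return (vram_gb - headroom_gb) * 1e9 * 8 / params


PARAMS = 27e9  # assumed from the "27B" in the model name

# Assumed range for Q5_K_M's average bits per weight.
for bpw in (5.5, 5.7):
    print(f"Q5_K_M at {bpw} bpw: {weight_size_gb(PARAMS, bpw):.1f} GB")

# Hypothetical 3 GB reserved for KV cache/runtime on a 24 GB card.
print(f"max average bpw on one 24 GB card: {max_bits_per_weight(PARAMS, 24, 3):.2f}")
```

That lands at roughly 18.6 to 19.2 GB for the weights, in line with the ~19GB quoted, and suggests a single 24 GB card tops out a bit above 6 bits per weight on average for a 27B model under these assumptions.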
> Qwen3.6-27B-MTP quantized at Q5_K_M [...] they observe 32 tokens/s inference rate
Wow, that's pretty good. My experience with older Qwen models was much worse but I think I didn't use the right variant since there were so many on Hugging Face. Could I trouble you for a link to the version you're running? Thanks!
I followed Unsloth's tutorial and got Qwen 3.6 working pretty well. I already had a 3090 for gaming, so the second OEM card I got for cheap(ish) lets me run the K_XL versions of Q5, and I want to investigate Q6 and Q8 this weekend.
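A quick sketch of why the second card matters for trying Q6 and Q8. The bits-per-weight values are assumptions based on llama.cpp's block formats (Q6_K is 6.5625 bpw, Q8_0 is 8.5 bpw, and ~5.7 is a guess for a Q5 mix); the K_XL variants mix tensor types, so actual file sizes will differ. Each card is treated as 24 GB (decimal), which slightly understates the 3090's 24 GiB, and a hypothetical 3 GB per card is held back for KV cache and overhead. It also assumes the runtime can split layers evenly across both cards, which real splits only approximate.

```python
# Rough check of which quant levels' weights fit on one vs. two 24 GB cards.
# All bpw figures and the headroom value are assumptions, not measurements.

PARAMS = 27e9                  # assumed from the "27B" in the model name
CARD_GB = 24.0                 # treated as decimal GB per card
HEADROOM_PER_CARD_GB = 3.0     # hypothetical KV cache/runtime reserve per card

QUANTS = {
    "Q5 (~5.7 bpw, assumed)": 5.7,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}


def weights_gb(params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in decimal GB."""
    return params * bpw / 8 / 1e9


def fits(params: float, bpw: float, n_cards: int) -> bool:
    """True if the weights fit in the combined budget, assuming an even split."""
    budget = n_cards * (CARD_GB - HEADROOM_PER_CARD_GB)
    return weights_gb(params, bpw) <= budget


for name, bpw in QUANTS.items():
    size = weights_gb(PARAMS, bpw)
    print(f"{name}: {size:.1f} GB | 1 card: {fits(PARAMS, bpw, 1)} | 2 cards: {fits(PARAMS, bpw, 2)}")
```

Under these assumptions Q5 fits on a single 3090, while Q6_K (~22 GB of weights) and Q8_0 (~29 GB) only fit once the second card is in the mix.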