Teaching an LLM a Niche Diagramming Language
37 points by huytd
I ran an experiment to train Qwen2.5-Coder-7B on a niche diagramming language, and the model was able to generate syntactically correct code 86% of the time.
This post outlines the process and the decisions I made during training. I think it might be helpful to share it here to get feedback, since I know there is a lot I did wrong along the way and could learn from the many people here.
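For anyone curious what that kind of setup roughly looks like, here is a minimal LoRA fine-tuning sketch with Hugging Face datasets/peft/trl. It is only illustrative: the dataset path, hyperparameters, and target modules are placeholders, the exact argument names depend on the trl version, and it is not necessarily the stack used in the article.

```python
# Illustrative LoRA fine-tuning sketch (placeholders throughout, not the article's exact setup).
# Assumes a JSONL file of {"messages": [...]} chat samples pairing prompts with diagram code.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="diagram_samples.jsonl", split="train")  # placeholder path

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # common choice for Qwen-style models
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-Coder-7B",  # the base model mentioned above
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="qwen-diagram-lora",
        num_train_epochs=3,
        per_device_train_batch_size=2,
        learning_rate=2e-4,
    ),
)
trainer.train()
```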
I’m not clear why people have flagged this off topic and spam.
The techniques described are interesting and topical. Is the author affiliated with the tools used? Is this a paid service? Is that the issue?
At the end of the article, in a collapsible section, I did mention a text-to-diagram product I built, but that's all. The article wasn't about my product at all, and none of the tools used or mentioned have anything to do with me. Maybe it's still confusing someone :D
I suspect it's just an inherent problem with the idea of selecting a reason to downvote: some people just don't want to see anything about a given topic, and will downvote it under a sort of generic reason.
There's violent anti-LLM sentiment on lobste.rs; people are afraid and confused and in some kind of weird denial about their new AI overlords.
This is such a cool project and I feel it does not deserve the vibecoding tag. Can we really not do better with a tag that maybe has a less negative connotation?
On lobste.rs the "vibecoding" tag is used for anything relating to LLMs, even posts that are not about using them to write unreviewed code.
The negative connotations of that are deliberate. I think it sucks, because it's an incorrect usage of the term and is also confusing to people who haven't sat through the arguments about it in the past.
Moreover, users here consistently change the ai tag to vibecoding even when the story is explicitly about the development/ML aspects of LLMs. The ai tag appears to be effectively dead (this can be verified in the mod logs).
Normally that happens when the post is about development around a fixed LLM, though; this one is actually about post-training an LLM, so it wouldn't fit even that expansive interpretation.
I think there is a grey zone around articles where people try to study or train LLMs from "the outside", which some (like me) would say is not scientific and unlikely to lead to progress in the area.
Anyway, I think it's not a big problem, since if you're interested in this you can simply choose not to filter out the tag.
I don't fully filter, and I look at both, but I look at a subset of the total /newest flow. Treating the LLM case of ai as "touches the weights", non-LLM ai as clear from the title, and vibecoding as "the weights are used as-is" was one way to make sense of all that.
unlikely to lead to progress in the area
(looking at image generation)
Making the methods and trade-offs of minor fine-tuning better known is what progress in applicability of local models looks like, though!
In image generation, how the model handles composition and what the model knows about styles and objects are becoming somewhat separable via LoRAs and, sometimes, from-the-outside methods of model control or combination that are cheaper to train than the base model.
I think there is a bit of chicken-and-egg here, but local LLM models are already powerful enough to be useful for some applications even if updates stop. Unfortunately, for some of these there is a risk of the knowledge cut-off eventually getting you. If there is enough activity specifically around adding a bit more data relevant to you into pretrained models, there is some hope for, say, Qwen5 including a comfortably locally runnable version (at least if "locally" means Ryzen Mobile with maxed-out RAM) that you can plausibly fine-tune on a setup with low double digits of high-end consumer GPUs (so, one university computer room over a month's worth of nights).
This might not need any additional progress in the architecture compared to what is going to be done anyway, but it has a chance of meaning a lot of ecosystem progress.
I also don't like this tagging, and I'm on the anti-LLM crowd!
This is not the only example of a topic that is controversial but tolerated on lobste.rs which is "punished" with a weird, overly (and incorrectly) specific tag. Another one is blockchain stuff, which we have decided to tag "merkle-trees" for some reason. What?
Another one is blockchain stuff, which we have decided to tag "merkle-trees" for some reason. What?
The tag "merkle-trees" is intended to strictly limit the topicality of cryptocurrency discussions to the purely technical aspects, not the ancillary financial stuff.
When I suggest the "vibecoding" tag over an existing "ai" tag, I try to imagine someone filtering the former but not the latter. Would they be interested in seeing the content?
Thank you! I initially set it as ai, but after a few turns it ended up as vibecoding and I could not edit it anymore :(
From the moderation log, it looks like users suggested the tag change, and IMO they got it wrong here.
I enjoyed your writing: good storytelling, easy to follow for a technical topic, and a bit of humor mixed in. Novel topic too (for me, anyway!)
I don’t want to deal with thinking mode, so Qwen3 is also out of the list.
Perhaps worth noting that Qwen3-VL, despite being technically multimodal, has an 8B model that performs reasonably well on plain text (and ships separate thinking and instruct variants, as all the post-2507 releases do).
What does handling thinking mode entail?
Most Qwen3 models will emit reasoning via the <think> tag by default, unless there's a /no_think token in the prompt. I would have to either include the reasoning tokens when building my training data, or figure out a way to skip them. Either way, with my limited experience, my initial thinking was: "no, not a thinking model, not this time" :D
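For reference, Qwen3's chat template also exposes this as a flag you can pass through the tokenizer. A small sketch, as I understand it from the Qwen3 docs (the model name is just an example, and behaviour may differ across transformers versions):

```python
# Sketch: toggling Qwen3's reasoning output via the chat template.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # example model, not the one from the article
messages = [{"role": "user", "content": "Generate a sequence diagram for a login flow."}]

# Thinking disabled: the template pre-fills an empty <think></think> block for the assistant.
no_think_prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)

# Thinking enabled (the default): the model is free to emit a <think>...</think> section first.
think_prompt = tok.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
```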
Thanks, I appreciate it. I'm realizing that I'm highly ignorant of even the top layer of how LLM-based systems work. I feel like I need a remedial undergrad-level introduction like I had for compilers.
Why not just include /no_think in your training prompts?
My concern is that doing that would introduce some bias into the model; it would be safer to have a dataset that supports both thinking and non-thinking modes, to keep it balanced.
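A purely hypothetical sketch of what such a balanced pair could look like (the field names are illustrative, and the empty <think></think> block for non-thinking samples follows Qwen3's convention rather than anything from the article):

```python
# Hypothetical sketch: each prompt appears twice in the SFT data,
# once as a thinking sample and once as a non-thinking sample.
prompt = "Generate a diagram for a login flow."
diagram_code = "..."  # target output in the niche diagram language
reasoning = "..."     # a reasoning trace, if one exists or is synthesized

thinking_sample = {
    "messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": f"<think>\n{reasoning}\n</think>\n\n{diagram_code}"},
    ]
}

non_thinking_sample = {
    "messages": [
        {"role": "user", "content": prompt + " /no_think"},
        {"role": "assistant", "content": f"<think>\n\n</think>\n\n{diagram_code}"},
    ]
}
```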