Making your own programming language is easier than you think (but also harder)
28 points by LesleyLai
The comments here are much harsher than I expect of this community.
Is it possible that another language like Lua would have been good enough for the author? Probably. Is it likely that the author is engaging in a giant yak shaving expedition? Probably.
Is it clear that they are very skilled and having a ton of fun? Yes. Is there interesting technical material in the post? Also yes.
I will happily, joyfully read about a fellow nerd designing another scripting language for their game engine. I would read a thousand posts like this a day if it saves me from one more AI-written slop post about how some vibecoded SaaS garbage is going to save the world (and enrich the author).
Lua (or any other JIT-compiled scripting language for that matter). That's a standard choice, but it turns out that it's really hard to sandbox it.
Absolutely baffling take. The way that Lua makes sandboxing trivial is one of its greatest strengths and delivers benefits far beyond mods and plugins. No other language I've seen even comes close.
I mean the whole paragraph reads like "I have read about this language and despite it being the standard choice for the last 20 years I'm not gonna spend a few hours of research".
The point about Lua versions is the only one that has some merit, but I have not actually stumbled over anyone really venting about it, except people who use "modern" Lua for one thing and have to fall back to 5.1/5.2 for other tasks. I think most users stick to one or the other.
Yup, especially this:
Furthermore, Lua is a high-level dynamically-typed language that doesn't know anything about C pointers. Bridging ECS entity iteration into it will either force per-entity native ↔ Lua ↔ native jumps with nonzero overhead, or constructing a Lua array from the native entities, and then deconstructing it back. Either way, this doesn't sound good.
This is absolutely a 'don't trust your intuition, benchmark' kind of thing. I used Lua via Sol3, which makes it trivial to generate the bindings from C++. I wrote intentionally naïve code that copied strings in and out of the Lua world (most of the code is string processing). I benchmarked it. It was easily fast enough.
By all means, write your own language if you have some cool ideas, or just want something fun to play with. You don't need to make up reasons. 'I want to' is an adequate reason!
It's really very odd that the "common possibilities" are...specifically, Lua and C++, to begin with, right? The only two language classes that exist, I guess?
It very much feels like this was "researched" as a rationalization for "I want to make my own language", which is fine! But it's worth being honest about that, instead of just claiming wholly incorrect things about the existing choices.
Plenty of games written by competent programmers have had some form of Lua sandbox escape (Factorio, Binding of Isaac, Redis if you imagine cloud programming as a perverse sort of game where everyone loses), which does make me wonder if something is wrong in terms of how the API is presented. The easiest example is bytecode escape. Once you know it's there, it's easy to just disable it, but the fact that it keeps happening is revealing of a broader problem: you have to construct the rules of sandboxing from understanding the interaction of disparate parts of the Lua spec, instead of being able to safely compose a program from primitives that make it clear what extra interactions you're allowing.
A more contrived example is prototype pollution between different envs in the same Lua VM; in Redis you could pollute string's metatable which would let you run code as other database users that also used the Lua features. (I think it's funny that Lua has an astronomically smaller surface for prototype pollution than e.g. JavaScript but you can still use one of its, like, 2 global prototypes to do the same old thing.)
That being said, Luau has a really competent solution to this, and I'm not sure why the author thinks the broader problem will be solved by making a new sandbox that implicitly has all of the same problems.
does make me wonder if something is wrong in terms of how the API is presented.
Oh, yes there's absolutely a big design flaw in how it's done; you have a single load function which is used both for loading textual code (basically 100% safe) and also loading bytecode (this is where like 90% of sandbox escapes come from). If you're aware of the problem it's not at all difficult to block bytecode loading, but it's a very bad default. You have to know which things are safe to include in the sandbox and which aren't (setmetatable obviously being unsafe) but that's true of every sandboxing system.
Still, the problems are pretty manageable and much, much easier to work around than starting over from scratch.
Why is loading arbitrary code in text form considered safe while loading compiled bytecode is considered unsafe?
They are both supposed to be safe. It's just that it's much, much harder to accept arbitrary bytecode safely; historically this has been the leading cause of security bugs in Lua (if you don't count "crash the whole process" bugs as security bugs).
Lua 5.1's load() function can't be restricted to text only---that was added in Lua 5.2. A lot of people still use 5.1 because of LuaJIT (where Mike Pall, author of LuaJIT, had a serious disagreement with the direction of Lua starting with 5.2). It wouldn't surprise me if the examples you mentioned still use Lua 5.1.
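For hosts stuck on 5.1, the usual workaround is to check the chunk yourself before it ever reaches the loader. Precompiled Lua chunks begin with the signature bytes ESC "Lua", and the loader picks text vs. binary from the first byte, so rejecting anything that starts with ESC is the conservative test. A minimal sketch of that check, written in Python purely for illustration (a real host would do this in C or C++ before calling the loader; `is_precompiled` and `accept_mod_source` are hypothetical names):

```python
# Signature that precompiled Lua chunks start with: ESC followed by "Lua".
LUA_SIGNATURE = b"\x1bLua"


def is_precompiled(chunk: bytes) -> bool:
    """Return True if the chunk would be treated as bytecode by the loader.

    Only the first byte is compared, since that is what decides
    text vs. binary; valid Lua source never starts with ESC.
    """
    return chunk[:1] == LUA_SIGNATURE[:1]


def accept_mod_source(chunk: bytes) -> bytes:
    """Hypothetical gate a host could run on untrusted mod code.

    Raises ValueError for precompiled chunks and returns text chunks
    unchanged, ready to be handed to the real loader.
    """
    if is_precompiled(chunk):
        raise ValueError("precompiled Lua chunks are not accepted")
    return chunk
```

On 5.2 and later the same effect comes from passing the mode argument "t" to load, so this kind of manual check only matters for 5.1-era hosts.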
The other thing that bugs me about this post is that if you want to learn language design, by far the best way to do this is to write a compiler for a hosted language that targets an existing VM or runtime rather than going all the way to bare metal.
If you're interested in VM design or lower level bits then sure, by all means you can do what's outlined in the post, but it is far from being the best way to learn language design.
Why do you think the author's goal is to learn (only) language design?
After reading the post, I feel confident that the author is working on precisely what is the most interesting and fun for them. Seems great.
The title implies that they are writing for an audience that is interested in programming language design, but I agree that the rest of the post does not give that impression.
My game is highly simulation-heavy. There are hundreds of thousands of entities simulated via a custom ECS engine. Ideally, I'd want the modding language to be able to just take a bunch of component pointers and iterate over them like you would in a C for loop.
You can have better ideals! In particular, compare and contrast with how Unity, Unreal, Blender, Godot, and other rendering engines do this. External iteration's not fast enough to talk about megapixels/second and might not be up to the task of hundreds of kiloentities/second; we should think about parallelism. The big engines all use dataflow descriptions of branchless algorithms which are GPU-friendly and usually embarrassingly parallel. Maybe the author doesn't like visual editors, which is a common-enough opinion, but that doesn't mean that for-loops are the answer. I might have cut the author more slack on this if they mentioned that ECS is fundamentally a relational paradigm and that SQL is the correct historical-baggage language against which to compare.
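To make the relational comparison concrete, here is a rough sketch of a "movement system" written as a single set-based statement instead of a per-entity loop. It uses Python's built-in sqlite3 module only because it is easy to run; the table layout and the `movement_system` name are assumptions made up for the example, not anything from the post, and nothing here says a real engine should embed SQLite.

```python
import sqlite3

# Each component type is a table keyed by entity id (assumed layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE position (entity INTEGER PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE velocity (entity INTEGER PRIMARY KEY, vx REAL, vy REAL);
    INSERT INTO position VALUES (1, 0.0, 0.0), (2, 10.0, 10.0), (3, 5.0, 5.0);
    INSERT INTO velocity VALUES (1, 1.0, 2.0), (2, -1.0, 0.0);
""")


def movement_system(conn: sqlite3.Connection, dt: float) -> None:
    """Advance every entity that has both a position and a velocity.

    The "query" part of the system is the WHERE clause (entities with
    both components); the "update" part is the SET clause. No loop over
    entities appears in user code.
    """
    conn.execute(
        """
        UPDATE position
        SET x = x + ? * (SELECT vx FROM velocity
                         WHERE velocity.entity = position.entity),
            y = y + ? * (SELECT vy FROM velocity
                         WHERE velocity.entity = position.entity)
        WHERE entity IN (SELECT entity FROM velocity)
        """,
        (dt, dt),
    )


movement_system(conn, 0.5)
rows = conn.execute(
    "SELECT entity, x, y FROM position ORDER BY entity"
).fetchall()
print(rows)  # entity 3 has no velocity row, so it does not move
```

The point of the sketch is only the shape: the modder describes which components are joined and how they change, and the engine is free to decide how to iterate, batch, or parallelize.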