Lua can be a really cool HTML templating engine
109 points by riki
As someone who spends a lot of time in my day job dealing with template languages that just operate on strings, techniques like this that let you write templates with structured data rather than just strings are one of my favorite things. I'm partial to Lisps myself and enjoy working with SXML, but any such design is better than string templating in my opinion! String templating whitespace-sensitive documents (hello templated YAML) is especially painful.
Those who liked this post may also like this one, along similar lines: https://www.more-magic.net/posts/structurally-fixing-injection-bugs.html
"Whitespace-sensitive" is not really what makes it bad. Trying to manipulate any defined language with plain string templating is where injection vulnerabilities come from, because it's essentially impossible to do correctly except in carefully vetted special cases. Doesn't matter if it's SQL, YAML, JSON, XML, C, Lisp — every language has rules and the string templater will not know how to follow them. Always, always use some kind of AST library for the language you're trying to generate.
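Here is a minimal Python sketch of the difference, using JSON instead of HTML. The function names and the hostile input are made up for illustration. The only point is that json.dumps applies the language's quoting rules, while the string template does not.

```python
import json


def naive_json(name: str) -> str:
    # String templating: the value is pasted in without following JSON's quoting rules.
    return '{"name": "%s"}' % name


def structured_json(name: str) -> str:
    # Build the data structure first, then let the serializer apply JSON's rules.
    return json.dumps({"name": name})


hostile = 'x", "admin": true, "y": "'
print(naive_json(hostile))       # the value injects an extra "admin" key
print(structured_json(hostile))  # the value stays a single string
```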
It doesn't have to be insecure though, precisely because you can parse first and fill gaps second.
The tricky part is having a parser that can produce a non-error result for a template string input (an input with gaps).
The trouble is that the thing you fill the gaps with may change the structure of the document. Consider the template:
<body>
<img src="{{ image_file }}" />
</body>
This parses as HTML just fine if you remove the image_file template string, but you've still lost the ability to reason about the structure of the document! If image_file evaluates to " /></body><, you now have an invalid document.
If your template engine is sufficiently aware of the document structure to avoid this in all cases, then I think you're doing the structural templating that we're advocating for.
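Here is the same breakout reproduced in Python as a minimal sketch, with the template and value taken from the example above. Escaping the value for the attribute context with the standard library's html.escape keeps it inside src. A structure-aware engine picks that escape for you. Doing it by hand only works if every call site remembers the right escape for its context.

```python
from html import escape

image_file = '" /></body><'

# String templating: the value is spliced in as raw text and closes the tag early.
naive = '<body>\n<img src="{}" />\n</body>'.format(image_file)

# Escaping for an attribute context (quote=True) keeps the value inside src.
safe = '<body>\n<img src="{}" />\n</body>'.format(escape(image_file, quote=True))

print(naive)
print(safe)
```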
The point of parsing first is that you can do semantic interpolation. You do it exactly so that you can interpolate anything without there being any chance that that thing can change the meaning of the parse.
If you've ever worked with a tool that let you do safe interpolation to build up SQL query strings, this is how that tool worked.
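Here is a minimal Python sketch of that SQL case, assuming an in-memory SQLite database with a made-up users table. The version with the ? placeholder is prepared before the value is bound, so the value cannot alter the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"

# String templating: the value rewrites the WHERE clause and matches every row.
naive = conn.execute(f"SELECT name FROM users WHERE name = '{hostile}'").fetchall()

# Placeholder: the statement is parsed first, and the value only fills a slot.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(naive)
print(safe)
```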
Oh! Okay, I misunderstood what you were saying then. Yes, I think this is ultimately why structured templating like quasiquotes (and like the library described in the OP) works as well.
The parsing stage of a quasiquoted SXML expression, e.g.
`(body
(img (@ (src ,image-file))))
...happens before image-file is ever interpolated into it. By that point, it's a data structure in memory.
(Of course if you later need to serialize that structure to HTML you still need to escape any strings within that may contain markup!)
Incidentally this is one of the reasons I like working in Lisp and representing things as s-expressions as much as possible, because it means I can use quasiquotes wherever it makes sense to do so.
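Here is a rough Python analogue of the quasiquoted SXML above. The document is a nested tuple built before image_file is placed into it, and every string is escaped at serialization time. The (tag, attrs, *children) shape and the render function are hypothetical, not SXML's or any library's API. Tag and attribute names are trusted, and void elements like img are recognized only through a hard-coded set.

```python
from html import escape

# Elements that must not have an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def render(node) -> str:
    """Serialize a nested (tag, attrs, *children) tuple to HTML, escaping strings."""
    if isinstance(node, str):
        # Text nodes: escape &, < and > so text can't become markup.
        return escape(node, quote=False)
    tag, attrs, *children = node  # tag and attribute names are trusted, not checked
    # Attribute values: also escape quotes so a value can't leave its attribute.
    attr_text = "".join(
        f' {name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    if tag in VOID:
        return f"<{tag}{attr_text}>"
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


image_file = '" /></body><'
# The structure is fixed first; image_file only ever fills an attribute slot.
doc = ("body", {}, ("img", {"src": image_file}))
print(render(doc))
```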
Essentially what I'm working on building is very, very much like SXML but native to the JS ecosystem. JS has quasiquoted template strings, so I'm making the parser machinery to allow them to be used like this.
Agreed, but implementing truly "safe" interpolation is essentially the same thing as making an "AST library" and also putting a parser in front of it.
Oh, I wasn't trying to say that whitespace sensitivity is what makes it bad! As you say, the core issue is in trying to manipulate structured data with string templating to begin with. A whitespace sensitive language that does not do this (e.g. writing SXML in wisp) is perfectly fine.
So using string templating here is folly in any case, agreed, but I do think there's also something uniquely difficult to reason about when it comes to templating whitespace-sensitive languages. Because not only do you have to consider the text that the template strings will output, you also need to consider the whitespace they will output (potentially over multiple lines), and the whitespace before and after your template strings.
None of this should be taken to imply that I think it's fine to use string templating on data structures in general though, whether whitespace sensitive or not. It is not! Especially if you don't fully control the template and its contents. But nonetheless I find myself working with existing (mostly devops) tools that expect me to do so on a fairly regular basis. So while I try to advocate for AST-based templating, I can't avoid string templating entirely, as much as I would like to. :)
Helm should be killed with fire. There, I said it. :) It’s just wrong, like attaching two boards by banging a screw into them with a wrench.
Lua is a weird language, but having spent a fair bit of time rewriting my Neovim config with it, I have to say it's grown on me. The speed and programmer ergonomics are a sweet spot for small systems.
I am curious what makes it weird in your eyes. Lua is a small language with a fairly conventional syntax and low feature redundancy. There is not much room for weird. To me Lua feels rather vanilla (in a good way), certainly more so than JavaScript or modern Python.
Yes, I agree with you - if you've been weaned on Javascript, a lot of sane languages are going to seem weird ..
Lua is wonderful. Easily one of the best tools for many jobs out there. If only folks would learn it better.
The confusing thing for me is that Lua is a Smalltalk. The same core abstractions are there, but pretty much everything has a different name and they're all specific to Lua, whereas most other languages picked up the Smalltalk terminology. I don't really care about the syntax (though Lua syntax is more verbose than Smalltalk), but the bespoke terminology is different.
this bit struck me as very weird (perl influenced perhaps?)
Going back to the original function for a bit once again, I’d like to explain another thing that might seem odd:
local function escape_html(str)
return (str:gsub("([&<>\"'])", escape_subs))
end
Notice the extra parentheses around the returned value? This is because string.gsub returns two values, with one being the string after replacements, and the other being the number of replacements it made. Therefore, we have to collapse it back to just the output string, which is most conveniently done by wrapping the function call in parentheses.
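Python's re module makes the same distinction explicit. re.sub returns only the new string, while re.subn returns a (new_string, count) tuple, much like gsub's two results. Below is a sketch of the escape function in Python. The substitution table is an assumption shaped like the quoted escape_subs, whose contents the excerpt doesn't show.

```python
import re

# Hypothetical substitution table, standing in for escape_subs in the quoted Lua.
ESCAPE_SUBS = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}
PATTERN = re.compile(r"""[&<>"']""")


def escape_html(s: str) -> str:
    # re.subn returns (new_string, count), like Lua's string.gsub returning two values.
    # In Python the second value is explicit, so we unpack it and discard it.
    result, _count = PATTERN.subn(lambda m: ESCAPE_SUBS[m.group(0)], s)
    return result


print(escape_html("Tom & Jerry's <b>"))  # Tom &amp; Jerry&#39;s &lt;b&gt;
```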
Yes, multiple return values are the biggest footgun in Lua by a significant margin, largely because of how subtle and invisible they normally are. Compared to the footguns in most mainstream languages it's pretty minor, but definitely an unfortunate part of the design: https://benaiah.me/posts/everything-you-didnt-want-to-know-about-lua-multivals/ (source: I maintain a compiler that targets Lua, and multiple values is the greatest source of bugs by a large margin)
Multivals cannot be assigned to variables. They can only be referred to as literals, function call expressions, or the vararg ...
I know you mean that the entirety of the vararg can't be assigned as x = ..., since that assigns the first value in the vararg to x (and consequently, you can do x, y, z = ... to get the first three varargs into variables). But you can collect the varargs into a table with x = { ... }, though then you have problems if there are nils before other values: f(nil, 2, 3) results in a table { [2] = 2, [3] = 3 }, which isn't a proper sequence, so the length operator isn't reliable on it.
Personally, I've never had any issues with multiple return values or varargs, but I think I've just absorbed the issues to the degree that I don't have to think about them anymore. Same with the 1-indexing; it's never been an issue for me.
Well, maybe I'm biased because I'm writing a compiler, so I have to think about not only the problems they cause in code that I write, but also the problems they cause in code that other people write. =) I think you're right that it doesn't take that long to internalize the gotchas you need to avoid.
Specifically tables, how they do double-duty as arrays and maps, and metatables. Because the language is so minimal, you have to use tables for everything so it stands out. You could make the case that the weirdness of tables is pretty similar to JavaScript's prototype inheritance, but historically the weird parts of JS have been papered over by more ordinary-looking abstractions.
Lua is a small language with a fairly conventional syntax and low feature redundancy. There is not much room for weird.
--[[…]] can hardly be called conventional. It’s kind of cool, but not conventional.
1-based arrays.
Niklaus Wirth nods approvingly[1]
[1] not really since Oberon has 0-based arrays
If all you've used are languages with C-influenced syntax, it does have a bunch of things one could consider "weird", like a dedicated length operator, indexing arrays from 1 by convention, ~=, and not using braces.
That's mostly cosmetic though.
indexing arrays from 1 by convention
can it really be said to be by convention when many things don't work otherwise? ipairs only works on 1-based arrays, and I don't think 0-based arrays get special treatment by Lua impls, so they're slower.
Yeah, and the # operator requires 1-based arrays. And string indexing and slicing is 1-based which in my experience highlights how much worse it is than 0-based indexing.
I believe in PUC-Rio Lua, index 0 is stored in the hash part of the table, whereas LuaJIT extends its array part down to include 0 – but that doesn’t affect the 1-based semantics of the language.
There are a few quirks, but as you note most of them are cosmetic, and for the meaningful ones that aren't, well, try comparing it to the list of weird things about any other mainstream language and you'll find that Lua's list is much, much smaller. Sure, there are things you need to be aware of (multivalues probably being at the top), but there are only a few of them, and they can easily all fit in your head at the same time.
Any language that requires a special function invocation to loop over an array is weird.
It's not a special function invocation. There's nothing privileged about pairs and ipairs compared to any other function; Lua simply has built-in generators (iterators). pairs and ipairs are super simple functions that set up those iterators.
See the docs, but the for statement has two forms: a numeric one, where you're doing
for i = 1, #t do
...
end
(it also accepts an optional third value, the step, if you want to skip over elements)
and another:
for con, v1, v2, etc in fun(t) do
...
end
where fun(t) returns an iterator function, some kind of internal state, an initial "control variable" which is basically an index, and a "closing variable" (more about those here, but it boils down to "a table with a __close metamethod that closes a resource like a file or w/e"). The function pairs just returns an iterator function (next) plus the arguments to it that'll let it iterate. A small note here is that con in that form is always guaranteed to be the control variable, and that the iterator function can return however many values it wants (hence v1, v2, etc).
See the docs for ipairs:
Returns three values (an iterator function, the value t, and 0) [...]
and pairs:
[...] returns the next() function, the table t, plus two nil values [...]
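Here is a rough Python model of how the generic for drives an iterator triple (function, state, control value). It is a sketch under simplifying assumptions: Lua tables are modeled as dicts, the closing value is left out, and ipairs_step is a hypothetical stand-in for the real ipairs iterator. It does reproduce the rule that the loop stops as soon as the first returned value is nil.

```python
from typing import Any, Callable, Iterator, Optional, Tuple


def generic_for(f: Callable, s: Any, ctrl: Any) -> Iterator[Tuple]:
    # Models `for k, v in f, s, ctrl do ... end`: Lua calls f(s, ctrl) repeatedly,
    # the first result becomes the new control value, and nil (None) ends the loop.
    while True:
        results = f(s, ctrl)
        if results[0] is None:
            return
        ctrl = results[0]
        yield results


def ipairs_step(t: dict, i: int) -> Tuple[Optional[int], Any]:
    # Hypothetical ipairs iterator: walk keys 1, 2, 3, ... until the first nil.
    i += 1
    v = t.get(i)
    return (None, None) if v is None else (i, v)


def ipairs(t: dict):
    # Like Lua's ipairs: return the iterator function, the table, and 0.
    return ipairs_step, t, 0


t = {1: "a", 2: "b", 3: "c", 5: "e"}  # a "table" with a hole at 4
print(list(generic_for(*ipairs(t))))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```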
True. I've long wished that Lua would remove select() and allow one to do
function x(...)
local len = #...
local a = ...[1]
local b = ...[2]
end
but I'm not sure what issues with parsing that might cause. But such a syntax seems intuitive to me.
Good news, Lua 5.5 can do that:
function x(...params)
local len = params.n
local a = params[1]
local b = params[2]
end
It collects the parameters into a table for you, but will keep them nice and allocation-free if you limit yourself to a subset of operators.
See the "Parameters" section of the manual: https://lua.org/manual/5.5/manual.html#3.4.11
Curious that the author did mention Lisp but not Hiccup, which is basically the undisputed default way of rendering HTML in both Clojure and ClojureScript. Being used to that, I find any other way of rendering HTML is just wrong. Sometimes I even use it for mockups instead of writing plain HTML!
Hiccup looks incredibly sweet and looks like a good point of inspiration for improvements to this little Lua library! Especially the class and ID sugar. I haven't found a nice way to fit them into Lua syntax, unfortunately. (Classes can be done with some __index magic, but I don't know about IDs.)
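One hedged idea for the ID sugar is to keep it inside the tag name string and parse it out, the way Hiccup's tag keywords carry #id and .class. Below is a small Python sketch of that parsing step. The accepted form (id before classes) and the parse_tag name are assumptions. This is not how Hiccup or the Lua library actually implements it.

```python
import re
from typing import List, Optional, Tuple

# Accepts "tag", "tag#id", "tag.cls1.cls2" and "tag#id.cls1" (id must come first).
TAG_RE = re.compile(r"^([^#.]+)(?:#([^#.]+))?((?:\.[^#.]+)*)$")


def parse_tag(spec: str) -> Tuple[str, Optional[str], List[str]]:
    # Split a Hiccup-style tag such as "div#main.note.wide" into tag, id and classes.
    m = TAG_RE.match(spec)
    if m is None:
        raise ValueError(f"unsupported tag spec: {spec!r}")
    tag, elem_id, class_part = m.groups()
    classes = [c for c in class_part.split(".") if c]
    return tag, elem_id, classes


print(parse_tag("div#main.note.wide"))
```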
Unfortunately I'm not super familiar with Lisps, and so didn't know about Hiccup until now, else I would've totally mentioned it… I've only really looked into them at a surface language design level, for learning and inspiration. I intend to try out a Scheme one day though. Who knows, maybe I'll become a convert then :3c
Really the truth is that I've been putting off learning a Lisp because most modern languages have since adopted a lot of their features, so they're just not as appealing nowadays… but I'll admit the various macro systems seem pretty attractive.
We use hiccup-style notation to generate HTML in Fennel (lisp on the lua runtime):
https://git.sr.ht/~technomancy/fennel-lang.org/tree/main/item/main.fnl
most modern languages have since adopted a lot of their features, so they're just not as appealing nowadays… but I'll admit the various macro systems seem pretty attractive.
Honestly the main thing for me is stuff like getting rid of statements and operator precedence, (I know you can do this outside lisp but it's very rare; afaik just smalltalk and even-more-obscure ones) and the uniformity of the syntax allowing for structural editing instead of editing character-by-character: https://fennel-lang.org/rationale
getting rid of statements and operator precedence, (I know you can do this outside lisp but it's very rare; afaik just smalltalk and even-more-obscure ones)
I don't think Forth is that obscure, and it has no statements nor operator precedence.
operator precedence
I was disappointed to learn yesterday that BQN (an APL descendant) has operator precedence, evaluating the ? ternary operator before all else:
depth 𝕊 subtree: {3≠≠subtree ? ⋈depth; depth∾∾(depth+1) 𝕊¨ ⊑ 2↓ subtree }
I strongly 2nd hiccup. Here's a different flavor: https://codeberg.org/veqq/janetdocs/src/commit/848dcbd8e54ad5e0e8555abc0b5c72dab1720282/routes/examples.janet#L112
with this layout:
I’ve never more than glanced at Hiccup, but my first impression now is to be baffled and concerned, because of this from its README:
Hiccup is intelligent enough to render different HTML elements in different ways, in order to accommodate browser quirks:
user=> (str (h/html [:script]))
"<script></script>"
user=> (str (h/html [:p]))
"<p />"
<p /> is invalid HTML. It’s an unclosed p tag plus a non-void-html-element-start-tag-with-trailing-solidus parse error.
(I don’t believe it has ever been valid, but, as a fun fact, in the distant past it would also have produced an attribute with name / and empty value. Further into the dim recesses of time, I don’t know what it would have done. I wouldn’t rule out it crashing the browser…)
Also “browser quirks” is a terrible choice of words. “Quirks” has a specific meaning in the context of browsers, and what it’s talking about is nothing to do with browsers anyway, it’s defined behaviour of HTML (which, if they had chosen a correct example, might have been that way for thirty years).
If you add the constraints of rendering incrementally as you type and an embedded markup language for writing prose, then you'll probably start to approach something close to Typst's experimental HTML output mode, but from a completely different direction!
Here's how I would write that comment example in Typst:
#let comment(user, text) = html.article(
class: "comment",
{
html.span(user, class: "user")
[ says: #text]
},
)
// usage:
#comment[isuffix][hi]
// or
#comment("riki", "meow")
Lustre provides a DSL for HTML in Gleam which ticks a lot of the boxes from the original post. Lustre HTML code is just Gleam code, nothing fancy and no macros. An example here:
https://github.com/lustre-labs/lustre/blob/main/pages/guide/01-quickstart.md#rendering-html
My only issue with it is that it’s perhaps a bit too un-magical; I would have loved to have some kind of template string syntax in Gleam. Escaping double quotes and concatenating strings does not spark joy. But I also haven’t managed to come up with a syntax that fits the language and that I like…
Ever since doing Elm, I've loved this approach to templating, so I built something like it for Rust. It's just so nice to have rust-analyzer, rustfmt, and the type system when writing view code.
That looks really good! I have been using maud but it being macro-based is painful. This looks much more like the Elm/Gleam style.
Great to see more love for Lua's tables! They really are a superpower when it comes to defining DSLs. Way back in college I wrote a (much smaller) Lua template engine to generate C headers specifically for a microcontroller HAL and it worked like a charm. The code is still online, although now it's mostly a historical artefact.
I didn’t realize how nice it was writing templates in the same lang (Ruby/ERB) until I started writing Rust and realized there was no analog. Jinja and Handlebars and friends just aren't quite the same as being able to dump raw Ruby code into an arbitrary tag. Being able to have a single point of truth for logic that is shared in your template and other places is also really nice.
The lua outcome is really neat. I hope people keep experimenting with this space.
The article mentions this, but maud (https://maud.lambda.xyz) is really nice despite being a massive quirky macro.
Erb is not exactly 'same lang' in the same way as the Lua in OP. That would be more like https://github.com/AndyObtiva/glimmer-dsl-xml eg
require 'glimmer-dsl-xml'
Glimmer::Config.xml_attribute_underscore = '-'
include Glimmer
document = html {
body {
video(:data_loop, data_src: "http://videos.org/1.mp4")
}
}
puts document
I agree your example is much closer to the Lua outcome. However, I find this harder to reason about and more complicated than ERB. There's also a very high overhead to stuff like this and Phlex. (Style/taste preference.)
I use slim templates and they are pretty but uncommon so every time I edit it I have to relearn the syntax. ERB hits a sweet spot of handling most things I want to do while being simple to read and write.
My comment about "same lang" was more about being able to interop between the host language and the template language than about requiring that templates be defined strictly in the host language.
I always preferred ERB because it looks like the final output, ie it’s HTML. So when you need to debug it there’s no need to translate from one representation to another (the problem that source maps solve). It also means you can give it to someone familiar with HTML and they will understand it. But I have recently gone completely the other way and started using Phlex. I’ve come to prefer this approach over templates. I wonder if I could RIIR.
This is a great example of the tragedy of HTML. The language is too complicated to write a correct emitter for (as acknowledged by, but not resolved in, the article). For example, <script> tags aren't handled properly: their contents shouldn't be escaped, and an error should be raised if they contain the string </script, which is the only way to end a script tag. (I say raise an error because I don't think there is a general way to encode </script inside a <script> tag. You would need to parse the JavaScript and make an appropriate change.) This is not even talking about the complexity of nested namespaces, which can result in <script> tags that play by different rules.
XHTML really is a better solution. It would be less code to write a fully correct XML emitter. Unfortunately it died. I partially blame browsers because no one ever shipped a streaming parser, so you were throwing away performance. Much easier to just never put user data into HTML and fetch everything from a JSON API /s. (Ok, you can put user data into HTML if it is a simple string. The hard part is inside the funny tags. But due to the complexity of HTML it is very difficult to handle these funny tags at a framework/library level.)
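Here is a minimal Python sketch of the conservative rule described above. It emits script contents raw, because escaping would corrupt the JavaScript, and refuses anything containing </script, matched case-insensitively as the HTML tokenizer does. render_script is a hypothetical helper. It does not cover every edge case of script parsing; for example, <!-- sequences inside scripts have their own quirks.

```python
import re

# The HTML tokenizer ends a script element at "</script" (ASCII case-insensitive)
# followed by whitespace, "/" or ">"; this conservatively rejects any "</script".
SCRIPT_END = re.compile(r"</script", re.IGNORECASE)


def render_script(js: str) -> str:
    # Script contents are raw text: escaping &, < or > would change the JavaScript.
    if SCRIPT_END.search(js):
        raise ValueError("script body contains '</script'; refusing to emit it")
    return f"<script>{js}</script>"


print(render_script("let a = 1 < 2;"))
```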
The language is too complicated to write a correct emitter for (as acknowledged by, but not resolved in, the article).
I know your comment isn't really a criticism of my post, but to give some perspective, I probably would've resolved it, if not for the fact that it felt like a large detour that would detract a lot from the article's point—instead assuming that the reader can figure out the details themself, using their own curiosity and knowledge they can find both within the article and the sources I linked. (Especially Go's html/template.)
I've actually tried rewriting the templates behind my website into Lua to test the waters, and generating <script> and <style> tags felt like the most brittle and wrong-feeling part.
Perhaps I'll write a follow-up post in the future that addresses this, but right now I'm a bit short on free time… I'm honestly glad I managed to find time to finish this one.
Yeah, my comment is more of an off-topic rant. The fact that you seem 95% of the way there but are actually like 20% of the required complexity just shows the problems of HTML. If you didn't mention it I might have blamed you a little but you did call it out so I put 100% of the blame on HTML.
This kind of reminds me of a small library I've been lugging around since shortly after Python got context managers that leans on xml.sax.saxutils.XMLGenerator to build XML documents:
doc = XMLBuilder()
with doc.within("feed", xmlns="http://www.w3.org/2005/Atom"):
doc.title(title)
if subtitle:
doc.subtitle(subtitle)
# ...
for entry in entries:
with doc.within("entry"):
doc.title(entry["title"])
# ...
It's a super useful pattern.
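The library itself isn't shown, so here is a hypothetical reconstruction of a small XMLBuilder built on xml.sax.saxutils.XMLGenerator that supports the usage above. within() is a context manager that opens and closes an element. Unknown attribute names like doc.title become leaf elements via __getattr__. Every method name other than within() is an assumption.

```python
import io
from contextlib import contextmanager
from xml.sax.saxutils import XMLGenerator


class XMLBuilder:
    """Hypothetical reconstruction; the comment's actual library isn't shown."""

    def __init__(self, encoding: str = "utf-8"):
        self._out = io.StringIO()
        # short_empty_elements=True writes <tag/> for elements with no content.
        self._gen = XMLGenerator(self._out, encoding, short_empty_elements=True)
        self._gen.startDocument()

    @contextmanager
    def within(self, name: str, **attrs: str):
        # Open an element, let the with-body add children, then close it.
        self._gen.startElement(name, attrs)
        try:
            yield self
        finally:
            self._gen.endElement(name)

    def element(self, name: str, text=None, **attrs: str):
        # Emit a leaf element; XMLGenerator escapes the text and attribute values.
        self._gen.startElement(name, attrs)
        if text is not None:
            self._gen.characters(text)
        self._gen.endElement(name)

    def __getattr__(self, name: str):
        # Only called for unknown attributes: doc.title("x") -> doc.element("title", "x").
        if name.startswith("_"):
            raise AttributeError(name)
        return lambda text=None, **attrs: self.element(name, text, **attrs)

    def getvalue(self) -> str:
        return self._out.getvalue()


doc = XMLBuilder()
with doc.within("feed", xmlns="http://www.w3.org/2005/Atom"):
    doc.title("Tom & Jerry")
    with doc.within("entry"):
        doc.title("<first>")
print(doc.getvalue())
```

Tags whose names collide with the builder's own methods (within, element, getvalue) would need doc.element("within", ...) instead of attribute sugar.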
Some languages allow nice DSL syntax. Lisps and Schemes are the obvious examples that come to mind: their universal syntax has been used for many DSLs, from HTML to package management (Guix).
Elm-html is the other obvious example, shooting to popularity in the heady days of quirky little AltJS compilers that pushed different philosophies: https://package.elm-lang.org/packages/elm/html/latest/
The really cool thing about this technique is it lets you use your programming language's powerful abstraction features. Want to make a reusable 'user profile' component? It's just a function. Want to show or hide some based on a condition? It's just an if expression (statement). Want to generate a table with multiple rows? Just a map operation or for loop.
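Here is a small Python sketch of those three points using the standard library's xml.etree.ElementTree. A component is a function returning an element, conditional content is an if, and repeated rows come from a for loop. The component names are made up. ElementTree's method="html" serializer is fairly basic, so treat this as an illustration rather than a full HTML emitter.

```python
import xml.etree.ElementTree as ET


def user_profile(name: str, is_admin: bool) -> ET.Element:
    # A reusable component is just a function returning an element tree.
    div = ET.Element("div", {"class": "profile"})
    ET.SubElement(div, "span").text = name
    if is_admin:  # conditional content is a plain if statement
        ET.SubElement(div, "em").text = "admin"
    return div


def user_table(rows) -> ET.Element:
    table = ET.Element("table")
    for name, email in rows:  # repeated content is a plain for loop
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "td").text = name
        ET.SubElement(tr, "td").text = email
    return table


# Text set on elements is escaped when serialized.
out = ET.tostring(user_profile("riki <3", True), encoding="unicode", method="html")
print(out)
```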
this is neat! i have vague aspirations to do something less DSLy (and not quite as fun looking, probably) at $job with Python dataclasses to represent templates for generating some gnarly Make code that has become a pain to manage manually/“directly” in plain Make
Others here have mentioned libraries in other languages. Here is an old Stackoverflow question/answer link which goes over some of these - https://stackoverflow.com/questions/671572/cl-who-like-html-templating-for-other-languages/672665
This is very neat. I recently rebuilt my microblog site using Haunt, which uses SXML and quasiquoting for a similar style of templating in Guile Scheme.