So I've Been Thinking About Static Site Generators
43 points by polywolf
Just out of curiosity: why is extreme speed so important to you?
The job an SSG does (transform files into other files according to simple rules) really seems like it shouldn't take that long, so I've taken it as a personal challenge to prove it's possible.
My own opinion is that speed is an aspect of beauty and making things beautiful is its own reward.
Fun article! I made my own interestingly quirky SSG recently and had a lot of fun too (albeit with very different goals from the author; my SSG is the opposite of fast). The world could use more quirky code!
My own SSG (Nanoc, written in Ruby) does an incremental build of my own site (about 1200 files) in 1–1.5s, which is plenty fast for me.
oh nice! that does seem to be around the speed I want. https://nanoc.app/doc/internals/#outdatedness-checking looks somewhat similar to what Hakyll does. Seems like it also properly tracks changing logic w/o requiring a full rebuild too?
Anyways, this is very cool; I would definitely consider it if I hadn't already written my own.
OCaml | Not Incremental | Slow in all ways :)
What?
OCaml is incremental all right, at least when used with a separate build system like Make.
But the native compiler is slow. You can always try the bytecode compiler though. Slower at runtime, but in quite a few situations the compile+run total time will still be smaller.
I suppose my initial wording was both unclear and kinda confrontational. My point was that we can judge the speed, but the claim that OCaml builds are not incremental while C/C++ builds are (where it's already implied that build systems are involved) is simply factually wrong.
ah yep shoot forgot that Dune is incremental, will fix, thanks for the correction
It's not just dune that is incremental; OCaml is designed to allow incremental compilation and has supported it from the very beginning. There is a built-in 'opaque' compilation flag which allows skipping recompiling implementations if the interface did not change.
My SSG (in Perl) is optimized to build the newest page (build --recent), which takes 1s. Fast enough for me. The result is rsynced to a simple box and served by nginx, see here.
A full rebuild takes 7.3s, but I have 3889 HTML files to write (if find out/ | grep html | wc -l is correct...)
real 0m7,317s
user 0m6,945s
sys 0m0,371s
For my SSG I wrote in Zig I took a hybrid approach. Full rebuilds take around 12ms, granted I don’t have many posts. When I benchmarked it with 20K posts it was around 30 μs/post. Instead of incremental builds with proper dependency tracking, I allow building only a subset of output files, so the development server can rebuild files on the fly to serve requests, which takes a few ms independent of website size. It’s not really in a reusable state right now but I open sourced most of the code in mk12/blog.
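That on-demand scheme, where the dev server rebuilds just the requested page, could be sketched roughly like this (in Rust rather than the commenter's Zig; the paths and routing rule are invented for illustration):

```rust
use std::path::{Path, PathBuf};

// Map a requested output path (e.g. "/posts/hello.html") back to its
// source file (e.g. "content/posts/hello.md"). A real SSG would consult
// its own routing rules here.
fn source_for(request_path: &str) -> Option<PathBuf> {
    let stem = request_path.strip_prefix('/')?.strip_suffix(".html")?;
    Some(Path::new("content").join(format!("{stem}.md")))
}

// Placeholder "build one page" step: a real implementation would run the
// Markdown pipeline here. Building a single page on demand keeps the dev
// server's response time independent of total site size.
fn build_one(source: &Path) -> String {
    format!("<html><!-- rendered from {} --></html>", source.display())
}

fn main() {
    // Simulate the dev server receiving a request and rebuilding on the fly.
    if let Some(src) = source_for("/posts/hello.html") {
        println!("{}", build_one(&src));
    }
}
```

Requests that don't map to a source (stylesheets, images) fall through to plain static serving, so only the page actually being viewed pays a render cost.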
I remember passing through different stages of building my own SSG.
The "I want a fast one in a compiled language" stage was Haskell with Hakyll, and a little later Shake (also in Haskell, and closer to a Makefile). I also replaced most shell scripts with Turtle scripts (again Haskell, so they were compiled binaries instead of interpreted scripts).
My most recent setup tries the "minimal dependency" thing: a simple Makefile, with a local shell.nix taking care of the programs I depend on, like pandoc (I write in org-mode, and pandoc is a lot faster at generating HTML than Emacs), and other things. I'm no longer obsessed with generating fast binaries, but strangely, using a Makefile was a huge speed improvement over my previous setup anyway.
Anyway, revisiting one's own SSG is really fun. I think most people creating their own SSG spend more time playing with and improving it than writing blog posts, and that's where the fun is for me.
Agreed! Generating a bunch of HTML isn't really that difficult, so spending time working on the process of doing so is the fun part. It's kind of like working on your car or rearranging furniture at home.
I like the explicit design decision of "this software is for me to use." I've been mulling over writing an SSG for a while now and this is one of the goals I've given myself. Otherwise I'll waste far too much time abstracting. My blog currently uses Astro, but I'd like to move away from the JS ecosystem and use a single statically compiled binary if possible, for reasons that all fall under personal preference and aesthetics. I do like the UX of Astro, and I would like to replicate that in anything I make. Rust seems like the most viable ecosystem for this, as there are crates available for syntax highlighting, LaTeX rendering, etc.
The optimisation quests of people who do not want a LaTeX-built CV on their website!
(CL-Emb is nice, and LaTeX speeds mean I can relax and use whatever I like ergonomically; I could do Common Lisp compilation properly and get fast startup and pretty fast rebuilds… but it's LaTeX speeds anyway, so I don't care.)
You're building your CV from source every time instead of just including a PDF asset? Neat!
Also, I can't recommend Typst enough; I ported my CV over to it recently, and I've found it a lot easier to change the layout compared to LaTeX :)
Well you see, one of the sources of updates to the homepage is a new entry in the publication list. Which I generate from a single (Common Lisp literal) list into HTML and multiple LaTeX versions.
I have parts of my CV template that are older than Typst, and I'm not yet sure Typst will keep compatibility that long into the future (it's still new), so I'm not interested. When I want precise page layout, I just use Asymptote (with LaTeX inside the labels, obviously).
OK, sometimes I want a layout with a lot of precisely positioned small pieces of text, that might go to SVG and rsvg-convert instead.
I don’t think cold builds are worth optimizing for this much. technically interesting? yes. waste of time for any reason other than the novelty? also yes.
my blog is also a custom static site generator written in rust, but without the embedded js interpreter part. the vast majority of the time I am not recompiling the rust code at all. I’m just rerunning the same binary and it definitely takes less than 100ms. sure, on occasion it takes a couple seconds if I actually change the rust logic, but it’s still fast enough and it’s such a minority of the time that it’s not worth fussing about.
and for deployments I just rsync the output, so I don’t have to wait for my dependencies to all compile every time I push a commit.
I suggest you try either building Rust software on older hardware or packaging a bunch of Rust software/code. You'll then notice that cold builds are in fact very important. Just because you can do very fast incremental builds after paying the initial cost doesn't mean the problem/use case doesn't exist.
In fact, even Rust incremental builds can get quite slow on bad enough hardware. I haven't inspected the Rust compiler, but I suspect that with too low a core count you actually get penalized by Rust's incremental compilation model. I've compiled Tauri projects where only a single file was changed, in a project with fewer than 10 source files, and the incremental build would take 1m15s on an old Intel Celeron with 2 cores; meanwhile, an incremental C++ build of a full Qt-based debugger on the same hardware would take at most 10 seconds for a single file change.
We're still talking about SSGs here.
I agree this is not so fun on a 2013 i5:
cargo build --release 358.96s user 13.56s system 269% cpu 2:18.41 total
but looking at my commit history I have had to rebuild more often because I switched machines than for adding features. (4x vs 3x in 3 years or so). Warm rebuild is 20s.
I'm aware of 2 broad classes of SSGs: those written for the authors' personal use which are quirky in interesting ways (a very common topic on lobste.rs), and ones written for a mass audience that are more mellow but flawed in some other way (Jekyll, Hugo, Hakyll, Zola, Astro, & many others). I've dealt with the latter kind for as long as I've had a blog, so it's about time I took a crack at the former.
I have only dealt with the first kind. I have briefly looked at the second kind but since I already had my own site generator of the first kind (first in Classic ASP, then in PHP, later Python and finally Common Lisp), the second kind never appealed to me.
Maintaining my website and its generator is a personal project I greatly enjoy. The best part is that I understand every line of code in it. Every line of code, including all the HTML and CSS, is handcrafted. That means that I can maintain my sense of aesthetics in every byte that makes up the website. Further, adding new features or sections is usually quite quick.
I've built the generator as layered, reusable functions, so most enhancements amount to writing a small higher level function that composes existing pieces together. A few months ago I wanted to add a 'backlinks' page listing other websites that link to my posts. It took around 40 lines of new CL code and less than 15 minutes from the initial idea to publishing it.
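That layering might look roughly like this (sketched in Rust rather than the commenter's Common Lisp; all names here are invented): small reusable pieces at the bottom, and a new feature like a backlinks page as a short composition on top.

```rust
// Lower layer: small reusable building blocks.
fn html_page(title: &str, body: &str) -> String {
    format!("<html><head><title>{title}</title></head><body>{body}</body></html>")
}

fn link_list(links: &[(&str, &str)]) -> String {
    links
        .iter()
        .map(|(href, text)| format!("<li><a href=\"{href}\">{text}</a></li>"))
        .collect::<Vec<_>>()
        .join("")
}

// Higher layer: a new section like a backlinks page is just a short
// function composing the existing pieces.
fn backlinks_page(backlinks: &[(&str, &str)]) -> String {
    html_page("Backlinks", &format!("<ul>{}</ul>", link_list(backlinks)))
}

fn main() {
    println!("{}", backlinks_page(&[("https://example.com/", "Example")]));
}
```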
Over the years this little hobby project has become quite stable and no longer needs much tinkering. It mostly stays out of the way and lets me focus on writing, which I think is what really matters.
I am also somewhat surprised about the focus on speed.
--------------+-----
Pages         | 310
Section files |   3
RSS files     |   3
Static files  |  74
Templates     |  13
Took 507 msecs.
sh -c 'rm -rf public ; make regen' 0.40s user 0.18s system 92% cpu 0.618 total
I don't think I spent more than a couple of minutes thinking about any optimizations here, this is on 12y old hardware (but an SSD). But yeah, it's written in Rust and I don't remember benchmarking hugo.
As much as I find the approach interesting as a thought experiment, I chose practicality and "just get it done" when I built mine. The thought of structuring it like a build system and optimizing for fast rebuilds never even crossed my mind. Unneeded complexity; perhaps that's the boon of never planning to have enough pages for it to become a problem. My oldest post is just shy of 15 years old, I've only managed to reach 310 distinct pages and 75 asset files, and I could nuke nearly 200 pages without losing a lot.
But it made me think.
So I suppose I could save 90% of "not a lot of time" by introducing at least 2 stat calls in place of a full file read of, on average, 4063 bytes. And that's with the dumbest possible (is it?) algorithm. I doubt it makes sense to spend time on this unless one enjoys doing it, though I'm not trying to dissuade anyone here. I guess it does make sense if you plan to have hundreds of users.
Guess that's a nerdsnipe and now I need to test that :)
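The two-stat-call check described above (skip regenerating an output file unless its source is newer) might look like this minimal sketch; the paths in main are hypothetical:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Decide whether an output file needs regenerating by comparing mtimes:
// two stat calls instead of reading and hashing the source.
fn is_stale(source: &Path, output: &Path) -> io::Result<bool> {
    let src_mtime = fs::metadata(source)?.modified()?;
    match fs::metadata(output) {
        // Output exists: stale only if it is older than the source.
        Ok(meta) => Ok(meta.modified()? < src_mtime),
        // Output missing: never built, so always stale.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

fn main() {
    // Hypothetical paths for illustration.
    let stale = is_stale(Path::new("content/post.md"), Path::new("out/post.html"));
    println!("{stale:?}");
}
```

The obvious caveat is that mtimes alone miss changes to templates or generator logic, which is where real dependency tracking comes in.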
I'm a bit curious as to how Zola falls short here, besides the dirty-work tracking being non-existent generally. It still ends up in the 30-90ms range for me depending on a clean or dirty build dir already existing for a modest static site, but I'm curious if there's benchmarks on bigger sites demonstrating the pain here.
I've been thinking similarly about pipelining efficiency and recomputing for some bulk media-tag-editing tooling I've been working on, where the overhead of an indexed db for dirty-work inode checking is pretty necessary.
It gets pretty quickly into DAG territory and needing to reason about the directionality of includes of SSG templates; it's all a solvable domain space, but it'll just need some really considered design to tie these all together in a performant manner.
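A toy sketch of that dirty-propagation over the include DAG (in Rust, with an invented reverse-dependency map from each file to the files that include it):

```rust
use std::collections::{HashMap, HashSet};

// Given a reverse-dependency map (file -> things that include it),
// compute everything transitively dirtied by one changed file,
// via a depth-first walk over the include DAG.
fn dirtied(reverse_deps: &HashMap<&str, Vec<&str>>, changed: &str) -> HashSet<String> {
    let mut dirty = HashSet::new();
    let mut stack = vec![changed];
    while let Some(node) = stack.pop() {
        // insert returns false if already visited, so cycles/diamonds are safe.
        if dirty.insert(node.to_string()) {
            if let Some(dependents) = reverse_deps.get(node) {
                stack.extend(dependents.iter().copied());
            }
        }
    }
    dirty
}

fn main() {
    // Hypothetical include graph: both pages use base.html via post.html.
    let mut rdeps = HashMap::new();
    rdeps.insert("base.html", vec!["post.html"]);
    rdeps.insert("post.html", vec!["a.html", "b.html"]);
    // Editing base.html dirties the template chain and both pages.
    println!("{:?}", dirtied(&rdeps, "base.html"));
}
```

The considered-design part is building that reverse map in the first place, i.e. recording which templates each render actually touched; the propagation itself stays this simple.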
Zola falls short because its templating/routing story isn't good enough for me. As an example, I can't make per-tag pages because it doesn't support generating new routes like that on the fly. Also, maybe it's just me, but I've always found Jinja-style templates more frustrating to use than React-style components; top-down vs bottom-up approaches, I guess.
I also dreamed of building a fast incremental static site generator. But then I realized that I only needed fast rebuilds when I was actively authoring content. My solution involved a single long-running process that both generated the site and served it over HTTP (for local viewing).
In the end, even with a JavaScript SSG (including a JS Markdown library and JS-based build-time syntax highlighting), hot reloads for my ~150 page site took 50ms on a 10+ year old computer. It would be fun to perform a minimal incremental rebuild, but realistically it's not worth it for most sites.