Building the deployment tool I wish I had
91 points by ruuda
Only tangentially related, but, thank you for not putting LLM-generated text anywhere near this. It’s very refreshing.
Also, the tool looks very nice and well thought out, though I think I’ll stick with NixOS. :D
The code is primarily written by LLMs, but I carefully review the entire diff and iterate until I am happy with the code before committing to the repository.
I adopted a similar LLM development methodology. Claude makes a branch, writes and edits the code, then repeatedly reviews it with clean context until no more issues are found. Once the AI is satisfied with its own work, I start my human review. I go through every single commit and read everything it implemented. I ask for explanations and rationales, question design choices, suggest alternative approaches, confirm invariants and assumptions, etc. The review loops continuously until I am satisfied. This usually results in many edits and amendments, minor and major refactorings, documentation and lots of TODO items for anything out of scope of the branch.
It's been great. The best part is I can calibrate the amount of effort I spend. I cook branches for days in my favorite projects where I want quality all the way down to the git history. The projects I care less about still get love, just not as much.
I adopt a very similar workflow as well.
repeatedly reviews it with clean context until no more issues are found. Once the AI is satisfied with its own work [...]
Do you give it tools to verify its work? I guess it depends on what type of software you're building. For a web app, I tried using e2e tests and playwright-cli as verification methods, but I didn't get very satisfactory results.
I use Claude Code inside a development virtual machine. I let Claude do whatever it wants in there, bypass permissions is always on. Usually I just ask it to meticulously review code, analyze edge cases, look for bugs, write adversarial tests, look for code duplication, consistency issues, quality issues, whatever else comes to mind. The code review superpower often gets invoked too. I just keep doing this as many times as it takes until it literally can't find anything to complain about anymore. Then it's my turn to review.
I've been wondering about tools as well. I wonder if there's anything I can do to improve the AI's performance here. Couple days ago I saw someone load an entire project into a "vector database" that supposedly lets the AI bypass the constant reading and parsing of files and run queries on the code structure instead. Sounds pretty cool but I haven't thoroughly explored the idea yet.
My main project is a programming language project written in C. The automated test suite is the most important tool. The AI knows to run it after every change, every commit. I also asked it to make a crude benchmark harness in order to avoid introducing performance regressions. Stopped me from committing some helpers that LTO wasn't inlining properly just days ago.
I'm definitely going to try this out! It looks like a more polished version of a system I built for myself to deploy systemd-based services.
EDIT 1: just went through the tutorial and it looks great. What would be a good way to deal with local state? For instance, where should an sqlite database for an app be stored? I couldn't find that in the docs.
EDIT 2: is there a way to transfer the app's binary to the server, for it to be used in the systemd unit? If not, how are you dealing with binary distribution?
Reply to your edit 2: Not with Deptool alone. For the applications I run, I build them into small EROFS images with this Nix-based script, and that one also includes a script to push the images to my servers. It used to be standalone but I now combine them: building and pushing the images is one step. They go into unique directories, so multiple versions can coexist. The output of the build also contains a json file with the file paths. I import that into my cluster configuration, and render that into a systemd unit, which I then deploy with Deptool. So one tool handles distributing images, Deptool handles activating them.
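To make the "import the json, render a systemd unit" step concrete, here is a minimal sketch in Python. It is not the author's actual script: the manifest keys `image_path` and `exec_start`, the template, and the example paths are all hypothetical, and it assumes the image is mounted with the RootImage= directive mentioned elsewhere in this thread.

```python
import json

# Hypothetical unit template; a real cluster configuration will differ.
UNIT_TEMPLATE = """\
[Unit]
Description={description}

[Service]
RootImage={image_path}
ExecStart={exec_start}

[Install]
WantedBy=multi-user.target
"""


def render_unit(manifest_json: str, description: str) -> str:
    """Render a unit file from the JSON a build step might emit.

    The keys "image_path" and "exec_start" are assumptions made for this
    sketch. A missing key raises KeyError instead of producing a broken unit.
    """
    manifest = json.loads(manifest_json)
    return UNIT_TEMPLATE.format(
        description=description,
        image_path=manifest["image_path"],
        exec_start=manifest["exec_start"],
    )


if __name__ == "__main__":
    # Hypothetical build output; the versioned directory lets images coexist.
    example = json.dumps({
        "image_path": "/var/lib/images/myapp-v2/myapp.erofs",
        "exec_start": "/bin/myapp",
    })
    print(render_unit(example, "My app"))
```

Because each build lands in its own directory, switching versions is then only a matter of deploying a unit file that points at a different path.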
If you’re running containers, then usually you push them to a registry, and on the server there is only a configuration file that specifies what to pull, so that part you can manage entirely with Deptool.
Reply to your edit 1: On the server side? Wherever you would normally store it, the standard place is in /var/lib/<yourapp>. If you run your application as a systemd unit, you can use StateDirectory= to have systemd create it for you, owned by the right user.
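A minimal sketch of how an application could pick that directory up, assuming a Python app and a system unit with StateDirectory=yourapp. In that case systemd creates /var/lib/yourapp and exports its path in the STATE_DIRECTORY environment variable. The file name `app.db` and the development fallback directory are assumptions for illustration.

```python
import os
import sqlite3
from pathlib import Path


def database_path(app_name: str = "yourapp", filename: str = "app.db") -> Path:
    """Return where the SQLite file should live.

    Under a unit with StateDirectory=<app_name>, systemd sets STATE_DIRECTORY.
    If several directories are configured, the variable is colon-separated;
    this sketch simply takes the first one.
    """
    state = os.environ.get("STATE_DIRECTORY")
    if state:
        base = Path(state.split(":")[0])
    else:
        # Assumed fallback for running outside systemd, e.g. in development.
        base = Path.home() / ".local" / "state" / app_name
    return base / filename


def open_database(path: Path) -> sqlite3.Connection:
    """Open (and create if needed) the database at the given path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(path)
```

Reading the variable instead of hard-coding /var/lib/yourapp keeps the path in one place, the unit file.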
I am always a bit skeptical when people release a new deployment tool, but yours looks well designed and polished. Congrats!
I see you use the ssh command directly. I think this is the right way because you know this "ssh" works. The user could have a very specific configuration or even a patched ssh binary. Any tool trying to implement ssh through an external library will get in the way of some users.
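For illustration, shelling out to the system ssh can be sketched in a few lines of Python. This is not how Deptool does it internally; the function names are hypothetical and the BatchMode option is an assumption chosen so that the call fails instead of prompting for a password. Because the real `ssh` binary runs, the user's ~/.ssh/config, agent, and any patched build apply automatically.

```python
import shlex
import subprocess


def ssh_argv(host: str, remote_argv: list[str]) -> list[str]:
    """Build the argv for running a command on `host` via the system ssh."""
    if host.startswith("-"):
        # Refuse hosts that ssh would parse as an option.
        raise ValueError(f"suspicious host name: {host!r}")
    # ssh joins the command words with spaces and hands the result to the
    # remote shell, so quote them here to keep arguments with spaces intact.
    return ["ssh", "-o", "BatchMode=yes", host, shlex.join(remote_argv)]


def run_remote(host: str, remote_argv: list[str], stdin_data: bytes = b""):
    """Run the command remotely, feeding `stdin_data` to its stdin."""
    return subprocess.run(
        ssh_argv(host, remote_argv),
        input=stdin_data,
        capture_output=True,
        check=False,  # The caller inspects returncode and stderr.
    )
```

For example, `run_remote("web1", ["systemctl", "is-active", "nginx.service"])` would use whatever `web1` means in the caller's ssh configuration.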
Another approach is to use bootable containers, which is quite nice. The only thing that I'm still missing there is something to actually run the bootc update --apply on the appropriate hosts. There's a mechanism for auto updates but this isn't coordinated and that's not something you want in a cluster. Right now I just do this by hand, though it should be easy enough to script out in the future (given that the bootc command is really all you need to run).
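A coordinated rollout of that command could be scripted roughly as follows. This is a sketch under assumptions: the hosts are reachable over ssh as a user allowed to run bootc, and the `apply` and `healthy` callables are hypothetical hooks the caller supplies. Since applying an update reboots the host, the ssh session will likely end abruptly, so a real `apply` has to decide what counts as success and `healthy` has to wait for the host to come back.

```python
from typing import Callable, Iterable, Optional


def bootc_update_argv(host: str) -> list[str]:
    """The command from the comment above, wrapped in a plain ssh call."""
    return ["ssh", host, "bootc update --apply"]


def rolling_update(
    hosts: Iterable[str],
    apply: Callable[[str], bool],
    healthy: Callable[[str], bool],
) -> tuple[list[str], Optional[str]]:
    """Update hosts one at a time and stop at the first failure.

    Returns the hosts that were updated successfully and the host that
    failed, or None if every host succeeded.
    """
    updated: list[str] = []
    for host in hosts:
        # Never touch the next host until this one is confirmed healthy,
        # so at most one machine is out of the cluster at any time.
        if not apply(host) or not healthy(host):
            return updated, host
        updated.append(host)
    return updated, None
```

Stopping at the first failure is the point of coordinating: a bad image takes down one node instead of the whole cluster.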
What a coincidence! The post showed up while I'm discussing simple deployment strategies with a friend of mine and, to be fair, this is really close to the conclusion we were reaching!
However, may I ask how you manage secrets in this setup?
Could you tell me more about how and why you use EROFS?
I use it to distribute Nginx and a few other applications to Flatcar, in much the same way that people use OCI images for this. Flatcar has no package manager, so you have to get the software and its dependencies on there yourself somehow, and a self-contained filesystem image is one way of doing that. For an OCI image, you need a separate tool like Podman or Docker that unpacks the tars somewhere and sets up the stack of overlay mounts; if it’s a filesystem image already, you can just run it as a systemd unit with RootImage=.
I build the image with Nix, so it is really minimal: it has only the Nginx binary, LibreSSL, libc, and a few other shared libraries, but no Bash for example. This is part of defence in depth. If Nginx has a remote code execution vulnerability, it’s running inside a filesystem namespace where an attacker has very little available to build a next stage exploit, and the entire filesystem is readonly, not just because it was mounted readonly, but because EROFS has no write support at all.
I used Squashfs in the past. It worked fine, but that filesystem was designed for the age of live CDs. EROFS makes different trade-offs that are nicer for today’s systems, but to be honest I don’t think it makes any measurable difference for my use case. The images are smaller, but that’s because I use different compression settings. In theory EROFS is better suited for content-defined chunking if you want to reuse data between different image versions, but I am not actually doing that to transfer the images yet.
Prompting the deployment tool I wish I had
but I guess in some sense good for them that they were able to convince the floats to write rust
If I'm going to use an LLM to generate code, I actually prefer Rust for many things. To put it politely, Rust is a "disciplined" language. And it has strong conventions and tooling, both of which help LLMs.
And, weirdly, LLMs tend to generate smaller programs in Rust than they do in some languages, at least when they're nudged. Since I insist on reading and fiddling with all the code anyway, shorter is better for me.
How do you handle secrets with this? What's your preferred workflow? Do you embed them in the EROFS image? or do you inject them with systemd?
At the moment the only secrets are my TLS certificates, but they only need to be on one server and Lego puts them there directly. The directory gets mounted read-write in the Lego unit and readonly in the Nginx unit.
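The comment does not say which mechanism does the mounting. One way to get that split with systemd is BindPaths= for the writer and BindReadOnlyPaths= for the reader. The sketch below renders the two drop-in snippets; the function names and the directory `/var/lib/certs` are hypothetical.

```python
def cert_mount_line(cert_dir: str, writable: bool) -> str:
    """Return the systemd directive that exposes cert_dir to a unit."""
    directive = "BindPaths" if writable else "BindReadOnlyPaths"
    return f"{directive}={cert_dir}"


def render_drop_in(cert_dir: str, writable: bool) -> str:
    """Render a [Service] drop-in containing only the mount directive."""
    return "[Service]\n" + cert_mount_line(cert_dir, writable) + "\n"


if __name__ == "__main__":
    cert_dir = "/var/lib/certs"  # Hypothetical location of the certificates.
    print(render_drop_in(cert_dir, writable=True))   # For the Lego unit.
    print(render_drop_in(cert_dir, writable=False))  # For the Nginx unit.
```

Only the unit that renews certificates can modify them; a compromised web server can read the keys it already needs but cannot replace them.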
If it only keys on OS and CPU architecture, how does it know whether extensions like AVX2 are available for use or not? That would not be part of the CPU architecture string.
No assumptions about what’s available except the kernel
Surely there must be some assumptions since the Linux ABI expands over time with new syscalls. If the remote host has a very old kernel, surely the binary would try to access syscalls that do not exist?
The agent is very simple, its only job is to read a bit of data from stdin, write files, create symlinks, and execute a few binaries like systemctl daemon-reload. It wouldn’t benefit much from being optimized for a specific CPU microarchitecture, it spends virtually all its time waiting for the network, the disk, or external programs. For the same reason it doesn’t need any fancy syscalls, so I do expect it to work even on old kernels. But if for some reason it doesn’t … well I’m not trying to deploy against old kernels :-)
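To show how little such an agent needs, here is a sketch of the two filesystem operations in Python. It is not Deptool's code or wire format; the function names are hypothetical, and the write-then-rename pattern is a common technique assumed here for illustration. Everything maps onto long-standing syscalls (open, write, fsync, rename, symlink).

```python
import contextlib
import os
import tempfile
from pathlib import Path


def write_file_atomic(path: Path, data: bytes) -> None:
    """Write data to path so readers see the old or the new file, never half.

    The temporary file is created by mkstemp with mode 0600; adjust the
    permissions before the rename if the file must be readable by others.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # Atomic rename within one filesystem.
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise


def point_symlink(link: Path, target: Path) -> None:
    """Make link point at target by renaming a fresh symlink over it."""
    tmp = link.with_name(link.name + ".tmp")
    with contextlib.suppress(FileNotFoundError):
        os.unlink(tmp)  # Remove a leftover from an interrupted run.
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename(2) replaces the link itself.
```

An activation step would then write the new files, flip the symlink, and run something like `systemctl daemon-reload`.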