Garnix is shutting down
57 points by ysun
Garnix was the best CI system I ever used by a large margin. It's done building everything while GitHub Actions is still searching for a runner, often in under a minute for my moderately complex Rust project. Even faster when I change e.g. only the docs. (Which still builds them!) That's enabled by Nix of course, but Garnix integrated it really well. A CI system that integrates with the build system can work so much better than one that has to bolt on caching by downloading a tar of half the filesystem from S3 on every run. Plus, because it's based on Nix, it builds exactly the same things that you can build locally, so you don't have this long "fix typo in the yaml, push, wait 10 minutes, read the next error, add debug print, push again, ..." feedback cycle. If it builds locally then it also works on CI.
For people wondering what Garnix is:
Garnix is a CI service for nixified, flake-based github repos.
sad to hear this! I loved their blog post about baking service dependency URLs into service builds to solve rolling deployment https://garnix.io/blog/call-by-hash/
This would be completely off-topic if it were not for:
But we are open sourcing the garnix codebase, available here
Which I think is on-topic and interesting.
We are going all-in on Nix at work. I have very mixed feelings about this, but most of my negative feelings come from the fact that, while it's absolutely wonderful technology, it is very much an alien artifact that (in my opinion) is still very young.
I feel Nix is very exciting, because there's a ton of interesting and valuable work to do. We had been looking at Garnix and many other things in the Nix ecosystem because adopting Nix still means giving up on a ton of creature comforts that more conventional platforms have been growing for ages.
Precisely at work we're spending significantly more effort on "basic" stuff we'd get for free. For example, running validations on GitHub Actions is more involved than in your typical project: caching, parallelization, etc. are really important to get robust and performant builds.
I feel some businesses will prosper a lot by advancing the Nix ecosystem, or that someone will build on the shoulders of the Nix giants something that will take the world by storm. Unfortunately, Garnix seems like one of the pioneers that got absorbed into a larger org.
Fun fact, Nix is not that young, it pre-dates docker by several years. It's just a late bloomer and only got popular recently.
Precisely at work we're spending significantly more effort on "basic" stuff we'd get for free. For example, running validations on GitHub Actions is more involved than in your typical project: caching, parallelization, etc. are really important to get robust and performant builds.
Yes, I have the same issue. In an ideal world, if I make a small change, then the next CI run should be quick. This doesn't break hermeticity, because the cache is keyed off the build inputs.
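To make the keying idea concrete, here is a minimal Python sketch. It is not how Nix actually computes store paths; `cache_key`, `build`, and the dict-backed cache are made-up names for illustration. The key is a hash over every declared input, so an unchanged input set is a cache hit and any changed input forces a rebuild.

```python
import hashlib
import json


def cache_key(inputs: dict) -> str:
    """Hash all declared build inputs into one stable key."""
    # sort_keys makes the serialization independent of dict ordering.
    blob = json.dumps(inputs, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def build(inputs: dict, cache: dict, run_build) -> str:
    """Return the cached output for these inputs, building only on a miss."""
    key = cache_key(inputs)
    if key not in cache:
        cache[key] = run_build(inputs)
    return cache[key]


# Usage: record how often the (hypothetical) expensive build actually runs.
calls = []


def fake_build(inputs):
    calls.append(inputs["src"])
    return "built:" + inputs["src"]


cache = {}
first = build({"src": "v1", "compiler": "rustc-1.80"}, cache, fake_build)
second = build({"compiler": "rustc-1.80", "src": "v1"}, cache, fake_build)  # same inputs
third = build({"src": "v2", "compiler": "rustc-1.80"}, cache, fake_build)   # changed input
```

Because the result depends only on the key, reusing it across CI runs cannot leak state from an earlier run into a later one.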
A typical CI job that I run is to spin up two NixOS VMs, and run some integration test between them. This works, which is impressive, but it shouldn't take as long as it does (3-5 minutes?).
When running on ephemeral CI runners, the duplicate work between successive CI runs is: downloading the same packages from the Nix global cache, and when small changes cause large rebuilds.
Downloading the same packages can be addressed by introducing state to runners, which is most easily done with something like magic-nix-cache, which effectively does pull-through caching.
An example of a small change causing a large rebuild is a small change to a Rust program that still requires compiling all of the dependent crates. I've not tried it, but https://github.com/ipetkov/crane aims to address that, by on-the-fly splitting off crates to separate nix build units ("derivations"), so that it can benefit from nix caching. I don't think this is Nix specific, but it might be felt more because my NixOS machine might have 4-5 custom binaries, each of which needs to be built when building my NixOS machine.
The closest ecosystem re builds to Nix is Bazel, which has similar issues, but seems to have a bunch of companies providing Bazel build+cache as a service, e.g. https://www.engflow.com/
Our resident Nix/Rust expert added crane to our build just very recently. So I guess it's good.
That's a shame, I saw some odd issues with it over the last week or so but didn't really think much of it.
Even though I was a bit annoyed that they only supported GitHub, it was still a stellar service. I will take a look at the open-source version over the weekend and assess whether it's reasonable to self-host, but if anyone knows of any alternatives for Nix builds, let me know.
I’ve been using garnix since launch and I’m pretty sad they are shutting down. If anyone has suggestions on Nix CI or self hostable solutions please lmk :)
I mostly use garnix as a cache, also I have an auto-merge workflow based off of the exposed check status
Well, it seems like garnix is going open source, so it might still qualify for self hosting! I wasn’t a customer as I only use Nix at home, but I’ll definitely be checking out how to self host it.
I wonder how hard it is to decouple garnix from github now that it’s open source. (it should be reasonably easy to decouple it from flakes, flakes are not real and can’t hurt you)
i always felt like squidward jealously watching spongebob and patrick having fun when it came to garnix… this possibility definitely excites me
Thread hijack: I am trying to switch to Nix for CI but my dev/ci environment is large (mainly due to having multiple full browsers) and I can’t figure out how to avoid a nix build or cache restore (even restoring 10gb over 1gbit is way too slow).
Docker has this solved (if you self-host your runner): you can just make a docker image and keep it cached on the host that spins up CI runners.
But I cannot figure this out for Nix. There needs to be a way to share a nix store that has everything needed for the dev env already, and it needs to be derived from the actual dev env flake checked into the repo.
Nothing like this exists, right?
I'm using CrowCI (a fork of WoodpeckerCI with a bunch of improvements) with Nix (I wrote a nix flake for deploying CrowCI) and Atticd, and I designed a Container just for the purpose of running and caching Nix builds. My container installs nix + attic client, it diffs the nix store before and after the steps were executed, and only pushes the changes to the attic cache server if the steps exit successfully. I can build a NixOS VM image in around 3 minutes on a hetzner cloud CX33 instance.
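The "diff the store, push only on success" step can be sketched roughly like this in Python. This is not the actual container script; `snapshot` and `paths_to_push` are hypothetical names, a temporary directory stands in for /nix/store, and the real upload (done with the attic client) is left out.

```python
import os
import tempfile


def snapshot(store_dir: str) -> set:
    """Record the names of the top-level entries in a store directory."""
    return set(os.listdir(store_dir))


def paths_to_push(before: set, after: set, exit_code: int) -> list:
    """Return the store entries added by the steps, or nothing if they failed."""
    if exit_code != 0:
        return []
    return sorted(after - before)


# Usage with a throwaway directory standing in for /nix/store.
with tempfile.TemporaryDirectory() as store:
    os.mkdir(os.path.join(store, "aaa-glibc"))   # already present before the job
    before = snapshot(store)
    os.mkdir(os.path.join(store, "bbb-myapp"))   # pretend a CI step built this
    after = snapshot(store)
    pushed_ok = paths_to_push(before, after, exit_code=0)
    pushed_failed = paths_to_push(before, after, exit_code=1)
```

Skipping the push on failure keeps half-finished or broken results out of the shared cache.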
i don’t understand, if you’re hosting a runner & you seed its /nix/store with whatever you need for that particular workflow then it’s just there, same as an OCI image, right?
at a previous job we managed our own GitLab runners and made sure to pre-warm them by instantiating a set of recently built artifacts before they were put into service, at which point the jobs would hit whatever was cached in the /nix/store
Yes that’s what I’m asking. How do you seed a runner’s nix store with a flake from the repo? How do you avoid transferring the nix store over the network and how do you keep the store in sync with the flake?
What forge/runner/ci supports this? How would you do it?
At work I've configured a private Gitea instance and an EC2 instance with something like the following:
services.gitea-actions-runner.instances.canton = {
  enable = true;
  labels = [ "native:host" ];
  hostPackages = [
    pkgs.bash
    pkgs.gitMinimal
    pkgs.nix
    pkgs.curl
    pkgs.jq
  ];
};
And then our repository uses the "native" node:
name: Canton Nodes
on: [push]
jobs:
  build-nodes:
    runs-on: native
    steps:
      - name: Check out repository code
        uses: actions/checkout@v4
      - name: Run `nix flake check`
        run: nix flake check
Of course, this is only good for trusted environments with builds which won't make a mess of your host. Practically, all of our builds just use Nix, so we do have some amount of sandboxing, but in theory people could make a mess.
The benefit is that Nix will just use a single /nix/store for all of our builds. I've also turned this node into a Nix cache for our developers.
Interesting, but then you define the packages at the runner level and not the repository. Having the packages be defined outside the repo is a nightmare sometimes, eg when I need to keep the playwright npm package version in sync with the browsers in the nix store.
The packages coming from the repo flake is critical
No, the packages you see there are the requirements for our Gitea Actions, not for the actual projects. We have big complicated Haskell, Go, Kubernetes and NixOS projects. The runner just needs Nix and Bash to get those builds going.
Oh I see, so you're just building a runner once, having it run for a long time without gc, and hope that software doesn't mess it up. I guess that feels like a downgrade to me from non-Nix CI best practices and I don't think I want to go down that route.
It makes a lot of sense if you're doing your builds in Nix tho.
CI tasks can save their own Nix gcroots. --out-link is an automatic gcroot – just leave it in a persistent/cache directory that won't get deleted when the job is done. --profile behaves the same – my CI does a nix build --profile "$CACHE_DIRECTORY/… of my nixos/home-manager setups and keeps last week of results. You can save your build dependencies from GC the same way.
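As a rough illustration of the "keep last week of results" part, here is a Python sketch that prunes old out-link symlinks from a cache directory. It assumes the roots are plain symlinks sitting directly in that directory (profiles keep generations differently); `prune_old_links` and the placeholder store path are invented for the example. Once a link is gone, the path it pointed to is no longer protected from the next garbage collection.

```python
import os
import tempfile
import time

WEEK = 7 * 24 * 3600


def prune_old_links(cache_dir, max_age_seconds=WEEK, now=None):
    """Delete symlinks older than max_age_seconds and return their names."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        # lstat reads the link itself, so dangling links are handled too.
        if os.path.islink(path) and now - os.lstat(path).st_mtime > max_age_seconds:
            os.unlink(path)
            removed.append(name)
    return removed


# Usage: one fresh link survives today but is pruned eight days later.
with tempfile.TemporaryDirectory() as cache_dir:
    os.symlink("/nix/store/placeholder-result", os.path.join(cache_dir, "result-1"))
    removed_today = prune_old_links(cache_dir)
    removed_later = prune_old_links(cache_dir, now=time.time() + 8 * 24 * 3600)
    remaining = os.listdir(cache_dir)
```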
I'm mostly interested in the build results: the CI runner runs Harmonia to expose its Nix store to a central ncps, from which my machines download prebuilt packages. Once a package is in ncps, ncps will take care of caching it as long as it's needed (so I don't rely on local gcroots too much). If you have multiple builders and better bandwidth in their LAN, you can make a tiered cache this way (if something is not in local nix store, it will try nearby ncps, which will look in other local nix stores via harmonia and fall back to nixpkgs cache or other remote cache).
Actually, if all the deps you need are usually in nixpkgs, pulling through ncps without rest of the machinery might solve your problem too.
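The tiered lookup can be modelled in a few lines of Python. This is only a toy: plain dicts stand in for the local store, the LAN cache, and the upstream cache, and `lookup` is a hypothetical helper, not an ncps or Harmonia API.

```python
def lookup(store_path: str, tiers: list):
    """Try each cache tier in order; on a hit, copy the result into nearer tiers."""
    for i, (name, contents) in enumerate(tiers):
        if store_path in contents:
            # Fill the tiers that missed so the next lookup stops earlier.
            for _, nearer in tiers[:i]:
                nearer[store_path] = contents[store_path]
            return name, contents[store_path]
    return None, None  # nobody has it: the caller has to build


local = {}
lan_cache = {"abc-hello": "nar-bytes"}
upstream = {"abc-hello": "nar-bytes", "def-curl": "nar-bytes-2"}
tiers = [("local", local), ("lan", lan_cache), ("upstream", upstream)]

hit1 = lookup("abc-hello", tiers)    # served by the LAN cache, then stored locally
hit2 = lookup("abc-hello", tiers)    # now served from the local tier
hit3 = lookup("def-curl", tiers)     # falls through to upstream
miss = lookup("zzz-missing", tiers)  # not cached anywhere
```

The pull-through behaviour is the point: each tier only pays the slow fetch once.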
How do you seed a runner’s nix store with a flake from the repo?
it depends what you mean by the runner in this case. for the sake of example, assume you want your runners to perform all CI tasks in an isolated VM:
you could have a NixOS host deployed to some sort of VPS with a persistent /nix/store. when a job requests a runner, the host spawns a new VM and mounts the host /nix/store as a read-only overlay within the VM.
once the runner-VM completes a job, it pushes its artifacts to the host’s store and then tears itself down.
the next VM spawned by the host will have a warm cache if it tries to rebuild something.
How do you avoid transferring the nix store over the network and how do you keep the store in sync with the flake?
by ensuring that the VM was able to persist its artifacts to the host via some local mechanism, even if it’s a network protocol (e.g. SSH protocol over a virtualized interface).
depending on how tiered you want your caching to be, you can use some combination of:
/nix/store so cold-starts just restore from a disk image and get whatever was cached at the time of the snapshot
What forge/runner/ci supports this? How would you do it?
pretty much any forge or CI platform with self-hosted runners should support this; implementation left as an exercise for the reader.
If you trust your CI jobs enough to push to local Nix store, you can as well mount entire /nix (still read-only) or include /nix/var/nix/daemon-socket – this way, host's nix-daemon will run the builds (and deduplicate them if two jobs try to build the same derivation simultaneously) and they'll be saved in the host's nix store.
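The deduplication a shared daemon gives you can be pictured with a small Python sketch. This is not how nix-daemon is implemented (it coordinates concurrent builds with locks on the output paths); `BuildDeduplicator` is a made-up class showing the general "single flight" idea, and it simply keeps finished results in memory forever.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class BuildDeduplicator:
    """Run each derivation at most once, even when requested concurrently."""

    def __init__(self, run_build):
        self._run_build = run_build
        self._lock = threading.Lock()
        self._in_flight = {}  # derivation name -> Future holding the result

    def build(self, drv):
        with self._lock:
            fut = self._in_flight.get(drv)
            owner = fut is None
            if owner:
                fut = Future()
                self._in_flight[drv] = fut
        if owner:
            # Only the first requester does the work; the others wait below.
            try:
                fut.set_result(self._run_build(drv))
            except Exception as exc:
                fut.set_exception(exc)
        return fut.result()


# Usage: four jobs ask for the same derivation at the same time.
calls = []


def slow_build(drv):
    calls.append(drv)
    time.sleep(0.05)
    return drv.replace(".drv", "")


dedup = BuildDeduplicator(slow_build)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(dedup.build, ["hello.drv"] * 4))
```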
when a job requests a runner, the host spawns a new VM and mounts the host /nix/store as a read-only overlay within the VM.
once the runner-VM completes a job, it pushes its artifacts to the host’s store and then tears itself down.
Sure but how does the runner discover its artifacts? How do you push artifacts if the host store is mounted read-only?
pretty much any forge or CI platform with self-hosted runners should support this; implementation left as an exercise for the reader.
I don't know if that's true, I haven't been able to find a CI platform that enables this flow. I guess I'm too much of a nix newbie cause I can't figure out how to even begin scraping this together.
You've already been very generous taking the time to offer suggestions. If you have any pointers to more resources I could read on how to actually do this I'd appreciate it. Thanks!
My gitlab-ci config uses containers (other Nix CIs run on host and rely on Nix for isolation) and the mounts are like this:
dockerVolumes = [
  "/nix/store:/nix/store:ro"
  "/nix/var/nix/db:/nix/var/nix/db:ro"
  "/nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket:ro"
  "/nix/var/nix/gcroots/gitlab-ci:/nix/var/nix/gcroots/gitlab-ci"
];
The daemon-socket mount is critical: it will let the nix CLI send builds to the host's nix-daemon. In a host-only setup you don't have write access to /nix/store either, it all goes through the daemon anyway. It's kinda similar to (but less insecure than) mounting docker.sock into the container.
The gcroots read-write mount will let your jobs explicitly add GC roots. Managing the GC roots manually without the automatic --out-link/--profile roots is kinda annoying, but the automatic roots are symlink-based anyway, and keeping symlinks straight across host's and container's filesystem would be just as annoying and also very fragile.
how does the runner discover its artifacts?
once the artifact is available in a Nix store (or binary cache) that the runner has access to then cache hit/miss is determined by the fingerprint on your Nix derivation.
you can imagine it like an OCI layer cache, where if you push two images to your machine that share 90% of the same layers they get reused without you having to explicitly do anything about it.
I haven't been able to find a CI platform that enables this flow.
i don’t think anyone provides what i’m describing out of the box in a GHA or similar runner config template.
I guess I'm too much of a nix newbie cause I can't figure out how to even begin scraping this together.
eh, i wouldn’t necessarily say that; what i’m describing is a combination of sysadmin & dev work that you’d basically have to glue together to get what you want.
if you don’t care as much about VM isolation you could just spawn the runners in a systemd service w/ whatever isolation that gives you & then anything locally built will be cached until the runner gets torn down.
then the remaining step would be around how you want to handle restoring a fresh VM.
if you don’t care as much about VM isolation you could just spawn the runners in a systemd service w/ whatever isolation that gives you & then anything locally built will be cached until the runner gets torn down.
A well-tuned systemd service can be at least as isolated as a Docker container anyway.
Use long-lived runners and GC less aggressively?
I don’t love the idea of having a reusable environment. Sounds like a recipe for hard to reproduce bugs. But it’s an idea! I’ll consider, ty.
Nix does its own build isolation, so there's not really anything to worry about. You can think of Nix as kind of a Docker replacement in this scenario.
I’m using nix to create a reproducible build environment but it doesn’t run my entire build. My current repo is a turborepo powered web monorepo and I would be hard pressed to replicate turborepo’s functionality within nix itself.
What's the synergy between Garnix and Shopify?
Shopify uses Nix. If Garnix is allowed to open source their code, it was probably an acquihire.