GitHub Actions Is Slowly Killing Your Engineering Team
70 points by iand675
There's an alternative reality where GitLab's executives don't have their heads up their asses and instead focus on making GitLab.com good, rather than "We added AI but like most of our features it doesn't really work". In this reality we'd be using GitLab CI which, although not perfect, is far better than GitHub Actions.
Sadly we find ourselves in a reality dominated by mediocre semi-Turing-complete YAML bullshit. For example, yesterday I spent most of my evening adjusting a bunch of CI jobs across different projects to deploy to a Hetzner VM instead of Cloudflare. As expected this involved the usual "Push and wait 15 minutes" workflow, because there's no sensible way of testing (or even linting) GitHub's CI config locally.
I keep thinking of what a better alternative might look like and occasionally toy with the idea of building something like that. Sadly one of the big challenges with CI is the high upfront investment necessary (especially if you want to offer macOS and Windows runners) and the near endless stream of customer abuse you'll face (e.g. people using CI for DOS attacks or distributing pirated content), so I'm not sure I'd see myself going down this path any time soon.
The CI was always the best thing about GitLab. If they wised up to that, this could be a nice opportunity for them.
the high upfront investment necessary
Have people bring their own machines? That's what they'll want anyway, or not?
A bring-your-own-hardware model may work for a subset of potential customers, but I suspect most (say >= 70% of them) will want to use hosted runners instead. At least in the case of GitLab there were broadly speaking two types of customers:
This means that even if you let people bring their own hardware, you still need to provide some amount of hosted macOS/Windows/other-expensive-setups runners.
wrt the suggestion of TFA:
As expected this involved the usual "Push and wait 15 minutes" workflow, because there's no sensible way of testing (or even linting) GitHub's CI config locally.
Does Buildkite fix this? It doesn’t seem to, other than easier debugging.
When GitHub announced they were going to start billing by the minute for self-hosted runners, I realised the frog was boiling too hot for me not to do anything. I found out that most of my workflows were essentially a mise.toml, plus a consistency check, plus a PR to update codegen when it fell out of consistency, plus Renovate. Some of the projects still used Nix, which proved to be redundant and a hassle to maintain.

So I did my own CI, with blackjack and hookers. At first it was essentially an agent which served as the brain, and workers that were spawned as dispatched jobs in my Nomad cluster; the workflow itself was defined as mise tasks, so mise ci locally would work the same as mise ci in CI. Then I set it up so I can have hot workers waiting for jobs, so a workflow can be quickly allocated to one of them. That way I don't have to set up every new machine as a member of my Nomad cluster, putting it at risk, just to run a handful of builds: anywhere Docker can run can be used as a worker now, passing a token and a server URL as environment variables. Finally I set up a tiny little workflow that launches 4 workers at a time in GitHub Actions as a turbo mode; if there is nothing for them to do they exit, and eventually the workflow exits too. I only use this approach when jobs are piling up, because GitHub Actions' network connectivity is so much better.
I then found out that this can be really powerful, and set up shell runs (you can open a terminal in the run environment and inspect stuff) and agent runs (essentially a run created so an AI agent can interact with the environment), both of which are still very much a work in progress.
The name of the project started as mise-ci, but I changed it to ciborg to avoid confusion. I wanted to make a product out of it eventually but I am open to suggestions!
Now I am looking for a way to solve the biggest problem I have with this system, which is caching downloads. [1]
I am learning a bunch about system design, tradeoffs and state machines with that, and I am dogfooding it in my projects already.
Thanks for sharing that project. I’m always looking for interesting caching proxies to make builds less dependent on outside services being up.
the main challenge now is adoption
it would be really great if npm, mise and uv started supporting the spec
the others aren't that download-heavy
I felt part one in my bones. I wish I could leave YAML forever, along with the programming languages that develop inside of it (Ansible, GHA, etc.). I wonder whether something like Pulumi/CDK/et al. exists as a CI system (I have a feeling people will suggest Nix here). The inline scripts, the lack of editor support, and the lack of simple declarative dependencies have to be the worst parts of the authoring experience: whether you're using github-script or run, you have to deal with your runner environment.
nothing wrong with jenkins-ing it
There is Earthly
At $work we were using Earthly before it was killed, and I'd unfortunately summarize it as a fancier equivalent of a Docker buildx frontend. A Docker multi-stage build plus a short collection of shell scripts turned out to be equally powerful, faster, and had fewer bugs wrt. caching.
If you're looking for proper programmatic container definitions I'd look at Nix, though it will bring its own problems.
Contrarian point of view here but most CIs are fine if you just use them to bootstrap into your build system. I typically use make and nix or mise to provision the toolchain, and bar some idiosyncrasies of the host, I can run any phase from building, testing, packaging to deployment from the CI the same way I can do it locally. Debugging is easy, there's no divide between Dev and CI, and you can choose the tools you want for steps. It's not because YAML files are there that you should use them for everything. That being said the article outlines many of the pain points that most people would experience with CI unless they do the above.
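That bootstrapping shape can be sketched in a few lines (all names hypothetical; the real toolchain provisioning happens via make/nix/mise outside the script): the CI config only ever runs `python ci.py <phase>`, so every phase is reproducible locally.

```python
# ci.py -- a single entrypoint shared by CI and local development.
# Phase names and commands here are illustrative, not a real project's.
import subprocess
import sys

PHASES = {
    "build": ["make", "build"],
    "test": ["make", "test"],
    "package": ["make", "package"],
}

def run_phase(name):
    """Run one named phase; return its exit code (2 for unknown phases)."""
    cmd = PHASES.get(name)
    if cmd is None:
        print(f"unknown phase {name!r}; choose from {sorted(PHASES)}")
        return 2
    return subprocess.run(cmd).returncode

if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_phase(sys.argv[1]))
```

The CI YAML then shrinks to one "run: python ci.py test" style step per phase, and debugging is just running the same command locally.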
I don’t think this a contrarian take! This is exactly what the author suggests doing with Garnix or Buildkite.
The back button in the GitHub Actions UI is a roulette wheel. You will land somewhere. It will not be where you wanted to go.
Gold. Sums up modern back button experience very well.
The Bash script criticism seems to misunderstand the idea. The point is to extract the commands from the CI config into the script so you can run them independently of the CI. You shouldn't have an 800-line shell script, just as you shouldn't have an 800-line CI config.
The point "just write a Bash script" people often seem to miss is that a big part of CI is not just the running bit but the orchestration bit. For example, your "run the tests" script may need extra steps on FreeBSD because it runs in a VM instead of in say a container. You may also need to inject variables, set up extra services (which in turn may require a different set of steps in CI), and so on.
Or more precisely, when somebody starts a sentence with "Just do [...]" they're probably full of shit.
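The orchestration point can be made concrete with a toy sketch (all step names invented): one logical pipeline that expands to different concrete steps per platform, which is exactly the part a bare Bash script doesn't capture.

```python
# Toy sketch of the orchestration bit: one logical pipeline, different
# concrete steps per platform. All step names here are invented.
import platform

BASE_STEPS = ["checkout", "deps", "test"]

# e.g. FreeBSD runs in a VM rather than a container, so it needs setup
# steps the Linux path never sees.
PLATFORM_PREFIX = {
    "FreeBSD": ["boot-vm", "sync-sources"],
    "Windows": ["enable-long-paths"],
}

def plan(system=None):
    """Expand the logical pipeline into concrete steps for one platform."""
    system = system or platform.system()
    return PLATFORM_PREFIX.get(system, []) + BASE_STEPS

print(plan("FreeBSD"))  # ['boot-vm', 'sync-sources', 'checkout', 'deps', 'test']
```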
Yeah I don’t agree it’s “just” but I also don’t want to put the extra steps for FreeBSD into a weird YAML thing either.
Regarding extra services, I have mostly moved them from any CI config or script directly into the tests. So if a Go test needs a database, it will start a PostgreSQL container.
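A Python analogue of that pattern, as a hedged sketch: here a throwaway stdlib HTTP server stands in for the PostgreSQL container, since the shape is the same (start the service, wait for readiness, use it, tear it down); the port and class names are made up.

```python
# Sketch: the test owns the service it needs, instead of the CI config.
import http.client
import subprocess
import sys
import time
import unittest

class NeedsServiceTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Stand-in service; a real suite would start a database container.
        cls.proc = subprocess.Popen(
            [sys.executable, "-m", "http.server", "8137"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        # Poll until the service accepts connections.
        for _ in range(50):
            try:
                conn = http.client.HTTPConnection("localhost", 8137, timeout=1)
                conn.request("HEAD", "/")
                conn.getresponse()
                conn.close()
                return
            except OSError:
                time.sleep(0.1)
        raise RuntimeError("service did not start")

    @classmethod
    def tearDownClass(cls):
        cls.proc.terminate()
        cls.proc.wait()

    def test_service_is_up(self):
        conn = http.client.HTTPConnection("localhost", 8137, timeout=1)
        conn.request("HEAD", "/")
        self.assertEqual(conn.getresponse().status, 200)
```

Run it with any unittest runner; the CI config never has to know the test needed a service at all.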
Exactly. We are in this orchestration hell and I don't see any tool that would really help us with it.
Yes, this post is an ad; but that doesn’t make it any less correct.
It is not an ad, it is an endorsement. I am not remunerated in any way for this opinion.
I also thought at first that it might be an ad, since you start by talking about your history at $CICompany and then later focus hard on just one specific solution, ignoring the others. But based on your website's About page that doesn't seem to be the case; it's just your preferred CI?
~quad what knowledge do you have that we shouldn't just flag your comment as Unkind for a lack of invalid ?
lol, I love the spirit of the post, it's a good one. Plus for nix, haven't used it but it sounds nice
For all the hate I have towards GH CI (it started as one of the best and turned into one of the worst), I still kinda like it. It brings many problems but solves the main one: preventing slop from getting into the repo. AI might change that.
Buildkite uses RoR? Are they on the GH path :D
I agree, and said as much at work last week. People were surprised that someone could think so poorly of GHA.
I wonder what the author thinks of https://dagger.io. @iand675: May I summon you?
Thanks for the summon. I have looked at Dagger, though I’ll admit not deeply. I already keep Nix and Buck2 in my toolkit: content-addressed builds, hermetic execution, incremental compilation are solved problems for me.
So when a new system appears, the bar is straightforward: does it articulate a model clearly enough to justify the time investment? At least for me, Dagger hasn’t cleared that bar yet.
The pitch shifts between “CI/CD engine,” “test orchestration platform,” “portable development layer,” and “programmable automation engine.” These are different things with different tradeoffs, and Dagger claims all of them simultaneously. There’s a familiar shape here: it sounds like how Nix is simultaneously a language, a package manager, and a deployment mechanism. So my read is that it is a tool that has surveyed every problem in the build-and-deploy space and concluded it can address them all at once. The ambition is admirable, but I find the marketing muddy enough that I’m skeptical about the learning curve.
Dagger’s model is… functions in containers returning typed artifacts, composed via API calls across language boundaries, cached by content address in a customized BuildKit engine? There are clearly interesting choices in there. But when I can’t quickly predict the caching semantics of a composition, the system hasn’t yet made itself legible to me, and I have other tools that already are.
So Dagger remains unexplored: not dismissed, just not compelling enough to prioritize. If someone is trapped in GitHub Actions YAML and doesn’t have Nix or a proper build system, it may be a worthy escape. But the problems the post is really about, like owning your compute and having a debugging experience that doesn’t consume your afternoon, are maybe a bit orthogonal. Buildkite solves those. Dagger is possibly answering a different question, and the question hasn’t been stated clearly enough for me yet to answer you satisfactorily 😅
I appreciate the in-depth response! I see, and agree, with how you're separating the concerns.
Dagger is possibly answering a different question, and the question hasn’t been stated clearly enough for me yet to answer you satisfactorily 😅
Haha, for me, Dagger originally caught my eye in the same way that Pulumi caught my eye. I wasn't enjoying GHA YAML and I wasn't enjoying Terraform HCL. Why not try a full functional programming language instead?
I wonder though if I separated the concerns like you did, that I'd feel less of a need for Dagger over GHA.
Haha, well to be fair, I really dislike working with Nix and Buck2, despite their benefits. I find the developer experience to just be incredibly tedious and alien most of the time. So I reckon there could be a really strong case for something coming along and making it easy to use! I just don’t have another org that I’m a part of that I can turn into a guinea pig to find out.
It’s YAML with its own expression language bolted on
Gotta love when people take a data-structure language (XML, JSON, YAML, etc.) and then put logic expressions in it, so that you get an API surface that is "you write the abstract syntax tree of a program in a domain-specific language with no specification".
Several jobs ago I spent a few days experimenting with no off-the-shelf CI. GitHub provides a rich API to attach job status information and source-code annotations to pull requests and (I think) commits. Probably other forges do something similar, but I haven't checked. The project I was on was written in Python, so I wrote a Python program. It used the stdlib's web server to expose a webhook that GitHub called for events I cared about, and it used the API to write back information about what it was doing. I wired up temporal.io, which the company was trying to switch its workflows to, to coordinate multiple workers.
My takeaways:
Continuous integration in practice eventually starts having state, such as tracking which tests are flaky, which are long-running, guessing test subsets to run based on changes to the code... Even the better off-the-shelf systems like Buildkite require you to shoehorn this in as a kind of shadow system. In my case I just shoved it in a SQLite database.
Once you have your normal programming language, you'll do things that you wouldn't even think of in off-the-shelf systems, such as tracking which tests failed previously on this PR and running them first, then only running the larger set if those pass. Doing that in GitHub Actions or Buildkite? A nightmare. Doing it in Python where you have the test run info in SQLite? No problem, knock it out before lunch.
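That failed-first trick really is a before-lunch amount of code; a minimal sketch, with a made-up schema:

```python
# Keep per-PR test outcomes in SQLite; order the next run so previously
# failing tests go first. Table and column names are illustrative.
import sqlite3

def record_result(db, pr, test, passed):
    db.execute(
        "INSERT OR REPLACE INTO results (pr, test, passed) VALUES (?, ?, ?)",
        (pr, test, int(passed)),
    )

def prioritized(db, pr, all_tests):
    """Return all_tests reordered: previously failing ones first, stable otherwise."""
    failed = {
        row[0]
        for row in db.execute(
            "SELECT test FROM results WHERE pr = ? AND passed = 0", (pr,)
        )
    }
    return sorted(all_tests, key=lambda t: (t not in failed, all_tests.index(t)))

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE results (pr TEXT, test TEXT, passed INTEGER,"
    " PRIMARY KEY (pr, test))"
)
record_result(db, "pr-1", "test_b", passed=False)
record_result(db, "pr-1", "test_a", passed=True)
print(prioritized(db, "pr-1", ["test_a", "test_b", "test_c"]))
# ['test_b', 'test_a', 'test_c']
```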
Orchestrating workers was an exotic problem in the early 1990s, but really isn't today. Even something stupid like having your workers call your coordinator via HTTP with a heartbeat and get updated state info in response is enough for almost every CI system out there. If you have a workflow system you're already using, use it here. If it's too painful to use here, maybe consider what's wrong with your workflow system.
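The heartbeat bookkeeping is genuinely that small; a sketch with an injectable clock standing in for the HTTP round-trip (class and method names invented):

```python
# Worker liveness via heartbeats: just timestamps plus a timeout.
import time

class Coordinator:
    def __init__(self, timeout=30.0):
        self.timeout = timeout
        self.last_seen = {}  # worker id -> last heartbeat time

    def heartbeat(self, worker_id, now=None):
        # In a real system this is the handler for the worker's HTTP call.
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def alive(self, now=None):
        """Workers whose last heartbeat is within the timeout window."""
        now = time.monotonic() if now is None else now
        return {w for w, t in self.last_seen.items() if now - t <= self.timeout}

c = Coordinator(timeout=30.0)
c.heartbeat("worker-1", now=0.0)
c.heartbeat("worker-2", now=20.0)
print(c.alive(now=40.0))  # {'worker-2'}
```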
The most painful part of the experiment was actually running tests. The Python test library we were using had no way from within Python to say, "Give me a list of all the tests in this directory" and then "Run this test." At its simplest, something like
    tests = testlib.find_tests_in("/my/test/dir", recursive=True)
    for test in tests:
        result = testlib.run_test(test)
        # ...do something with the result
This is a weakness of almost all language ecosystems.
Are there any CI setups that checkpoint the VM on error so the problem can later be debugged interactively? Something to get out of the terrible push-and-pray loop.
What is the best way to set up runners with Buildkite (or any alternative) such that:
I don't think GitHub Actions is so bad, but my distaste of GitHub in general is growing very quickly.
As a closed-source alternative, and only working on top of GitHub, Cirrus CI looks absolutely fantastic.
But I'm very tempted just to glue together something that can execute a "well-known" script on the repo and post git notes to commits, plus a merge queue bot that merges commits with specific notes.
edit: Also it bothers me a lot that Microsoft seems to be the only org in the world that manages to get reasonable licensing for macOS [you don't pay a minimum of 24h, like almost everywhere else], plus all the free runner use is very hard to compete against.
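The git-notes half of that glue is only a few subprocess calls; a rough sketch, assuming git is on PATH (paths, identities, and the note message are all illustrative):

```python
# Attach a CI verdict to a commit as a git note, then read it back.
import pathlib
import subprocess
import tempfile

def git(*args, cwd):
    # Pin identity via -c so the sketch works in a bare environment.
    return subprocess.run(
        ["git", "-c", "user.email=ci@example.com", "-c", "user.name=ci", *args],
        cwd=cwd, check=True, capture_output=True, text=True,
    ).stdout.strip()

repo = pathlib.Path(tempfile.mkdtemp())
git("init", "-q", cwd=repo)
(repo / "file.txt").write_text("hello\n")
git("add", "file.txt", cwd=repo)
git("commit", "-q", "-m", "initial", cwd=repo)
sha = git("rev-parse", "HEAD", cwd=repo)

# The merge-queue bot would look for a note like this before merging.
git("notes", "add", "-m", "ci: passed", sha, cwd=repo)
print(git("notes", "show", sha, cwd=repo))
```

The missing pieces (running the well-known script, deciding pass/fail, the bot polling notes) are ordinary plumbing on top of this.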
I've gone down that Bash rabbit hole a few times now, and it's played out in much the same way as the article describes. As soon as I add any complexity at all, the result is something full of subtle bugs, even when written carefully. I've found writing reusable bits of CI in whichever language the rest of the project uses works very well: it might not be as quick to write as Bash, but it holds up much better as things evolve.