SelfCI - a minimalistic local-first Unix-philosophy-abiding CI
84 points by dpc_pw
One side benefit of it being local first that doesn't seem to be called out: being able to debug your pipelines locally! One of the worst parts of managing CI is easily the development loop. I'm excited to give this a go.
This is less an upside of local first and more a downside of clownshoes services like GitHub's, which just don't bother to implement basic interactive shell functionality.
Well, there are other parts of CI pipelines that can need debugging besides the actual steps themselves, but yes, you're right that Actions (and ADO Pipelines) are missing critical features.
Don't most people do this by just having a step run on failure that starts a cloudflared instance and sleeps?
https://github.com/valeriangalliat/action-sshd-cloudflared looks pretty easy to just add to anything.
I used to use ngrok, but they no longer have a robust free tier...
That is a common workaround for using a system that is missing basic built-in functionality, yes.
One huge drawback of local first is that it's not neutral. The whole point of a neutral CI that's not tied to the developer's local environment is that it's independent from said local environment, to avoid the "it works on my laptop" syndrome. Letting all the CI steps be available to run locally is great, but it should never replace a hosted CI.
For 90% of my small projects (Rust libraries and utilities very much like SelfCI), my dev machine is The Reference Machine. :D . Also if the project doesn't work on some developers' machines the "works in the CI" is not a solution to anything either.
Also if the project doesn't work on some developers' machines the "works in the CI" is not a solution to anything either.
It's not a solution, but proof that there's probably something to fix on the developer's side. If the hosted CI is sane, its environment is more stable than a developer's environment, hence why it's considered neutral.
90% of the time it's true, but it did happen on projects I worked on that the CI logic would depend on some little detail that happened to be true in the CI, but not universally, and things would fail not on one developer's machine but for many developers. Or things that failed only for one developer were caused by little mistakes in the CI/test logic itself. In principle, if software fails on A and passes on B, it doesn't prove that it's A's fault. My preference/philosophy is that the more diverse and noisy the testing environment, the better.
While it's true the isolated runtime is not provided by default, it seems straightforward to use a shared Nix or Docker config to ensure a consistent environment for all contributors. In practice, this could likely work similarly to how .github/workflows or similar have been configured in the past.
I suppose you'd still be running "hosted" CI for your main branches and for all branches that are pending a merge.
With this, it is (presumably, haven't tried it yet) nice that you don't need the huge pain-in-the-ass setup like with GitLab CI or GitHub Actions or whatever, where you have zero chance of running the same environment locally and you end up trying to observe what's in principle an ephemeral container as it is running, just so you can inspect what the hell is going on.
Baking in virtual machines or containers, etc. does not make sense for a local-first CI. It makes things more complicated, harder to set up, less composable, heavier and slower.
Chiming in here with this quote.
Even back when I was not sold on containers (2014?) I loved Drone CI because unlike other CI, it let me set up a "defined good" setup, or neutral as you call it, usually with a 5 line Dockerfile.
The only time it breaks down is with kernel or arch differences of course, but at least that is solved
and yeah, for this I would probably do podman run --rm -it -v $PWD:/app now, so much for "hard to set up".
If you trust that the contributors are not actively trying to attack your codebase, I feel like you could even have the CI run on their machine and generate an attestation that it ran on a given commit, and require this attestation to be present in the MR before merging, which allows to still use the MR workflow of the forge while keeping the CI local.
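One way that could look, as a rough sketch: after a green local run, the contributor signs the commit hash and attaches the signature to the MR. The filenames, signing namespace, and identity below are made up for illustration, and this assumes contributors' SSH public keys are already listed in an allowed_signers file on the maintainer's side:

    # contributor, after the local CI run passed:
    git rev-parse HEAD > ci-attestation.txt
    ssh-keygen -Y sign -f ~/.ssh/id_ed25519 -n selfci-attestation ci-attestation.txt
    # attach ci-attestation.txt and ci-attestation.txt.sig to the MR

    # maintainer, before merging:
    ssh-keygen -Y verify -f allowed_signers -I contributor@example.com \
        -n selfci-attestation -s ci-attestation.txt.sig < ci-attestation.txt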
True. In most groups I collaborate in the CI is not really a security boundary, it's just a human assistance mechanism to make sure we did not forget to check everything still works, because we're all just busy humans. Though running the CI multiple times on different machines and on some designated configuration is also good because weird issues do show up from time to time - ones that happen only here, but not there etc. Even on builds with reasonable isolation (like Nix) I've seen such things. So local-first collaboration pairs really well with redundancy. So ideally contributors would run CI locally to save their own time, then maintainer would run it again after viewing it and deciding it's probably good enough to merge.
Though for big projects with a heavy CI this might not be so straightforward etc.
I am still thinking about how to best extend SelfCI with the ability to delegate specific jobs to multiple machines and spread the workload between multiple developers. Since I work on big Rust projects, it would be nice if e.g. a rented Hetzner machine could transparently run some of the CI jobs, and/or multiple collaborators could pool their machines and run CI jobs for each other. The structure of starting jobs should support it transparently.
Even if you trust the contributors, a CI environment is valuable to avoid the "works on my machine" sort of problems.
Now of course with containers and such you can get that, or very close to that, locally, but it's important to remember that CI isn't just there to ensure tests are run, it's there to run tests in an environment that is better controlled and more reproducible than developer machines.
a couple paragraphs into the description I was expecting to see some attestation mechanism because I had the same thought
my understanding of this project is that it offers a job definition spec and an executor runtime, but the actual process isolation etc. are performed locally. this is pretty sweet, but in what ways is this different from a makefile + nix? or say, just nix flake check? perhaps it is easier to define a job DAG using selfci?
as for providing a "social proof" that the job was executed locally: an approach I have seen on nixpkgs is for users to execute the nixpkgs-review tool which runs the CI checks locally and prints out the SHA of the derivation(s). other users can run the same tool similarly and compare SHAs to verify/debug issues. in a project with known collaborators, I would automatically trust the SHA once posted.
re: using "base" as trusted and only executing against the job spec on "base": would this make it impossible to "PR" changes to the job spec itself? something I do often on GitHub/Tangled is to debug CI on a PR/branch respectively until it passes (sometimes, if the job executes only on PR, then it is best to test by triggering a PR).
in what ways is this different from a makefile + nix?
but ultimately, you can have selfci call into makefile/just/nix/docker or whatever other tooling of your choosing and largely get it out of the way. That's the whole goal.
an approach I have seen on nixpkgs is for users to execute the nixpkgs-review tool
Thanks. I'm a mild-intensity Nixpkgs contributor at this point, but I wasn't aware of it. But I might be misunderstanding what the SHA proves. If I need to run the same thing myself to get the SHA to compare, didn't I just do the whole CI run already, and then I don't really care about the original one? I guess it proves (somewhat) the other person ran it too, but that doesn't seem all that useful? Or is it more about the potential to check someone's bluff?
re: using "base" as trusted and only executing against the job spec on "base": would this make it impossible to "PR" changes to the job spec itself?
Since the core maintainer (or some automation under their control) is free not to run the CI, they can change whatever they want and just push to trunk. Very similar to how on GitHub it is sometimes necessary to disable required jobs in branch protection or use admin force-merge privileges. It's also very easy to run any candidate against any base to verify any changes to the rules (typically running the candidate against itself).
I also thought build isolation is the main point of CI, but the points you raise remind me of a bunch of problems with git hooks that SelfCI solves, e.g. that git hooks verify the worktree rather than what's staged for commit.
Maybe SelfCI is more of a "git hooks done right" than a Github Actions? Like if CI == build + hooks, you use Nix or Bazel for the build and SelfCI for the hooks.
Git hooks verify the worktree rather than what's staged for commit.
Yes. Though there are solutions for that.
Maybe SelfCI is more of a "git hooks done right" than a Github Actions? Like if CI == build + hooks, you use Nix or Bazel for the build and SelfCI for the hooks.
Yes. In a way.
To me, git hooks are just a subset of CI that can be done fast enough to run very frequently. Other than that there is no difference between them, so SelfCI could handle both. One could easily modify SelfCI's CI check script to e.g. skip some jobs/steps based on env variables and have a "slow full CI run" and a "fast commit run".
If you look at SelfCI's own CI, this job is basically just the "fast inline lints", and the other parts are the "slow Nix build".
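A minimal sketch of that env-variable gating idea, in plain bash (the job commands here are illustrative, not SelfCI's actual config):

    #!/usr/bin/env bash
    # ci-check.sh -- fast checks always run; heavy ones only when FULL_CI=1
    set -euo pipefail

    cargo fmt --check
    cargo clippy -- -D warnings

    if [ "${FULL_CI:-0}" = "1" ]; then
        cargo test --all
        nix build
    fi

A commit-time hook would call it bare, while the full run would be FULL_CI=1 ./ci-check.sh.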
I think in modern day and age, where it's relatively easy to have reproducible dev environments and isolated builds, and dev machines are often as powerful as servers, developers should just aim at running the whole CI locally, not just the trivial subset. But...
As I've been using Jujutsu for a while now, which doesn't have git hooks implemented yet, SelfCI is in fact somewhat motivated by replacing them, and I do have some thoughts on how pre-commit hooks should be replaced with something like SelfCI, integrated with the DVCS itself:
You know what would be awesome? If JJ could somehow run pre-commit hooks in the background automatically for me. So when I do jj status and jj log it could show me that lints etc. passed/failed/are-being-executed/are-queued for every commit. By the time I get to the pushing part, the lints could be already verified for every commit. 🤩
I have not pursued it (yet?) because I think for it to work well, it would have to be well integrated into the DVCS itself. Basically the DVCS should just schedule a CI/lint run in the background for every commit/change automatically, and track these, so the developer does not even need to call the selfci command manually. Then, under the hood it could work very much like SelfCI or even just call into it.
I guess it proves (somewhat) the other person ran it too, but that doesn't seem all that useful?
yeah it's not possible to prove that the computation was run without intervention based on just a SHA! among a group of trusted contributors, i would trust that any contributor that submitted the SHA has executed the program locally. it's slightly more comforting than "it works".
re: using "base" as trusted and only executing against the job spec on "base": would this make it impossible to "PR" changes to the job spec itself? something I do often on GitHub/Tangled is to debug CI on a PR/branch respectively until it passes (sometimes, if the job executes only on PR, then it is best to test by triggering a PR).
I think the idea here is that you don't need a pr branch to make ci changes, you can iterate locally on main.
The simplest usage envisioned is: the core maintainer fetches changes (like contributions), reviews them, and when they look OK, runs selfci mq add <change_id>. This will run the CI and merge them into trunk if the CI passed. If the change modifies the CI itself, the core maintainer can run selfci check --base <change_id> --candidate <change_id> to validate that the new CI passes against itself. Then they can merge into trunk manually, effectively "skipping" the "base-CI" check.
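Spelled out as commands (the same ones as above, with <change_id> as the placeholder):

    # review the fetched change, then queue it;
    # this runs the CI and merges into trunk if it passes
    selfci mq add <change_id>

    # if the change modifies the CI itself,
    # first validate the new CI against itself
    selfci check --base <change_id> --candidate <change_id>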
It's important to note that only the command itself (the yaml file) is read from "base". And the command from base can just call into a script in the candidate dir to use whatever version of the CI is defined in the candidate, skipping this whole mechanism. Projects can (when they think it's important enough) call into script(s) inside the base instead, or some combination of the two - e.g. important checks from the base, then less important ones from the candidate.
I really like the implementation agnostic side of this: Signaling the CI system which jobs to start and which step is running creates a large amount of flexibility.
Only executing the CI logic from the base (main) branch is also a nice touch for increased resiliency against malicious change requests.
Signaling the CI system which jobs to start and which step is running creates a large amount of flexibility.
Indeed, a big part of the motivation behind SelfCI is how clunky and brittle it is to implement any medium-complexity structure in YAML DSLs. Relatively simple logic like "do X, then do Y1 and Y2 in parallel, then Z, ..." is trivial when expressed in any programming language, so why do we need to suffer?
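For instance, that exact shape is a few lines of plain shell; do_x/do_y1/do_y2/do_z are placeholder commands:

    #!/usr/bin/env bash
    set -euo pipefail

    do_x
    do_y1 & pid_y1=$!
    do_y2 & pid_y2=$!
    wait "$pid_y1"   # propagate Y1's exit status
    wait "$pid_y2"   # propagate Y2's exit status
    do_z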
Only executing the CI logic from the base (main) branch is also a nice touch for increased resiliency against malicious change requests.
I am happy to hear you like it. I find it quite flexible and powerful, but I wasn't even sure if I would be able to describe the appeal well.
There is this fundamental problem of how a piece of code can enforce anything w.r.t. making changes to itself, if the enforcement logic lives right in the same codebase and so can be changed along with the changes. On GitHub this is done via the project's Settings, branch protection, etc., which is also very clunky and sometimes hard to reason about, especially when release branches are introduced or required checks are being refactored.
It seems to me that "verification comes from the branch that is about to get modified" ("base") solves it well, and leaves to the devs implementing the CI the decision at which point to call the verification logic from the change being made ("candidate").
Indeed, a big part of the motivation behind SelfCI is how clunky and brittle it is to implement any medium-complexity structure in YAML DSLs. Relatively simple logic like "do X, then do Y1 and Y2 in parallel, then Z, ..." is trivial when expressed in any programming language, so why do we need to suffer?
In my experience (as a Release Engineer), defining CI configuration exclusively in turing complete languages inevitably leads to unmaintainable spaghetti. While it's impossible to avoid writing code completely, I find that it's very beneficial to split things between code and pure configuration. In general, the scheduling piece of CI (both in terms of steps within a job, and how jobs fit together in a graph) fits excellently into pure configuration. Job or step implementation, on the other hand, often requires something programmatic.
Respectfully, I vehemently disagree.
In my view, people over and over fool themselves that what they are doing could be done "simpler" by being "declarative", which they interpret as using some crappy DSL because they are afraid of programming, and then when real-world complexity inevitably hits, the surface-level simplicity breaks down and the DSL keeps being extended with ever-increasing absurdities trying to mimic what general-purpose languages were built to deal with.
In general, the scheduling piece of CI (both in terms of steps within a job, and how jobs fit together in a graph) fits excellently into pure configuration.
If the CI is simple, a program or a shell script running a handful of steps is very much like a yaml listing these steps. The trivial case is not very interesting. And "declarative" can be done very simply in any general-purpose programming language if so desired.
But eventually the CI of a growing project will become heavy and slow, so someone wants to introduce a check whether a certain part of the code changed, to decide if certain heavy tests need to be run, or the CI needs to accommodate a legacy/expensive singleton system, so one needs to add global locking around it. Someone wants a custom global dashboard. Certain tests need to be tiered. The codebase needs to go through a drastic restructuring, so tests need their flakiness statistics tracked. A productivity team needs to run some research, so they need to plug some stuff into the CI to track things. The real world throws curveballs at the testing requirements, and they can't be dealt with because everything is boxed into some generic, crappy, underfeatured YAML DSL. In GitHub Actions the workflows quite quickly become a total mess of weird random stuff copied from various places, usually heavily using general-purpose programming languages under the hood.
defining CI configuration exclusively in turing complete languages inevitably leads to unmaintainable spaghetti.
It's true that the freedom and possibilities that fully general programming gives make it much easier to shoot oneself (and the whole team) in the foot. But it is my strong belief that if a developer/team doesn't have the skills and culture to handle tools solving their own problems, they will be even worse at attempting to build tools solving other people's problems.
Agreed. It may be because I'm a bit of dinosaur, or for some other reason, but I usually find unnecessary verbosity reduces rather than improves readability / comprehensibility. If I had a dollar for every time I found myself thinking "these 2000 lines of YAML could be a 150 line shell script" ..
Is there any notion of forward declaration of nodes in the CI graph here?
I think it's very good to have dynamic systems here, but I also think a big part of a CI runner's value comes from being able to set up a decently complex dep tree. Without that, it's harder to know (just by looking at the in-progress CI run) if there's something coming up next.
I do very much not want to be writing YAML files, but sometimes I do want to have at least a mostly static dep graph when possible. But it feels a bit tricky.
Side note, regarding
    lint)
        # ...
        export -f job_lint
        nix develop -c bash -c "job_lint"
I would not use export -f -- this is the bash-specific feature that led to the 2014 ShellShock bug
Technically it is mitigated now, but I would not call it secure. I would not call it insecure either -- it depends on the exact usage. CI environments are part of distributed systems, have many inputs, and are frequently attacked (attackers know a CI system is often the "back door" into an organization -- it's less secure than the front door.)
It serializes bash code into env vars:
$ bash -c 'job_lint() { echo hi; }; export -f job_lint; env | grep BASH'
BASH_FUNC_job_lint%%=() { echo hi
So the child bash process is parsing env vars as code and executing it.
Instead, I recommend what I call the $0 dispatch pattern, described in this post about xargs:
https://www.oilshell.org/blog/2021/08/xargs.html
This solves the same problem and is more general:
How do you get nix develop to invoke a bash function?
Run nix develop -c $0 job_lint and add a case clause job_lint) job_lint ;;
That is, in the $0 dispatch pattern, the function name is the first arg $1. If Nix has a sandbox, this could interfere with $0, but there are usually ways around that. I'd be interested if it doesn't work
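A tiny self-contained sketch of the pattern (the file and job names are made up):

    #!/usr/bin/env bash
    # ci.sh -- $0 dispatch: the script re-invokes itself with a function name as the argument
    set -euo pipefail

    job_lint() {
        echo "running lints"
    }

    case "${1:-}" in
        lint)
            # re-run this same script ($0) inside the dev shell,
            # asking it to execute job_lint
            nix develop -c "$0" job_lint
            ;;
        job_lint)
            job_lint
            ;;
        *)
            echo "usage: $0 lint" >&2
            exit 1
            ;;
    esac

No export -f needed: the child process is just the same script run with a different first argument.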
Other such problems are: how do you get sudo to invoke a bash function? (rather than putting sudo in front of every line)

Some people have found this pattern confusing, but once you get used to it, it's very useful, and applies in many situations.
Anyway the selfci project looks interesting ... in Oils our CI is more than 10K lines of shell and tests/benchmarks all sorts of things, so I definitely agree with the philosophy
I think you do need control over the deps/environment -- we are using Podman now, but it of course inherits some design problems with Docker. (There were attempts to move to Nix, but porting 10K lines of shell that worked on Debian to Nix is non-trivial; instead we have a lightweight method of pinning dependent packages called "wedges")
I took a look at this yesterday and it looks interesting, especially the lightweight merge queue.
However, it's not clear from the docs if it will do a pull before running the CI - to handle the cases where merges have been committed upstream since you last pulled. I am not sure whether the GitHub merge queue does this or not, but e.g. bors-ng did.
This looks extremely interesting! I've been frustrated about the current state of CI for a long time and I think a move away from YAML is desperately needed. All the code people pour into their YAMLs sometimes makes my work miserable.
Thanks for coming up with such a simple "unixy" API to outsource the logic! The idea with using the base branch is also super neat, a feature I've wanted quite a few times now...
But I personally don't think running a CI on a shared system is an issue in itself. IMO running locally and on a shared runner is not a conflict. Especially in a team where a review by "anyone else" is OK (e.g. not a specific merger, but a 4-eye policy), I enjoy opening an MR link and seeing the CI results directly without having to do anything locally. It can make small reviews much more efficient.
It doesn't look like anything would prevent one from running selfCI on a shared system, e.g. by executing a job from radicle-native-ci (https://app.radicle.xyz/nodes/rosa.radicle.xyz/rad:z3qg5TKmN83afz2fj9z3fQjU8vaYE), or am I missing anything?
100% agreed. The design is supposed to be local-first and that's a primary use-case that makes sense for it, but a local-first tool can trivially run on a shared server.
In a way SelfCI is just an idea and a proof of concept. The thesis is that the way CI is approached in the mainstream turns a relatively simple problem into piles of complexity and misery for misguided reasons. The idea behind SelfCI could very easily be adapted and/or re-implemented in various ways depending on people's needs, and scale from single-dev projects all the way to enterprise software running its CI on computing clusters.
This looks very nice. Do I understand correctly that the daemon is optional? I.e. if I only want to run CI tasks on-demand (and synchronously, I guess), yet still want a kind of log that tells me which commits failed the CI and which commits passed, can I do that with SelfCI? Or does the log/tracking feature require the daemon?
The selfci mq is optional. If you never run it, it will not start any daemon. The whole point of mq is just to queue a bunch of things in the background. selfci mq check and selfci check will do basically the same thing, except mq will run it in the background and you can queue multiple things, so if your CI takes 10 minutes, you can enqueue a bunch of things and check on them later. Runs that pass the mq can optionally get merged into trunk (selfci mq add vs selfci mq check).
That sounds like very good design. Does selfci also have an opinion on how to do secrets management, or is that more ad hoc?
I have plenty of thoughts on the matter, but SelfCI's design leaves it to the user, like almost everything else. :D . The Unix philosophy (as I understand it) is: each tool handles a relatively small, targeted piece of functionality well, in a way that composes with everything else.
This looks great! I've used https://www.ocurrent.org/ to have self-hosted CI/CD. I'll have a play with this and compare them.