Stamp It! All Programs Must Report Their Version
86 points by stapelberg
Outgoing HTTP requests: Include the VCS revision in the User-Agent
HTTP responses: Include the VCS revision in a header (internally)
This enables fingerprinting, though. It can also leak information to attackers. I'd rather my software avoided doing this if at all possible.
Yes, isn’t this a really really bad idea?
Most security audits flag stuff like this as it’s an easy vector to target versions of software with known exploits.
Am I missing something?
(For clarity, I mean you usually don’t want to advertise that you are running php version 123 on Apache version 456)
The “internally” in parens may be doing some extremely heavy lifting here?
If your only defense against bad actors is that they don’t know you’re running a certain software version, I don’t think that’s much of a useful defense, only security theater :)
But you don’t have to follow my thinking on this! If you think the risk of fingerprinting is greater than the reward you get from observability, make your choice.
To me, starting with “User-Agent has full info” and “HTTP responses have full info” is the better default, and then for anything you make public you strip these details. But my thinking is colored by a big-corporation mindset here, and for small startups it might make sense to flip this default.
It is a useful defense when bad actors probe for vulnerabilities in things I don't currently run, and get blocked just for that!
On the client side, big corporations receiving useful info about the client leads to more bad things than good things (fingerprinting & targeting, short client allowlists…)
It's not the only defence, but it is part of the defence.
If it is possible that malicious actors know about vulnerabilities in the software before you do, then telling them exact versions increases risk. Given that there are multiple active markets for as-yet-undisclosed vulnerabilities, this possibility is almost a certainty.
If malicious actors know about vulnerabilities in your git repository before you do, then yes, I would say you have a big problem :)
If they don’t have access to your git repository, what good is the VCS rev for them?
It’s not really a choice. Security scanning software is going to flag this and then you’re going to have to remove it. (large corps, government etc are usually required to use some security scan theater service).
Again, one of the first things they flag is advertising the software stack you’re running.
So an alternative might be to place this behind a secure endpoint, much like your --version flag, so that you can request it instead of blasting it out publicly.
If my past experience is anything to go by, security scan theater will scream that you are running a network. "Oh My God! You respond to ping! Oh My God! Your machine has a DNS entry! Oh My God! You have a DNS server! Oh My God! You have a web server running! The End is nigh!" for pages and pages on end.
So, security via obscurity then? Isn't that a long-failed model?
It’s one of those very low hanging fruit security audit checks that they are going to put on the report bc it’s easy to find and easy to remediate.
At some point it gave the adversary useful information for an attack and so has become best practice.
Again, I’ve not seen a security audit that didn’t tell you to turn off the web server name and version (and php or asp etc etc)
Yeah, as I wrote previously, embedding version control revision IDs is a habit I picked up many years ago from BSD and Apache.
One thing I’m still refining is the logistics around cutting a release: in the past I had scripts that built source tarballs that contain version metadata files that don’t exist in version control; when building from a git checkout the version comes from a tag, and when building from a release tarball it comes from the version metadata file. This doesn’t work properly with git-archive, nor the web forge feature for downloading a release which is just git-archive over http. Now I’m adjusting my release scripts so that all the necessary files are committed to the revision that the release tag points to, so that git-archive works and can reproduce the release tarballs.
You may be interested in Git's keyword expansion attributes, which let git-archive substitute the version into the archive content for files marked with the export-subst attribute.
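A minimal Go sketch of the export-subst approach, assuming `.gitattributes` contains a line such as `version.go export-subst` (the file name is hypothetical). git-archive replaces the `$Format:%H$` placeholder with the full commit hash; in a plain checkout the placeholder stays literal, so the code falls back to the Go 1.18+ VCS stamp.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"strings"
)

// archiveRevision is rewritten by `git archive` when this file is marked
// export-subst in .gitattributes. %H is the full (not abbreviated) hash.
var archiveRevision = "$Format:%H$"

// revision prefers the git-archive substitution and otherwise falls back
// to the vcs.revision setting stamped by `go build` in a checkout.
func revision() string {
	if !strings.HasPrefix(archiveRevision, "$Format") {
		return archiveRevision // built from a git-archive tarball
	}
	if info, ok := debug.ReadBuildInfo(); ok {
		for _, s := range info.Settings {
			if s.Key == "vcs.revision" {
				return s.Value
			}
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(revision())
}
```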
Make sure to use the full hash and not an abbreviated one, for reproducibility: https://dee.underscore.world/blog/export-subst-reproducibility/
+1
I wrote version.sh a while ago, where export-subst fills in variable strings to try and output the same "git describe"-style version string — whether running in a git checkout or a downloaded source archive from your favourite forge.
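version.sh itself is a shell script. Purely as an illustration of the target format, here is a Go sketch of the "git describe"-style string it aims to reproduce: the bare tag when HEAD is tagged, otherwise `<tag>-<n>-g<abbrev>`, plus an optional `-dirty` suffix. It assumes a fixed 7-character abbreviation; git itself may pick a longer one to keep the hash unique.

```go
package main

import "fmt"

// describe formats a version like `git describe --tags --dirty`:
// the tag alone when there are no commits since it, otherwise
// <tag>-<n>-g<abbrev>. The fixed 7-character abbreviation is a
// simplification of git's adaptive abbreviation.
func describe(tag string, commitsSince int, hash string, dirty bool) string {
	v := tag
	if commitsSince > 0 {
		abbrev := hash
		if len(abbrev) > 7 {
			abbrev = abbrev[:7]
		}
		v = fmt.Sprintf("%s-%d-g%s", tag, commitsSince, abbrev)
	}
	if dirty {
		v += "-dirty"
	}
	return v
}

func main() {
	fmt.Println(describe("v1.2.0", 0, "0123456789abcdef0123456789abcdef01234567", false))
	fmt.Println(describe("v1.2.0", 3, "0123456789abcdef0123456789abcdef01234567", true))
}
```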
Thanks for sharing.
I agree that old Unix systems were way ahead of what we’re dealing with right now in certain respects.
For example, Plan9 included the source code in /sys/src, so you could recompile easily: https://9p.io/wiki/plan9/Compiling_kernels/index.html
In my distri research project (see https://michael.stapelberg.ch/posts/tags/distri/), I also make the source code of every program available lazily, transparently, see “Debug sources and package sources” in https://distr1.org/things-to-try/ :)
I find it interesting that you picked up the ident habit from BSDs, because the one interaction I had with BSD folks about versioning was the very opposite! The https://mandoc.bsd.lv/ folks are vehemently against adding a --version flag to their software, saying I should “trust my package manager”. That’s nice, but now they haven’t released a new version for years, so I am forced to use developer builds that I cannot identify…
BSD software usually doesn’t have a version option even when the version is embedded in the binary. The OpenBSD folks now have a different opinion than the FreeBSD folks did in the 1990s, in particular OpenBSD stripped out all their revision ID strings in 2009.
Huh! Thanks for that explanation. Do you happen to know the reasoning behind the OpenBSD decision to get rid of revision IDs?
Dunno beyond what’s in the commit message https://github.com/openbsd/src/commit/043fbe51c197dbbcd422e917b65f765d8b5f8874
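For context on the revision ID strings discussed here: RCS-style keywords such as `$Id: ... $` or `$FreeBSD: ... $` are embedded as plain strings, and ident(1) finds them in source files or binaries. The Go sketch below roughly approximates that search with a regular expression; it is not a faithful reimplementation of ident's exact matching rules.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// keywordRE matches expanded RCS-style keywords such as
// "$Id: foo.c,v 1.2 2008/01/01 bar Exp $". Unexpanded keywords
// like "$Id$" have no ": " and are skipped.
var keywordRE = regexp.MustCompile(`\$[A-Za-z]+: [^$\n]* \$`)

// ident returns every expanded keyword found in data.
func ident(data []byte) []string {
	var out []string
	for _, m := range keywordRE.FindAll(data, -1) {
		out = append(out, string(m))
	}
	return out
}

func main() {
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("%s:\n", path)
		for _, kw := range ident(data) {
			fmt.Printf("     %s\n", kw)
		}
	}
}
```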
Great tip about the better build information in Go 1.18! I didn't realize that. I've already started updating my projects to use that.
That sounds like the kind of problem Chalk is trying to solve: https://chalkproject.io/ and https://github.com/crashappsec/chalk?tab=readme-ov-file#software-provenance-and-attestation-made-easy (GPLv3)
Chalk gives you a cryptographically verifiable chain of custody from build through production, with minimal configuration, and no changes to how your software is run.
That’s nice, but orthogonal. My point is that having the VCS rev at the very end of your build pipeline is simple and incredibly valuable.
If you want to add cryptographically verifiable chain of custody, sure. But don’t over-complicate a simple concept when you don’t need to :)
As a Chalk developer, I do not think it is orthogonal at all. Making it easy to definitely identify the mapping between artifacts and their repos / branches is a primary use case for Chalk.
Not only do most of its users arrange their upstream pipelines for Chalk to just automatically do the tracking, it covers a bunch of corner cases. For instance, the repo URL and commit hash aren't enough to deal with instances where what's built and compiled has been changed after checkout; Chalk captures those changes.
And it makes sure that, after you inject version info into the artifact, it's possible to determine whether the artifact has been changed since then, which is particularly valuable with shell scripts or Python source files, for example.
It is more comprehensive, but it's actually SIMPLER to apply in practice, since an explicit goal is to eliminate pushing work onto developers: several large F500s use it across thousands of repos where the developers have never lifted a finger to add it.
And the provenance you think is "over complicated" is a requirement for SLSA level 2 compliance, which is either an explicit requirement, or a de-facto requirement in many procurement processes these days, thanks to CISA. We were definitely tired of watching developers struggle trying to piece together a hodge-podge of tools in a way that was practical to operationalize.
But certainly happy to see when people DIY their own solutions, even if it's for a part of the problem.
Yeah I get it. We also have some provenance solution at work.
But really, my point is that you should try to get the VCS rev stamped in all scenarios. Not just in F500, or big companies. For your weekend project. For your one-off script. Just get your defaults right, it will be so useful :)
Couldn’t agree more. I need to add that printing the configuration file path is a good start, but I think many programs should consider their configuration to be part of the program version that’s running (certainly the ones whose configuration language includes conditionals).
On many occasions I’ve needed to inspect the configuration that the program thinks it’s running with vs. the configuration on disk too: in my newest example, zellij doesn’t see globally-set keys in maps that get emptied in the configuration. Would be nice to see if the loaded config or the interpreter is wrong there, but no, no facility to serialize the running config exists.
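A Go sketch of treating configuration as part of the reported version: hash the exact bytes that were loaded, and serialize back what the program actually parsed. The `Config` struct and its fields are hypothetical, and JSON stands in for whatever format a real program uses. The point is that anything the parser ignored (here, a misspelled key) visibly disappears from the effective config.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Config is a hypothetical configuration schema.
type Config struct {
	Theme    string            `json:"theme"`
	Keybinds map[string]string `json:"keybinds"`
}

// configFingerprint identifies the exact bytes that were loaded.
func configFingerprint(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// effectiveConfig parses raw and re-serializes what the program understood;
// keys it ignored are absent from the result.
func effectiveConfig(raw []byte) (string, error) {
	var c Config
	if err := json.Unmarshal(raw, &c); err != nil {
		return "", err
	}
	out, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	raw := []byte(`{"theme": "dark", "keybind": {"ctrl-q": "quit"}}`) // note the typo
	eff, err := effectiveConfig(raw)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("config sha256:   ", configFingerprint(raw))
	fmt.Println("effective config:", eff)
}
```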
We should just not have to worry about the “I’m setting value x but the computer is showing behavior y” scenario, or at least have some confidence that we can figure out why.
Yes! This is the obvious next step.
I solve this by not only stamping the VCS ref of the software itself, but also the VCS ref of my NixOS configuration repo, and then it’s all fully declared :)
Heh. I recently had trouble with our deployment at $dayjob. It was supposed to build and deploy our Astro-based static website whenever we pushed to the production branch. This seemed to work most of the time, but sometimes people complained about things that should have been fixed already, so I wanted a way to verify which build was actually live.
It took me a bit to figure out how to pass the git commit ID through GitLab CI into the container that builds the Astro site, so that it could be written into a version meta header. But I will probably push for making that a standard for all our software.
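Once the commit ID is in the page, checking which build is live can be scripted. Below is a Go sketch of such a check. It assumes the build emits exactly `<meta name="version" content="...">`; the tag name, attribute order and the `checkdeploy` program name are all hypothetical. A regular expression is only adequate here because the markup is under our control. The expected SHA could come from GitLab CI's `CI_COMMIT_SHA` variable.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"regexp"
)

// metaRE matches the (hypothetical) version meta tag emitted by the build.
var metaRE = regexp.MustCompile(`<meta name="version" content="([^"]*)"`)

// liveRevision extracts the revision from a fetched page, if present.
func liveRevision(page []byte) (string, bool) {
	m := metaRE.FindSubmatch(page)
	if m == nil {
		return "", false
	}
	return string(m[1]), true
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: checkdeploy URL EXPECTED_SHA")
		return
	}
	url, want := os.Args[1], os.Args[2]
	resp, err := http.Get(url)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	switch got, ok := liveRevision(body); {
	case !ok:
		fmt.Println("no version meta tag found")
	case got == want:
		fmt.Println("live build matches:", got)
	default:
		fmt.Printf("stale deploy: live %s, expected %s\n", got, want)
	}
}
```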
The blog post presents go install ...@latest as inferior because it doesn't include a dirty flag, but that's because it doesn't need to: it downloads the source directly from the module proxy or VCS, verifies that its module checksum is published in the sumdb, and builds out of the module cache, which is read-only to protect from accidental modifications. It then stamps the module version and checksum into the binary. With go install ...@latest you're very likely building the correct, published source code for the version that's stamped in the binary.
On the other hand, when building from a local git checkout, there's a risk of accidental modifications altering the build without vcs.modified=true being set. For example, a gitignored file can contribute to the build.
That's why I only use go install ...@latest for building release binaries - it feels like much better hygiene. It's true it doesn't stamp commit ID and timestamp, but I don't see what value they add when you have the module version and checksum. If you need that information you can always query the module proxy: https://proxy.golang.org/src.agwa.name/snid/@v/v0.4.0.info
The blog post presents go install ...@latest as inferior
That was not my intention. Where did you pick that up from, specifically, please? I’d like to change the wording to make it clear that installing from Go modules also gives you a date and revision info (just not with the same precision).
The Nix tension between reproducibility and VCS stamping is kind of the same issue as ko has here: https://github.com/ko-build/terraform-provider-ko/issues/285
We use the -X linker flag to bake the output of "git describe --tags --always" into our microservices. It's essential to include version info in logs, etc.
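For readers who haven't used it, a minimal sketch of the -X approach: a package-level string variable with a default, overwritten at link time. Note that -X only applies to string variables that are uninitialized or initialized to a constant string, not to constants.

```go
package main

import "fmt"

// version is replaced at link time, for example:
//
//	go build -ldflags "-X main.version=$(git describe --tags --always)"
//
// Without -ldflags it keeps the default "dev".
var version = "dev"

func main() {
	fmt.Println("version:", version)
}
```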
So could you drop the bespoke X build plumbing in favor of VCS stamping? (Or does anything stand in the way?) That way, you don’t need to do per-service work.
It is useful for things that Go doesn't stamp by default; for example, it can inject a {branch}/{commit_number} that is monotonic-ish and easier to read than a commit ID. And once you've done the (trivial) setup, it's not really any extra work to keep it around, and there's no per-service work, just a stamp = True (or equivalent) in your build config.
--
```go
// base/go/buildinfo/buildinfo.go
package buildinfo

var (
	_GIT_BRANCH        = ""
	_GIT_COMMIT_ID     = ""
	_GIT_COMMIT_NAME   = ""
	_GIT_TAG           = ""
	_GIT_TREE_MODIFIED = ""
)

func GitBranch() string {
	return _GIT_BRANCH
}

func GitCommitID() string {
	return _GIT_COMMIT_ID
}

func GitCommitName() string {
	return _GIT_COMMIT_NAME
}

func GitTag() string {
	return _GIT_TAG
}

func GitTreeModified() bool {
	return _GIT_TREE_MODIFIED == "true"
}
```
--
```starlark
# base/go/buildinfo/BUILD
load("@io_bazel_rules_go//go:def.bzl", "go_library")

package(default_visibility = ["//visibility:public"])

go_library(
    name = "buildinfo",
    srcs = ["buildinfo.go"],
    importpath = "_/base/go/buildinfo",
    x_defs = {
        "_GIT_BRANCH": "{STABLE_GIT_BRANCH}",
        "_GIT_COMMIT_ID": "{STABLE_GIT_COMMIT_ID}",
        "_GIT_COMMIT_NAME": "{STABLE_GIT_COMMIT_NAME}",
        "_GIT_TAG": "{STABLE_GIT_TAG}",
        "_GIT_TREE_MODIFIED": "{STABLE_GIT_TREE_MODIFIED}",
    },
)
```
--
```sh
#!/bin/sh
# workspace_status.sh
git_branch=$(git rev-parse --abbrev-ref HEAD)
git_commit_id=$(git rev-parse HEAD)
git_commit_number=$(git rev-list --count "${git_branch}" --)
# Keep only one tag: every output line must be a single "KEY value" pair.
git_tag=$(git tag --points-at HEAD | head -n 1)
echo "STABLE_GIT_BRANCH ${git_branch}"
echo "STABLE_GIT_COMMIT_ID ${git_commit_id}"
echo "STABLE_GIT_COMMIT_NAME ${git_branch}/${git_commit_number}"
echo "STABLE_GIT_TAG ${git_tag}"
# Refresh the index first so stale stat info doesn't look like a modification.
git update-index -q --refresh
git_tree_modified=$(git diff-index --quiet HEAD -- ; echo $?)
if [ "${git_tree_modified}" = "0" ] ; then
  echo "STABLE_GIT_TREE_MODIFIED false"
else
  echo "STABLE_GIT_TREE_MODIFIED true"
fi
```
It was a little tl;dr, so I asked Gemini to summarize:
In his article, "Stamp It! All Programs Must Report Their Version," Michael Stapelberg argues that modern software development suffers from dangerously low versioning standards, particularly regarding "developer builds" or unreleased code. He highlights how vague versioning leads to wasted time during bug reporting and troubleshooting—such as when a user unknowingly runs an outdated build or an incorrectly installed binary. Using the i3 window manager and Go as case studies, he demonstrates that every program should "stamp" its Version Control System (VCS) revision directly into the binary at build time. This ensures that regardless of how or where the software is deployed, its exact source code origin is always identifiable, making debugging and maintenance significantly more efficient.
The "Stamp It!" 1-Page Guide
1. The Problem: The "Version Unknown" Trap
Software is increasingly complex and ephemeral. Relying on simple version numbers (e.g., v1.2.0) is insufficient for modern development because:
Every binary should carry its own metadata. Stapelberg proposes a three-step action plan:
A. Stamp It! Include the source VCS revision (e.g., Git commit hash) inside the program.
debug.ReadBuildInfo().
-X in Go or -D in C/C++) to inject the commit hash during the build process.
B. Plumb It! Ensure that the build pipeline (CI/CD, NixOS, Docker) doesn't "clean" the VCS information before the build.
.git folders are often removed for purity. Developers must use "overlays" or specific build flags to pass the revision into the build environment.
C. Report It! Make the information accessible across all interfaces:
--version and a more verbose --moreversion.
User-Agent strings for HTTP requests or custom headers in responses.
Version reporting shouldn't be an afterthought. By stamping, plumbing, and reporting version info, teams eliminate the guesswork from software operations and provide a "stamped metal plate" of authenticity for every piece of code they ship.
All of my CLI apps respond to --about with a one-liner that contains a version and a short description. Sometimes a self-hash.
Same idea basically.