Zig Community Mirrors
17 points by kristoff
What’s the reasoning behind the mirrors needing to dynamically handle requests?
https://github.com/ziglang/www.ziglang.org/blob/main/MIRRORS.md#community-mirrors
Yeah, this is odd.
Without the requirement to do special processing on requests, you could host a simple mirror on any dumb static file server. The requirement to do server-side processing on requests drives up the cost and complexity of mirroring by at least an order of magnitude.
Why do Zig mirrors require more complexity than Linux distro mirrors? The two seem to do the exact same thing.
New builds of Zig’s master branch ship around once a day (give or take), and a lot of people still use those builds at this point in Zig’s development. If a mirror were a simple static HTTP server, it would pretty much always be out of date, so a huge chunk of requests sent to it would just 404, making the mirrors much less effective (and also annoying users, making them less likely to use mirrors). If a mirror refreshed often enough to have every master tarball, so that requests usually succeed, then it would gain around 1.5 gigs of tarballs for every new master build, so around 1.5GB a day. That’s a bit over-the-top, particularly since some targets are much less likely to be fetched, so eagerly caching them all is rather excessive.
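To put that growth rate in perspective, here is a back-of-the-envelope sketch. It only uses the two figures quoted above (about 1.5 GB per master build, about one build per day); the function name is made up for illustration, and real numbers will drift as targets are added or removed.

```python
# Rough estimate of how fast an eagerly-synced mirror would grow.
# Both constants are the approximate figures from the comment above,
# not measured values.
GB_PER_MASTER_BUILD = 1.5  # "around 1.5 gigs of tarballs" per build
BUILDS_PER_DAY = 1         # "around once a day (give or take)"


def eager_mirror_growth_gb(days: int) -> float:
    """Storage added after `days` of keeping every master tarball."""
    return GB_PER_MASTER_BUILD * BUILDS_PER_DAY * days


for label, days in [("30 days", 30), ("1 year", 365)]:
    print(f"{label}: about {eager_mirror_growth_gb(days)} GB")
```

That works out to roughly 45 GB a month and about 550 GB a year if nothing is ever pruned, which is why fetching on demand is attractive.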
I honestly don’t think on-demand fetching is a particularly big demand, and I think “[it] drives up the cost and complexity of mirroring by at least an order of magnitude” is a particularly questionable claim. It’s true that this requires something a little smarter than a dumb static host, but… not that much smarter. We’ve already got a few mirrors set up without much fuss.
I honestly don’t think on-demand fetching is a particularly big demand
I said custom server-side request processing is the issue, not on-demand fetching. I agree on-demand fetching is fine.
Without the requirements for parsing filenames and re-routing the request based on that, I could just point a CDN (e.g., Bunny) at https://ziglang.org/download/ and my mirror would be complete. Bunny and any CDN knows to do on-demand fetching already and has lots of options for managing the cache.
The thing preventing this is the requirement that mirrors have to parse the filename and then decide whether to fetch the file from https://ziglang.org/download/<version>/<filename> or https://ziglang.org/builds/<filename>.
Why push this complexity to every mirror to solve independently? Why not serve the authoritative tarballs in such a way that the mirror can just pass the request path along to the authoritative server as-is without doing Zig-specific rewriting? Wouldn’t that allow anyone to trivially set up a Zig tarball mirror with any CDN provider?
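For anyone wondering what that rewriting amounts to in practice, here is a minimal sketch of the routing rule described above. It is not taken from the Zig mirror docs. It assumes the version sits right before the archive extension, that master builds carry a "-dev." suffix in the version (e.g. 0.14.0-dev.1234+abcdef123), and that only master builds live under /builds/. The regex and the function name are hypothetical.

```python
import re
from typing import Optional

UPSTREAM = "https://ziglang.org"

# Assumed filename shape: "<anything>-<version>.<ext>[.minisig]" where
# <version> is "X.Y.Z" for releases or "X.Y.Z-dev.N+<hash>" for master builds.
VERSION_RE = re.compile(
    r"(\d+\.\d+\.\d+(?:-dev\.\d+\+[0-9a-f]+)?)\.(?:tar\.xz|zip)(?:\.minisig)?$"
)


def upstream_url(filename: str) -> Optional[str]:
    """Map a requested filename to the upstream URL to fetch it from.

    Returns None when no version can be parsed out of the filename.
    """
    match = VERSION_RE.search(filename)
    if match is None:
        return None
    version = match.group(1)
    if "-dev." in version:
        # Master builds: flat directory, no version segment in the path.
        return f"{UPSTREAM}/builds/{filename}"
    # Tagged releases: one directory per version.
    return f"{UPSTREAM}/download/{version}/{filename}"


print(upstream_url("zig-linux-x86_64-0.13.0.tar.xz"))
print(upstream_url("zig-linux-x86_64-0.14.0-dev.1234+abcdef123.tar.xz"))
```

It is only a handful of lines, but it is logic that a plain pull-through CDN cannot express without some kind of edge scripting, which is the point being made here.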
I think your question about the extra complexity is completely reasonable.
That said, CDN providers can be used to serve a Zig mirror and more specifically both Bunny and Fastly already have been used to do so:
Guides:
Ahh, those are good links. TIL that Bunny supports custom middleware scripts and that they are sufficient to implement the mirror requirements.
Cool, thanks for those links! It looks like the Fastly one doesn’t work, sadly, but the Bunny one looks nice.
I just submitted a PR to give the Bunny guide more visibility.
That said, it’s worth noting that the rewrite logic drives up costs for mirrors, as they have to pay for edge scripting as an additional service, whereas simple request path forwarding would be free.
The issue with the Fastly one is an upstream Zig bug in its TLS client. The mirror itself is actually fine, it’s just the automated “check mirror is working” tests which struggle. Once that bug is fixed in Zig we’ll be able to merge that mirror.
New builds of Zig’s master branch ship around once a day
Yeah, that’s roughly 50-100x more frequent than your average stable distro package, I’d wager.
Seems like they intend for mirrors to act as CDNs that pull and cache on demand rather than keep a complete copy of the release history? It does seem a little confusing, because distros use the term “mirror” differently and that shapes how I think of them.
Yeah, this does read more like a caching proxy than a mirror. I only know Nginx, but it seems possible. Just need to do some regex matching to correctly route the prerelease builds to the separate backend.
No, mirrors are expected to keep release history, but a user might ask for a version of Zig that the mirror hasn’t fetched yet (e.g. when requesting unstable builds of master branch) in which case the mirror is not allowed to return 404 not found, and must instead dynamically fetch the tarball from ziglang.org.
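The fetch-on-miss behaviour described here can be sketched roughly as follows. This is an illustration, not the reference implementation: `serve` and `fetch_upstream` are hypothetical names, the upstream fetch is injected as a callable so the example runs without network access, and returning 404 when upstream itself does not have the file is my assumption about the intended behaviour.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple


def serve(
    filename: str,
    cache_dir: Path,
    fetch_upstream: Callable[[str], Optional[bytes]],
) -> Tuple[int, bytes]:
    """Return (status, body) for a tarball request, fetching on a cache miss."""
    # Refuse anything that is not a bare filename (e.g. "../etc/passwd").
    if not filename or Path(filename).name != filename:
        return 404, b""

    cached = cache_dir / filename
    if cached.is_file():
        # Already mirrored: serve the local copy and keep it indefinitely.
        return 200, cached.read_bytes()

    # Cache miss: ask upstream instead of answering 404 straight away.
    body = fetch_upstream(filename)
    if body is None:
        # Assumption: upstream does not have the file either.
        return 404, b""

    # Write to a temporary name first so a half-written file is never served.
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = cached.with_name(cached.name + ".part")
    partial.write_bytes(body)
    partial.replace(cached)
    return 200, body
```

The second request for the same file is then served from disk, so the mirror ends up holding exactly the tarballs people have asked for rather than every target of every master build.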