The Atrocious State Of Binary Compatibility on Linux and How To Address It
27 points by calvin
We think a better approach would be breaking GLIBC into distinct libraries, something like this:
- libsyscall – Handles making system calls and nothing else. This is provided as a static library only. Used by libheap, libthread and libc to gain access to shared system call code. Since it’s static it’s embedded in all three. You can pretend this library otherwise does not exist.
[…]
- libthread – Deals with threading and TLS, links against libheap. Provided only as a dynamic library. Cannot ever be linked against statically.
It isn’t that easy, I’m afraid. System calls depend on TLS: the classic one is errno, but the vdso interfaces are also thread-local: the per-core rate/phase parameters for clock_gettime(), the per-thread getrandom() state. Probably other things I don’t know about!
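For context, errno is the canonical piece of that hidden thread state: in glibc it isn’t a plain global at all but a macro over a per-thread accessor, roughly like this (a simplified sketch, not the actual glibc source):

```c
/* Simplified sketch of how glibc exposes errno (not the real
 * glibc source): every access goes through a per-thread location. */
extern int *__errno_location(void);   /* returns &errno for the calling thread */
#define errno (*__errno_location())   /* so "errno" is thread-local state */
```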
The syscall interface itself doesn’t depend on TLS, and libsyscall wouldn’t have any specific ABI to uphold besides the one it defines itself (with its own stability guarantees, or lack thereof). Why not have it return the -errno values like the kernel does? AFAICT, the proposal is for libsyscall to be an extremely thin wrapper around the interfaces the kernel exposes. I don’t see why this library needs to be any more thread-aware than the kernel ABI itself is. Am I missing something obvious?
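To make that concrete, here is a minimal sketch of such a stateless wrapper for x86-64 Linux (the name sys_write and the single-syscall scope are my own illustration, not the article’s proposal):

```c
/* Hypothetical libsyscall-style wrapper (x86-64 Linux): stateless,
 * no TLS, returns the raw kernel result, so errors come back as
 * negated errno values (e.g. -EBADF) rather than via thread-local
 * errno. */
#include <stddef.h>

static inline long sys_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
        : "=a"(ret)                       /* kernel result in rax */
        : "a"(1L /* __NR_write */),       /* syscall number in rax */
          "D"((long)fd), "S"(buf), "d"(len)
        : "rcx", "r11", "memory");        /* syscall clobbers rcx/r11 */
    return ret; /* >= 0: bytes written; < 0: -errno */
}
```

Since nothing is stored anywhere, there is no errno, no TLS, and nothing that breaks when the wrapper is statically embedded in several higher-level libraries.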
Extremely thin, as in stateless, yes. The lowest-level libraries need to be free of state to be free of static-linking problems. Not that stateful interfaces are forgivable anymore in this day and age, except as convenience wrappers.
Yes, I thought that when they were describing their approach it seemed relatively principled. In particular, what they call “relaxation”, which is really just treating all compilation (correctly!) as cross-compilation using an appropriate sysroot for the (oldest supported) target machine.
It gets a bit odd once they start recommending things, though. The crux of the issue seems, honestly, to be that glibc has serious quality control issues. Splitting the library up but keeping the maintenance behind it the same feels like it will still just result in quality issues in the future, but moved around a bit.
In illumos, we don’t have a static libc at all: it was thrown out in the distant Solaris past as being essentially unsupportable. Ours has been binary compatible for decades at this point, though. 2038 will obviously see the end of the 32-bit ABI, but I don’t really expect we will ever incompatibly change the 64-bit libc.
glibc has serious quality control issues
They mention three bugs, but:
- in one case, glibc stopped overriding the distro default to go the extra mile providing a data structure that had been unused for 16 years;
- one was related to compiling on a new glibc and running on an old one (i.e. forwards compatibility, not backwards), which is not supported, and was fixed anyway once reported;
- the only one that really was a backwards-compatibility breakage was not guaranteed to work even before, and the previous behavior had security consequences.
I don’t think that’s “serious quality control issues”.
FreeBSD has this separation (the first one might be new since you last looked; Brooks did most of it). The first of these is machine-generated from the syscalls table (which also generates the kernel parts).
The C library provides errno. You don’t need the threading library for initial-exec TLS. This is set up by the static linker and C support stubs. It gets more complex when you use dlopen, because then TLS ends up being an NxM thing with blocks per library and per thread, and you need to have a ‘give me TLS for this library’ API (which global-dynamic defines).
For the first thread, there’s a .tbss section that is mapped from the initial ELF (or, with dynamic linking, from the initial set of shared libraries, including libc). When you link libthr, you get the ability to create the second thread.
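As a sketch of that distinction (the file and variable names are made up): a __thread variable in the main executable can use the initial-exec model and live in that startup-mapped block, while the same code built as a dlopen()ed library generally falls back to the global-dynamic model and reaches its block through __tls_get_addr:

```c
/* tls_models.c -- hypothetical illustration, not from the thread.
 * Compiled into the main executable (optionally with
 * -ftls-model=initial-exec), this variable lives in .tbss and is
 * found at a fixed offset from the thread pointer, set up before
 * main() runs. */
#include <stdio.h>

static __thread long per_thread_counter; /* one zeroed copy per thread */

long bump(void)
{
    /* If this same code is built -fPIC and dlopen()ed instead, the
     * compiler generally emits the global-dynamic model here: the
     * access becomes a call through __tls_get_addr, because the
     * library's TLS block could not be laid out at startup. */
    return ++per_thread_counter;
}

int main(void)
{
    printf("%ld\n", bump()); /* prints 1 */
    return 0;
}
```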
Notably not considered: Nix. Nix would give you what you want from containers (the binary and ALL its dependencies) and none of what you don’t want from them (runtime isolation, which you’d need to work around).
They can’t assume their users are running Nix; that’s a no-go. They could at best use Nix to help them build the static executable, except that, last I tried, lots of Nix packages failed to build statically.
The target system doesn’t need to have Nix. You can just ship the store paths.
I guess that’s true, but then the store paths only work if the environment is set up correctly, right? Like, if I manually run a binary directly out of the Nix store, is it guaranteed not to need any env vars set to find deps? Is everything RPATH?
It’s all in the store; you can just run the binaries directly as they are. Yes, dependencies on dynamic libraries are all handled via RPATH (or RUNPATH).
Not to toot my own horn too hard, but my article on portable, dynamically linked packages on Linux (discussion) addresses most of the challenges mentioned: no containers, but it still allows shipping glibc and the dynamic linker along with the application. This approach also has the advantage that it works on non-glibc distros like Alpine and NixOS, and even in a FROM scratch Docker environment. The biggest disadvantage is that it’s a fairly complicated solution, if not a little hacky.
To be honest, I haven’t tested my approach with GPU drivers or similar yet: I’ve still got a lot of foundational packages left before I start looking at things like Vulkan/OpenGL, let alone CUDA. But fundamentally, there’s no reason you can’t ship some dynamic libraries while simultaneously dynamically loading others from the system when using this approach. (Of course, since I haven’t tested it and there’s no tooling for it, that also means it’s little more than a “neat thought experiment” right now for anything other than CLI/headless programs…)
This approach also has the advantage it works on non-glibc distros like Alpine and NixOS, and even in a FROM scratch Docker environment.
If I copy every linked library together with the program and run it on a different system, it’s hard for it not to work due to missing libraries.
The dynamic linker itself is part of glibc, and each ELF executable references the dynamic linker by absolute path. To get things to work on non-glibc distros and on distros using an older version of glibc, you need some way to use the dynamic linker from the right version of glibc too.
You could pick an arbitrary path to install a secondary bundled version of glibc, but it’d need to be the same path across all installations, meaning the installer would need to run as root. So, the novelty is to bundle the dynamic linker too, along with some glue code to get it all to run path-independently (meaning you can drop the whole bundle in your home directory, so no root permissions are required during installation).
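A sketch of what that glue could look like (the launcher and bundle layout here are hypothetical illustrations, not the article’s actual tooling); it relies on the fact that glibc’s ld.so can be executed directly and pointed at a library directory:

```c
/* launcher.c -- hypothetical sketch: run the real binary through a
 * bundled copy of glibc's dynamic linker, path-independently.
 * Assumed layout:  <bundle>/lib/ld-linux-x86-64.so.2
 *                  <bundle>/bin/app.real                       */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* A real launcher would derive <bundle> from /proc/self/exe
     * instead of hardcoding a relative path. */
    const char *bundle = "./bundle";
    char ld[4096], libs[4096], app[4096];
    snprintf(ld,   sizeof ld,   "%s/lib/ld-linux-x86-64.so.2", bundle);
    snprintf(libs, sizeof libs, "%s/lib", bundle);
    snprintf(app,  sizeof app,  "%s/bin/app.real", bundle);

    /* glibc's ld.so accepts --library-path when invoked directly,
     * so the bundled libc is found wherever the bundle landed. */
    execl(ld, ld, "--library-path", libs, app, (char *)NULL);
    perror("execl");
    return 1;
}
```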
Since our application relies on many non-system libraries that may not be installed on the user’s system, we need a way to include them
statically linking everything we can
Or just bundle the library code in your application; there’s no reason you need to ship your entire program “static” just to have a version of SDL embedded in it that can still talk to libvulkan.so.
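A minimal sketch of that split, assuming the system provides libvulkan.so.1 while everything else (SDL, etc.) is embedded in the binary:

```c
/* Hypothetical sketch: statically embed your own dependencies but
 * still pick up the system's Vulkan loader at runtime via dlopen. */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *vk = dlopen("libvulkan.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!vk) {
        fprintf(stderr, "no system Vulkan loader: %s\n", dlerror());
        return 1;
    }
    /* Look up the single entry point; everything else is fetched
     * through it, as the Vulkan loader intends. */
    void *gipa = dlsym(vk, "vkGetInstanceProcAddr");
    printf("vkGetInstanceProcAddr = %p\n", gipa);
    dlclose(vk);
    return 0;
}
```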
subprocess.run(['apt', 'install', '-y', *PACKAGES], env=env)
Oh dear. Listen, you really don’t want to do this. Besides possibly being illegal to distribute the results, it’s also a good way to get your whole shebang compromised.
I would think it would be atrocious if I had run into this constantly, but here I am, having only had to solve this problem a couple of times in my 26 years of Linux usage.
Yes, it’s not Windows, but that title just sounds clickbaity to me.
Zig makes it trivial to produce libc-less binaries. Too bad that to interact with GPUs you still need to link against libc. https://ziggit.dev/t/loading-libvulkan-so-1-on-linux-with-std-elfdynlib/8480
Shared libraries that don’t depend on libc could be opened via dlopen/dlmopen, or e.g. by manually parsing the ELF, loading the relevant sections into memory, and doing the dlsym lookups by hand.
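For the dlmopen route, a minimal glibc-specific sketch (libplugin.so and plugin_entry are made-up names):

```c
/* Hypothetical sketch: load a shared object into a fresh link-map
 * namespace with dlmopen(), keeping it isolated from the main
 * program's symbols. glibc-specific. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *h = dlmopen(LM_ID_NEWLM, "libplugin.so", RTLD_NOW);
    if (!h) {
        fprintf(stderr, "dlmopen: %s\n", dlerror());
        return 1;
    }
    void (*entry)(void) = (void (*)(void))dlsym(h, "plugin_entry");
    if (entry)
        entry();
    dlclose(h);
    return 0;
}
```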
Alright, yeah, this is not a good solution
Because containerized applications are isolated from the system, they often feel isolated too. This creates consistency issues, where the application may not recognize the user’s name, home directory, system settings, desktop environment preferences, or even have proper access to the filesystem.
This is not a property of containers. It’s a property of the security sandbox that containerized app systems impose, because sandboxing is a really good idea. For example, most of these problems go away with --filesystem=host. This is a feature, not a bug.
It’s not a feature, it’s an unnecessary consolidation of conceptually unrelated features. Sandboxing should be independent of the deployment mechanism, and a deployment mechanism should not rely on sandboxing features.
Bubblewrap can be used independently of Flatpak. I understand the argument that sandboxing should e.g. be declared in .desktop files, but purity aside, I think there’s a lot of practical merit to coupling the sandbox implementation to the distribution mechanism. Primarily: because it’s new, you can make sandboxing opt-out, not opt-in. Like it should’ve been to begin with.
Instead, we take a different approach: statically linking everything we can
It might be interesting: we have several thousand packages that we can build statically, and a Linux distribution on top of that: https://stal-ix.github.io/
I miss the days when static linking was common. I still have 20+ year old Linux binaries I can run.
The problem here is really about shipping closed source software on */Linux platforms. For most software that doesn’t fall into this closed-source special case, you just get the distribution to build it and provide the binaries.