There Is Life Before Main in Rust

42 points by mmastrac

ssokolow

Go is a notable exception in that it avoids the C runtime on most platforms, but Apple requires a C runtime to access syscalls.

Apple uses libSystem.dylib, not raw syscalls, as the ABI stability boundary, and NT-lineage Windows uses ntdll.dll the same way. ~~The BSDs use libc as that boundary.~~ On OpenBSD, I believe Go sets some kind of metadata flag, in the spirit of "opt out of NX-bit enforcement", so the kernel won't kill it for attempting to syscall from a location outside of the read-only libc mapping the loader set up.

EDIT: To clarify, libSystem.dylib contains the functionality which would normally be libc.so plus other things so, in that respect, it's the same BSD-verse "libc is the stability boundary" dance.

EDIT: As of Go 1.16, Go now uses libc on OpenBSD to comply with their syscalling policy.

Linux is ~~the anomaly~~ [uncommon] in having stable syscall numbers instead of a "piece of the kernel that gets loaded into process address space as a dynamic library and shares an unstable enum definition of syscalls with the kernel-mode code" because Linux and glibc aren't developed together in the same repo the way everyone else does it.

There’s an entire ecosystem of processing that happens before the function you declared as main starts up. C uses this to configure allocation, file access, thread-local storage and other C runtime services. Rust uses this time to configure parts of its own language and runtime. Specifically, Rust has infrastructure to handle panics and unwinding. Rust also needs to translate the C-style program arguments into its own std::env::args interface.

On Windows, the C runtime is also responsible for parsing the CP/M-style command string (which MS-DOS copied and Windows's subprocess spawning APIs kept) into a POSIX-style argv array. That's why Python's subprocess module documentation has a section named "Converting an argument sequence to a string on Windows", which explains how it converts your argv array to a string following the quoting rules baked into the MS C runtime. The invoked subprocess's own parser can deviate from those rules if it so chooses.
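To make the quoting rules concrete, here is a minimal sketch in Rust of the argv-to-string direction, following the rules as that Python documentation section describes them. The function names `quote_windows_arg` and `join_windows_args` are hypothetical, this is not how any particular standard library implements it, and it assumes the child parses its command line with the MS C runtime conventions. It does not escape cmd.exe metacharacters.

```rust
/// Quote one argument so an MS C runtime style parser reads it back unchanged.
/// Hypothetical helper for illustration only.
pub fn quote_windows_arg(arg: &str) -> String {
    // Arguments that are empty or contain whitespace must be wrapped in quotes.
    let need_quote = arg.is_empty() || arg.contains(' ') || arg.contains('\t');
    let mut out = String::new();
    if need_quote {
        out.push('"');
    }
    // Backslashes are only special when they precede a double quote,
    // so buffer them until we see what follows.
    let mut backslashes = 0usize;
    for c in arg.chars() {
        match c {
            '\\' => backslashes += 1,
            '"' => {
                // Double the pending backslashes, then escape the quote itself.
                out.push_str(&"\\".repeat(backslashes * 2));
                out.push_str("\\\"");
                backslashes = 0;
            }
            _ => {
                // Backslashes before an ordinary character are literal.
                out.push_str(&"\\".repeat(backslashes));
                backslashes = 0;
                out.push(c);
            }
        }
    }
    // Trailing backslashes would otherwise escape the closing quote.
    let trailing = if need_quote { backslashes * 2 } else { backslashes };
    out.push_str(&"\\".repeat(trailing));
    if need_quote {
        out.push('"');
    }
    out
}

/// Join an argv-style slice into a single command string.
pub fn join_windows_args(args: &[&str]) -> String {
    args.iter()
        .map(|a| quote_windows_arg(a))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, `join_windows_args(&["a", "b c"])` yields `a "b c"`, and a path ending in a backslash that also contains a space gets its final backslash doubled before the closing quote.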

On Linux, this hook is usually named _start and the linker automatically adds whatever symbol has that name to the binary.

Not quite. If an ELF-format binary is an executable rather than just a library, the e_entry field in its header (offset 0x18) contains the address for the loader to jump to after setting it up in memory. _start is GCC's convention (which things like NASM copy, IIRC) for how you specify what e_entry should point to when you opt out of libc providing it for you.
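As a rough illustration of the e_entry lookup, the sketch below reads the field out of an in-memory copy of an ELF header. The function name `elf_entry_point` is made up for this example, and it only looks at the identification bytes (magic, class, byte order) before reading offset 0x18; it does not validate anything else about the file.

```rust
/// Read e_entry from the start of an ELF file image.
/// Returns None if the buffer is too short or does not look like ELF.
/// Hypothetical helper for illustration only.
pub fn elf_entry_point(image: &[u8]) -> Option<u64> {
    // e_ident begins with the magic bytes 0x7f 'E' 'L' 'F'.
    if !image.starts_with(b"\x7fELF") {
        return None;
    }
    // EI_CLASS (byte 4): 1 = 32-bit, 2 = 64-bit.
    let is_64 = match *image.get(4)? {
        1 => false,
        2 => true,
        _ => return None,
    };
    // EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
    let little = match *image.get(5)? {
        1 => true,
        2 => false,
        _ => return None,
    };
    // e_entry sits at offset 0x18 in both classes; only its width differs.
    if is_64 {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(image.get(0x18..0x20)?);
        Some(if little {
            u64::from_le_bytes(raw)
        } else {
            u64::from_be_bytes(raw)
        })
    } else {
        let mut raw = [0u8; 4];
        raw.copy_from_slice(image.get(0x18..0x1c)?);
        let entry = if little {
            u32::from_le_bytes(raw)
        } else {
            u32::from_be_bytes(raw)
        };
        Some(u64::from(entry))
    }
}
```

Feeding it the first 64 bytes of a binary (for instance from `std::fs::read`) should return the same address that `readelf -h` reports as the entry point.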

A similar hook exists on Windows, and boots the executable in a function named _WinMainCRTStartup. At this point the C runtime has a chance to configure itself, and the way that all runtimes do this is via initialization functions.

Which the loader finds via AddressOfEntryPoint in the PE header, at offset 0x0028 from the start of the PE header, which itself comes after the MZ (DOS EXE) header and DOS stub.
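The PE side can be sketched the same way. The helper names below (`pe_entry_point_rva`, `read_u32_le`) are hypothetical. The code assumes the usual layout in which the MZ header stores the file offset of the PE signature at 0x3C, then reads the 32-bit AddressOfEntryPoint at 0x28 past that signature. The value is an RVA (relative to the image base), not a file offset.

```rust
/// Read a little-endian u32 at `offset`, or None if out of bounds.
fn read_u32_le(buf: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(buf.get(offset..end)?);
    Some(u32::from_le_bytes(raw))
}

/// Return AddressOfEntryPoint (an RVA) from a PE file image.
/// Hypothetical helper for illustration only; no further validation is done.
pub fn pe_entry_point_rva(image: &[u8]) -> Option<u32> {
    // The file starts with the MZ (DOS EXE) header.
    if !image.starts_with(b"MZ") {
        return None;
    }
    // e_lfanew at 0x3C holds the file offset of the PE signature.
    let pe_offset = read_u32_le(image, 0x3c)? as usize;
    if !image.get(pe_offset..)?.starts_with(b"PE\0\0") {
        return None;
    }
    // 4-byte signature + 20-byte COFF header + 16 bytes into the
    // optional header = 0x28 from the start of the PE header.
    read_u32_le(image, pe_offset.checked_add(0x28)?)
}
```

The offset is the same for PE32 and PE32+ images, since the fields preceding AddressOfEntryPoint in the optional header have the same sizes in both.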

EDIT 1:

"Making the smallest Windows application" and then "Tiny PE" are good ways to learn more about the ins and outs of PE headers through the vehicle of their authors figuring out how they can make smaller executables. (Tiny PE violates the PE spec in accepted-by-Windows ways, such as overlapping structures where it knows the OS won't read one of the things being overlapped, and stuffing code into unused header fields... but if you go this far, the smallest file Windows will accept depends on which Windows version you run it on.)

EDIT 2: OK, done.

mmastrac

Thanks - that's a very useful clarification. I had thought it was officially libc on macOS. I'll digest and integrate these clarifications if that's alright with you.
- ssokolow
  
  To be honest, I'm still reading. I started responding in-situ since any clarification which comes later in the post would likely be missed by someone skim-reading for want of a cross-reference, so I considered it a reasonable time to respond.
  
  I'll add an EDIT: boundary for stuff added after [I noticed that] you replied.
  
  EDIT: Oh, and yes. Feel free to integrate what I wrote.
fanf

Re _start, on a.out systems the entry point from the kernel to an executable was traditionally called start as declared in csu/crt0, e.g. 7th edition, VAX BSD. In that era the C compiler stuck a _ on the front of its global symbols, so you can see V7 declares _main and BSD declares the asm name for C start() as unadorned start. In that era a program started at the beginning, and cc’s linker invocation arranged for crt0 to come first. (csu = (lib) C startup, crt0 = zeroth C runtime support object)

It’s harder to find out exactly how things worked in System V where ELF came from, but start or _start continued to be the program entry point declared in csu/crt0. I have never bothered to properly understand how ELF changed _ prefixing: I think they added another layer of it for funsies or something? Which caused start to become _start for some reason?

I think it was ELF that added the obvious counterpart _end, which corresponds to the top of the BSS, i.e., what sbrk(0) would return before malloc() creates its heap.
fanf

The BSDs use libc as that boundary.

FreeBSD and NetBSD syscalls have ABI stability, as well as their system libraries.
- matheusmoreira
  
Can you provide citations for this? I wrote about the subject years ago. While researching it, I found a lot of conflicting information when it came to the various BSDs. I've seen people claim they have stable system calls, but back then I found forum posts and mailing lists describing the opposite.
  
  Linux documents this in the repository itself. Linus himself is on the record saying it. Are there equally authoritative promises of system call binary interface stability at the instruction set level from the BSDs?
  - fanf
    
    https://cgit.freebsd.org/src/tree/sys/conf/NOTES#n330 https://cgit.freebsd.org/src/tree/sys/amd64/conf/GENERIC#n62
    
    https://www.netbsd.org/gallery/presentations/joerg/asiabsdcon2016/asiabsdcon2016.pdf
    
    mort
    
    I don't see how any of those links contain an authoritative claim that it's the intention of the FreeBSD or NetBSD projects to keep the syscall numbers and ABI stable?
    
    The PDF you link says that NetBSD has kept stability, but that's an observation about history, not an authoritative statement of intent.
  - ssokolow
    
    Huh. It appears you're right for FreeBSD... at least if they follow this wiki page well, and NetBSD does appear to follow the same philosophy... but both were buried under a flood of contradictory information. With that and hazy memories of Rust issues like #92466, no wonder I was mistaken.
    
    Correction memorized.
- mmastrac
  
  I've been interested in life-before-main in Rust for a while and thought it would be useful to put it all together into a post that explains what it is and why it's useful. I've got some thoughts on future posts along these lines, like how you can build faster collections that make use of linker aggregation, but I'd love to hear feedback on this first "intro-focused" topic.
  - wrs
    
    I’ve been doing a lot of embedded (thus no_std and sometimes even no-alloc) Rust, where main is just another function, and initialization is largely up to the developer. There’s quite a bit of hand-rolled boilerplate in the codebase for similar use cases, so I’m curious how these crates relate to that environment.
    
    mmastrac
    
    Assuming you are using LLVM's or GCC's linker, all of these crates should work identically, though you'll likely need to manually configure your linker script to set up .init_array and .init_array.NNNNN properly and add a function to iterate over them. The orphan section start/stop symbols are platform-independent magic with those toolchains (including platformless embedded!).
    
    Each of the crates in linktime should support both no_std and no_alloc out of the box (although I probably need to test to ensure scattered-collect compiles - it does not require either std or alloc, but it's just "untested").
    
    I find link-time aggregation of data to be extremely useful and it's nice to avoid all the boilerplate that the post shows for a few more complex cases. I'd be happy to discuss the particular usecases you have in mind and see if there's a good way to apply it.
  - lcapaldo
    
    It’s a highly-ordered, highly-controllable environment that lets you more confidently do a lot of work without locks, atomics and other synchronization primitives
    
The body of main is an even more ordered, even more controllable environment that lets you confidently do a lot of work without locks, atomics and other synchronization primitives. Well, it is, at least as long as nobody starts undermining those properties by putting code elsewhere that runs before it.
    
    One advantage we have in doing work before main is that it is well-behaved. No threads are running unless we start them.
    
This is a perfect example: no threads are running in main until we start them either. Except, of course, if we start breaking the no-life-before-main guarantees, there can be, and unlike threads started from main they’re not obvious in the code.
    
    Runtimes make use of this pre-main phase because it guarantees (1) running before user code, and (2) a single-threaded, highly-consistent and predictably-ordered environment, which allow for reliable and deterministic initialization
    
This is putting a misleading emphasis IMO. The runtime isn’t making “use of” the pre-main phase; it is the pre-main phase (among other things). It is what calls main.