Why does C have the best file API?
30 points by robalex
mmap(2), the API in question, is a system call; it’s not specific to C in any way. I’m sure the memory model of some other languages makes it more difficult to use mmap(2) in this way, but, still, there exist plenty of languages where you can write practically the same code. I believe it’s somewhat misguided to make this a C thing.
It's not even a particularly good API, as there's no sensible way to handle read/write failures other than a signal handler.
It’s worth keeping in mind that code is loaded using mmap(), so there’s no way to handle a read failure while a program is running other than a signal handler. (Good luck if your signal handler is paged out and suffers from the same read error!) As with running code, it’s often sensible to treat a read error from an open file as an unrecoverable failure, and in that situation mmap is fine.
Indeed. I just used mmap extensively in Rust code last week.
Rust is one of the languages where getting mmap right is the hardest imo, as any change to the underlying file will end up being UB in almost all cases.
As if it's not UB everywhere else…?
It is pretty hard to decipher what you mean given the curtness of your reply. That being said, yes, Rust references have many more invariants than C pointers. Writing to a file you have already opened in C is UB only if it causes a data race, i.e. when the file is read concurrently. In Rust, merely having a &[u8] to the mmap is UB if it is written to concurrently.
In Rust, if you're using a shared file-backed mmap, then you simply cannot use references as the data is inherently mutable. Having a &[u8] to this data is technically UB regardless of concurrent writers, because the data could be mutated, you just won't notice if there are no actual mutations done to it. But you can use raw unsafe pointers with this just fine, and that leaves you with the same expressive power as C.
Having a &[u8] to this data is technically UB regardless of concurrent writers, because the data could be mutated
Have you got a formal reference for that? To me it sounds like a fundamental misunderstanding of how undefined behaviour is generally defined, though I'm certainly not a Rust expert.
Normally in C/C++ you only get UB when an erroneous action actually occurs, not just when it could occur: dereferencing a pointer that might have been (but isn't) null is ok (barring any other reason that it's not), and more pertinently, a data race (and related UB) only occurs when two threads actually non-atomically access the same data with no synchronisation, not when you just read/write through some pointer that another thread theoretically could also have performed an access via, even though no thread has.
Yes, the correct technical term is that having a &[u8] to a mmap file is always unsound (Note: at the executable level, instead of the more common discussion of unsoundness at the library level). Your program can be executed in an environment that makes this UB and in an environment that carefully avoids triggering the UB.
How is this different in theory from running in an environment where another process might modify your process's memory (e.g. through ptrace)?
It's not, and doing such a thing is UB - as it would be in C. Rust has to draw the line somewhere, otherwise pretty much all IO calls would have to be unsafe.
In Rust, as a rule (although it is not the true rule), if a pointer is illegal to dereference, the equivalent reference is illegal to construct. It is illegal to possess a null reference in the way it is illegal to dereference a null pointer; it is illegal to possess a mutable reference derived from immutable data in the way that it is illegal to actually mutate it behind a pointer. However, it is not illegal to create that pointer; this rule does not apply transitively to other pointers aliasing the data. The only rules about raw pointer construction (which the filesystem mapping is like, here) relate to ptr::offset.
if a pointer is illegal to dereference, the equivalent reference is illegal to construct
This is true in C++ also (at least in the case of null pointers). My point was more subtle; lilyball is claiming that in this case it is UB to construct a reference to data that could theoretically be mutated "from outside", even though it's possible (outside of the language) to assure an environment where that won't happen. How would such a rule even be phrased?
At that point, the program's correctness guarantees rely on more abstract properties of the system that can't possibly be guaranteed from within the process itself. Use an unsafe block, document the invariant well, and move on. It's not much different to the whole "accessing /proc/mem doesn't need an unsafe block" thing.
At that point, the program's correctness guarantees rely on more abstract properties of the system that can't possibly be guaranteed from within the process itself
That is my point. It's not automatically Undefined Behaviour; whether or not you get UB depends on what happens in the system.
I don't think it is right. For example it is not UB to read through a shared reference while you have a raw mutable pointer to the same data, as long as you don't write through the raw pointer. It is the write itself that creates the UB, not the potential for it.
What you describe, however, is true for mutable references: if I turn this same raw pointer into a mutable reference, the UB (currently; iirc there are discussions to relax this) happens as soon as the reference is created, because merely having a mutable reference while the data is accessed from elsewhere is UB.
NB: the aliasing model is not exactly defined so this may change in the future.
It’s UB in C unless you use atomic or volatile reads. The C abstract machine says that the contents of memory don’t change unless a store goes through a potentially aliasing pointer. If a pointer is not atomic-qualified, the C abstract machine assumes data-race freedom and it is UB if there are concurrent writes. If a piece of code reads the same word of an mmap’d object twice, the compiler can (and frequently does) fold the loads. If another process writes to the file, that write may or may not be observed in the C abstract machine. And loads may be arbitrarily reordered with respect to the write, so you may see any intermediate bit pattern.
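A small sketch of the load-folding point, with hypothetical function names. With plain loads the compiler is entitled to assume the word cannot change between the two reads; with volatile-qualified loads it has to emit both. Note that volatile only keeps the loads in the generated code; it provides neither atomicity nor ordering against a writer in another process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain loads: the compiler may fold the two reads into one, so this
 * function can legally be compiled down to "return 1". */
static int stable_plain(const uint32_t *p)
{
    assert(p != NULL);
    uint32_t a = *p;
    uint32_t b = *p;
    return a == b;
}

/* Volatile loads: both reads must be performed, so a change made by
 * another process between them can be observed. No synchronisation is
 * implied; torn or stale values are still possible. */
static int stable_volatile(const volatile uint32_t *p)
{
    assert(p != NULL);
    uint32_t a = *p;
    uint32_t b = *p;
    return a == b;
}
```

Without a concurrent writer both functions return 1; the difference only shows up in the generated code and under concurrent modification.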
Totally, but if you manage to synchronize the writes and reads between the processes in C, I believe it is much easier to get back to something sane. For example, if I remember correctly, if you write to the file, release a mutex, and then acquire the same mutex in another process that has mmapped the file you've written to, the mmap is supposed to have the new file content (provided the length did not change). If I understand correctly, all pointers you had to this object in C are still valid if you didn't try to access them while you didn't hold the mutex. In Rust, all references would have been invalidated.
If I understand correctly, all pointers you had to this object in C are still valid if you didn't try to access them while you didn't hold the mutex.
As David says, this is only true for pointers to volatile/atomic pointees: because the volatile attribute forces the compiler to consider any accesses to immediately be stale once performed.
The same is also true in Rust, but because Rust doesn't have volatile types and instead makes volatility a property of an access performed on a raw pointer, you have to use raw pointers.
Note that this isn't because Rust references have different properties than C pointers (at least, in this context): it's purely a product of the fact that there is no such thing as a volatile pointee in Rust. I do believe, however, that if you have a slice like &mut [MaybeUninit<T>] and only perform volatile access to each of the elements, that's entirely sound to have point at mapped memory (provided you're syncing access to avoid data races, ofc), demonstrating that it's not the reference, but the referencee, that is the reason for the difference.
I don't want to come off as stubborn but:
As David says, this is only true for pointers to volatile/atomic pointees: because the volatile attribute forces the compiler to consider any accesses to immediately be stale once performed.
I don't think it is if, as I have stated, you have created a happens-before relationship via an interprocess mutex.
I do believe, however, that if you have a slice like &mut [MaybeUninit<T>] and only perform volatile access to each of the elements, that's entirely sound to have point at mapped memory (provided you're syncing access to avoid data races, ofc),
This is UB and violating the aliasing rules for references, even if you have a MaybeUninit.
What may be sound is to have a &[UnsafeCell<u8>] (or anything built on UnsafeCell, giving it interior mutability) and perform volatile reads. But even then, because this mutation happens across processes, it is very likely to fall under the rule highlighted at https://doc.rust-lang.org/std/ptr/fn.read_volatile.html for volatile reads of memory in an allocation, and not cause CPU caches to sync. The more likely thing is that you need &[AtomicU8] to be sure. This may be the same in C, where volatile does not protect against data races TTBOMK.
My issue with any of these APIs is that as soon as you give access to a &[u8] of the mmapped memory, which you are virtually forced to do to call any useful method, you are creating unsoundness unless you are extremely cautious with your lifetime pinning and interprocess syncing.
Luckily, that shouldn't be an issue in my use case, because by design the mmap refers to constant data on disk. Updates switch to a new mmap handle, with a distinct file path. I'm paranoid about that kind of thing because several years back I hit a bug (in C): I discovered that since dlopen only identifies files by path, if you replace the file (or symlink, IIRC) you can end up with some of the old file getting paged out and eventually replaced with the new file's contents at the same offset. With that dynamically linked code it led to jumping into the middle of functions with stale register contents and extremely strange errors. The dlopen behavior is noted in the man page, but I probably figured it'd resolve the file path and then save some kind of inode / file handle internally.
That’s a surprising bug! What system were you using?
At my previous job, we had the same issue on Linux, Mac OS X and Solaris. We had a program that would mmap() a large data file read-only, and if we wanted to update the file, it first had to be deleted, and then the program had to be notified to re-mmap() it. If we didn't do that, it was a hard crash.
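The crashes described here come from changing a file in place while it is mapped. A closely related variant of "delete, then re-map" is to write the new contents to a scratch path and rename() it over the old name: the old inode stays alive for as long as something maps it, and new opens see the new file. The sketch below assumes both paths are on the same filesystem; `replace_file` and `map_readonly` are hypothetical helpers, not anything from the systems discussed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Publish new contents under `path` without modifying the old inode.
 * `tmp_path` is a caller-chosen scratch name on the same filesystem. */
static int replace_file(const char *path, const char *tmp_path,
                        const void *data, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    int ok = 1;
    while (ok && left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            ok = 0;
        } else {
            p += n;
            left -= (size_t)n;
        }
    }
    if (ok && fsync(fd) != 0)
        ok = 0;
    if (close(fd) != 0)
        ok = 0;
    if (!ok) {
        unlink(tmp_path);
        return -1;
    }
    /* rename() swaps the directory entry atomically; existing mappings
     * keep referring to the old inode until they are unmapped. */
    return rename(tmp_path, path);
}

/* Map the first `len` bytes of the file currently named `path`. */
static const char *map_readonly(const char *path, size_t len)
{
    assert(len > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *m = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping outlives the descriptor */
    return m == MAP_FAILED ? NULL : m;
}
```

A reader that mapped the file before the rename keeps seeing the old bytes and has to be told to re-map, which matches the notification step described above.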
The bug was a production system at work, so I probably can't go into more detail than that.
We made a trivial standalone program using dlopen similarly, and were able to duplicate the failure mode under increasing virtual memory pressure: pages of the old and new versions of the library became interleaved in memory, then it would trigger assertions for unexpectedly having the older library's register contents but suddenly ending up in the middle of the newer library's functions.
or the worst API: https://db.cs.cmu.edu/mmap-cidr2022/
Your I/O error handling is now SIGBUS. You lose control over when I/O happens, and your performance and latency is at the mercy of complex memory swap machinery and TLB management.
Yeah, mmap needs to be used with care. previously
The counterargument is that it is possible to mitigate the downsides of mmap without the complexity described by Crotty/Leis/Pavlo: previously, previously. In short, use mmap for the read path and page cache, and use write(), fsync(), etc. for the write path.
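A minimal sketch of that split, with hypothetical helper names: writes go through pwrite() and fsync(), so failures come back as return values, while reads go through a read-only shared mapping. It assumes a system with a unified page cache (as on Linux), where data written with pwrite() is visible through a MAP_SHARED mapping of the same file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write path: explicit syscalls, so I/O errors are ordinary return values. */
static int durable_write_at(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return fsync(fd); /* 0 on success, -1 if the flush failed */
}

/* Read path: a read-only shared mapping backed by the page cache.
 * Returns NULL on failure; release with munmap(). */
static const void *map_for_read(int fd, size_t len)
{
    assert(len > 0);
    void *m = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    return m == MAP_FAILED ? NULL : m;
}
```

Read errors through the mapping still arrive as SIGBUS; the split only moves write errors back into normal control flow.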
The true MVP for file apis is preadv and pwritev: Vectored access to files with a base position, and an array of sources/targets
I really never understood what the actual point of this API was. Is it optimized on the kernel side to write into the different vectors/buffers in parallel? Is it interesting for the caller to get its data split into different buffers for parallel management in return? What is a typical use case for this API?
I think the main purpose is to reduce syscall overhead. E.g. you mentioned LMDB elsewhere in this thread, and it has a compile-time option to use pwritev for flushing the dirty page set.
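To make the syscall-saving use case concrete, here is a hedged sketch: a header and a payload that live in separate buffers are written, and later read back, with one call each instead of two. The `page_header` layout and the function names are made up for illustration and are not LMDB's format. preadv/pwritev are available on Linux and the BSDs but are not part of POSIX.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* glibc: expose preadv/pwritev */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical on-disk page header, for illustration only. */
struct page_header {
    uint32_t page_no;
    uint32_t payload_len;
};

/* Write header and payload from two separate buffers in one syscall.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t write_page(int fd, off_t off, const struct page_header *hdr,
                          const void *payload)
{
    assert(hdr != NULL && payload != NULL);
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = sizeof *hdr },
        { .iov_base = (void *)payload, .iov_len = hdr->payload_len },
    };
    return pwritev(fd, iov, 2, off);
}

/* Read header and payload into two separate buffers in one syscall.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_page(int fd, off_t off, struct page_header *hdr,
                         void *payload, size_t payload_cap)
{
    assert(hdr != NULL && payload != NULL);
    struct iovec iov[2] = {
        { .iov_base = hdr,     .iov_len = sizeof *hdr },
        { .iov_base = payload, .iov_len = payload_cap },
    };
    return preadv(fd, iov, 2, off);
}
```

The caller avoids both a second syscall and a staging copy into one contiguous buffer. A short return value still has to be handled, as with plain read() and write().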
The big insightful answer to this question is here: https://www.humprog.org/~stephen/research/papers/kell17some-preprint.pdf
This is the article "Some Were Meant for C". It explains what the actual difference is between C and nearly all other languages. (No, it is not about performance.)
It's not like it's "built in" to the stdlib (but then C's stdlib is notoriously slim), but you can easily roll together a FileArray abstraction in Nim. As others have pointed out several times in several ways in this very thread, the safest route is to write via regular writes and then read via the mmap. So, it depends how "read heavy" vs. "write heavy" your workload is. Presumably there are Apache Arrow wrappers in many PLangs (which it seems does not let you do a whole struct, but only the parallel scalars route?). Or HDF5 with memory maps. So, maybe the author is just under-exposed?
I don't understand the comment about not needing to parse? Either your data is structured and needs to be parsed or it does not - why would how you read from disk change anything?
I think they're saying you can just put structs in mmaped files and use them as-is.
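Roughly, that idea looks like the sketch below. The `record` layout and the function names are hypothetical; the struct uses only fixed-width fields and contains no pointers, and the file is assumed to have been written on a machine with the same endianness and struct layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical fixed-layout record: fixed-width fields, no pointers. */
struct record {
    uint32_t id;
    uint32_t flags;
    uint64_t timestamp;
};
_Static_assert(sizeof(struct record) == 16, "layout must have no padding");

/* A read-only view of a file holding a flat array of records. */
struct record_view {
    const struct record *recs; /* NULL if opening or mapping failed */
    size_t count;
    size_t map_len;
};

static struct record_view record_view_open(const char *path)
{
    assert(path != NULL);
    struct record_view v = { NULL, 0, 0 };

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return v;

    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size >= (off_t)sizeof(struct record)) {
        size_t len = (size_t)st.st_size;
        void *m = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (m != MAP_FAILED) {
            v.recs = m; /* no parsing step: the bytes are the structs */
            v.count = len / sizeof(struct record);
            v.map_len = len;
        }
    }
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return v;
}

static void record_view_close(struct record_view *v)
{
    if (v->recs != NULL)
        munmap((void *)v->recs, v->map_len);
    v->recs = NULL;
    v->count = 0;
    v->map_len = 0;
}
```

After `record_view_open`, `v.recs[i].timestamp` reads straight from the page cache. Nothing here validates the contents, so a corrupt or foreign-endian file is silently misread.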
Yep..... until one of those structs needs to reference another struct.... oops
You can use offsets instead of pointers. That's how many file formats do it.
EDIT: E.g. take a look at elf(5).
yeah, it's a super common idiom - hell it's common in memory as well just because pointers are larger than offsets.
Of course you can, but at that point you're serializing and deserializing, exactly what the original article wanted to avoid by not using a database
The structs need to be designed so their layout works live from a memory mapped file. It obviously won't work when the structs have pointers (unless there's some sort of page fault pointer swizzling), but can be a good fit for flat tables of structs. They can refer to each other by an offset (relative to the table start), or by a table row ID if the structs have a fixed size.
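A sketch of the row-ID variant, with a hypothetical `row` layout: each row links to the next by table index rather than by pointer, so the table means the same thing wherever it is mapped. Since the indices come from a file, the walk validates them instead of trusting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_ROW UINT32_MAX /* sentinel playing the role of NULL */

/* Hypothetical fixed-size row; `next` is a row index, not a pointer. */
struct row {
    int32_t  value;
    uint32_t next;
};

/* Sum the values along the chain starting at row `first`.
 * Returns 0 and stores the sum in *out, or -1 if a link points outside
 * the table or the chain contains a cycle. */
static int sum_chain(const struct row *table, size_t nrows, uint32_t first,
                     int64_t *out)
{
    assert(table != NULL && out != NULL);
    int64_t sum = 0;
    size_t steps = 0;
    for (uint32_t i = first; i != NO_ROW; i = table[i].next) {
        if (i >= nrows || steps++ >= nrows)
            return -1; /* out-of-range link, or more hops than rows */
        sum += table[i].value;
    }
    *out = sum;
    return 0;
}
```

The same function works on a table in an mmapped file or in an ordinary array, because nothing in the data depends on the address it is loaded at.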
Yeah, but you can get very far when it comes to reading/writing page headers for key value stores for example. And even if you need to reference to somewhere else in the file you can fix the mmap to a given address, I think. At least that's what LMDB's devs started working on...
...but you can do that with any binary interface? I have a lot of questions about how this author imagines non-mmap interfaces work (let alone are implemented) :D
oh, I had to just look up the post again - this kind of file format is what has got msvc stuck with https://lobste.rs/s/ttkuj8/c_enum_sizes_how_msvc_ignores_standard
Yes you can. I don't think this is a good argument in favor of mmap but I've weirdly heard it a lot. Maybe putting structs in an mmaped file feels less naughty than write-ing it to an fd?
I assume so - I think the core problem with this article is that it sees mmap as being a shortcut to that, and doesn't understand the costs that come with it.
I mean "the core" is understating the number of issues in this approach :D
This isn't to say mapping files is inherently bad, but in my experience the actual cases where it's valuable are much rarer than the cases where it isn't, due to the myriad race conditions, the bad failure modes, etc.