mmap in Go considered harmful
13 points by runxiyu
Note that this blog post is from 2018, which is before changes to Go's preemption strategy. Check out this 2020 GopherCon talk about how Go handles preemption these days: https://www.youtube.com/watch?v=1I1WmeSjRSw
Oh, apparently Go may require prefaulting mmap; this is similar.
vmsplice() on Linux could be interesting here, in both a good and a bad way. If you have a pipe another process is going to read from, you can essentially move the page faults to the reader. That could also be used to move the page faults to a read() by writing/vmsplicing one byte from each page (IOV_MAX is 1024, so this should be only two syscalls per 4MB of pages). Once that read completes you know the pages are faulted in, and since you've done a read(), Go will schedule this correctly.
Obviously this is Linux specific, but it would be interesting to know how that idea performs compared to Tedu's idea of write() to /dev/null, or to using posix_madvise (although I don't know how you tell that posix_madvise has finished paging things in, so it's a bit different).
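Here's a rough Go sketch of that vmsplice() idea, assuming golang.org/x/sys/unix on Linux; prefaultPages is a hypothetical helper, the batching is simplified, and whether the faults actually land in the read() rather than in vmsplice() itself is the claim made above, not something this sketch verifies:

```go
//go:build linux

// Sketch of the vmsplice() trick: splice one byte from each page of an
// mmap'd buffer into a pipe, then read() those bytes back, so any page
// faults are taken inside an ordinary read() syscall that the Go runtime
// already knows how to schedule around.
package mmaputil

import (
	"os"

	"golang.org/x/sys/unix"
)

var pageSize = os.Getpagesize()

const iovMax = 1024 // IOV_MAX on Linux

func prefaultPages(buf []byte) error {
	var p [2]int
	if err := unix.Pipe(p[:]); err != nil {
		return err
	}
	defer unix.Close(p[0])
	defer unix.Close(p[1])

	scratch := make([]byte, iovMax) // sink buffer for the read side
	pages := (len(buf) + pageSize - 1) / pageSize

	for page := 0; page < pages; {
		// Build up to IOV_MAX iovecs, one byte from each remaining page.
		var iovs []unix.Iovec
		for i := page; i < pages && len(iovs) < iovMax; i++ {
			iov := unix.Iovec{Base: &buf[i*pageSize]}
			iov.SetLen(1)
			iovs = append(iovs, iov)
		}
		// vmsplice may transfer fewer bytes than requested if the pipe
		// fills up; each byte transferred corresponds to one page.
		n, err := unix.Vmsplice(p[1], iovs, 0)
		if err != nil {
			return err
		}
		// Drain the pipe with read(); per the comment above, this is
		// where the page faults should land.
		for got := 0; got < n; {
			m, err := unix.Read(p[0], scratch[:n-got])
			if err != nil {
				return err
			}
			got += m
		}
		page += n
	}
	return nil
}
```

The loop tolerates partial transfers, since a default-sized pipe may accept far fewer than IOV_MAX one-byte segments per call, so in practice it can take more than two syscalls per 4MB.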
So, crazily enough, I've been thinking about this exact problem for over a decade, ever since I thought "hey, a page fault on swap would probably really mess up the Go scheduler, because it's really going to block the underlying thread (an M, in Go terminology, if I remember correctly)".
Thanks to Valyala for writing this and sharing!
That's a different problem, with page evictions and faults needing to update the TLBs for all CPUs. There's a paper that goes around every now and then about it called "Are You Sure You Want to Use MMAP in Your Database Management System?" at https://db.cs.cmu.edu/mmap-cidr2022/
I think the Go scheduler blocking is a more severe problem?
I think Varnish got designed that way because it was written back when:
I think Varnish was written for FreeBSD first, and I vaguely remember seeing grumblings by PHK (Varnish's author) to the effect that FreeBSD's virtual memory system was better at the time at doing this fast and scalably, but who knows what's changed since then. Also, I might have misremembered.
Howard Chu, author of LMDB, wrote a response to that paper; he was righteously vexed that LMDB is a counterexample to many of the paper’s assertions… https://lobste.rs/s/n40bdi/are_you_sure_you_want_use_mmap_your_dbms
(edited to add) The RavenDB commentary is also worth a read https://lobste.rs/s/ltrw2p/re_are_you_sure_you_want_use_mmap_your
Oh, I see, this is a bad title; it is about MMIO. From there it sounds like Go uses virtual threads/fibers/whatever they're called this decade, so because the IO page faults can happen at any point, your working "thread" gets blocked, but more work cannot then be scheduled because there's only one real thread. This is a standard problem with virtual threads, not Go specific (see also virtual threads breaking forward-progress guarantees that depend on tasks being guaranteed to get CPU time).
There is an open proposal (#68769) to expose Prefetch to user space. It's not exactly a solution, but it's... something?
That’s an intrinsic corresponding to a prefetch hint instruction, which just moves data closer to the CPU in the cache hierarchy. It doesn’t help with missing pages in mmap() because a prefetch hint instruction doesn’t trigger a page fault exception, so it doesn’t tell the kernel that the program wants to use the missing data.
The kind of prefetching needed here is IO or virtual-memory page prefetching (not CPU cache prefetching). To trigger an IO prefetch, the program needs to trigger real page faults or make corresponding read() syscalls. If it wants to avoid the latency hit, it needs to do that on a separate worker thread; Go will do that automatically for read(), but probably not for page faults. See also https://lobste.rs/c/diojhk
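A minimal Go sketch of that read()-based prefetch, assuming the mmap'd region comes from a file you can also open normally; warmPageCache and the chunk size are hypothetical, not anything from the post:

```go
// Before touching an mmap'd region, read the same byte range through an
// ordinary *os.File so the pages land in the kernel page cache. Because
// these are plain read() syscalls, the Go runtime parks the blocked thread
// and keeps scheduling other goroutines in the meantime.
package prefetch

import (
	"io"
	"os"
)

func warmPageCache(path string, off, length int64) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // warm the range in 1 MiB chunks
	for length > 0 {
		chunk := int64(len(buf))
		if length < chunk {
			chunk = length
		}
		n, err := f.ReadAt(buf[:chunk], off)
		if err == io.EOF {
			return nil // reached end of file; nothing more to warm
		}
		if err != nil {
			return err
		}
		off += int64(n)
		length -= int64(n)
	}
	return nil
}
```

Usage would be something like running `go warmPageCache("data.bin", 0, int64(len(mmapped)))` (hypothetical file name) on its own goroutine and waiting for it before scanning the mapping, so the later mmap accesses mostly hit pages already resident in the page cache.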