I built a 2x faster lexer, then discovered I/O was the real bottleneck

23 points by lonami


snej

Apple uses SQLite extensively for its applications, essentially simulating a filesystem within SQLite databases.

The first part is true, but not the second (AFAIK). Apple's SQLite-based APIs — Core Data and the newer Swift one — are database APIs, with an ORM and queries; they don't look anything like a file system. And Apple's filesystem APIs talk to the file system, not a database.

The HN thread this quote links to is about the Photos app, which uses a database to store metadata (and thumbnails?), but the photos are still stored as individual files. At least they were the last time I looked.

cblake

@snej's reply is very good, but here are some supplemental points:

hailey

Each syscall costs roughly 1-5 microseconds

People keep saying this. That's... really slow. Is it a macOS thing for syscalls to be that slow, or is the number just some received wisdom people repeat without measuring?

A simple C program can do 1 million read syscalls in 200ms on my desktop (Linux x86_64, AMD Ryzen 9 3900X). That's 200ns per syscall, or 5-25x faster than claimed.

MaskRay

This reminds me of an lld/MachO speedup patch https://github.com/llvm/llvm-project/pull/147134 ("[lld][MachO] Multi-threaded preload of input files into memory"). lld/MachO does not scan input files in parallel, so the I/O overhead is quite large. This approach helps.
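The preload idea can be sketched roughly like this — a hypothetical preload_all helper (not the actual patch) that reads each input file on its own thread so the I/O waits overlap instead of serializing:

```c
// build: cc -pthread preload.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    const char *path;
    char *data;   // file contents after preload, or NULL on failure
    long size;
} Input;

// Thread body: slurp one file into memory.
static void *preload(void *arg) {
    Input *in = arg;
    FILE *f = fopen(in->path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    in->size = ftell(f);
    rewind(f);
    in->data = malloc(in->size);
    if (in->data && fread(in->data, 1, in->size, f) != (size_t)in->size) {
        free(in->data);
        in->data = NULL;
    }
    fclose(f);
    return NULL;
}

// Read all inputs concurrently; returns the number successfully loaded.
int preload_all(Input *inputs, int n) {
    pthread_t *tids = malloc(n * sizeof *tids);
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, preload, &inputs[i]);
    int ok = 0;
    for (int i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        if (inputs[i].data) ok++;
    }
    free(tids);
    return ok;
}
```

A real linker would cap the thread count and likely mmap rather than malloc+fread, but the win is the same: the page-cache misses for many inputs are in flight at once.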

There is even a further optimization: use a wrapper process. The wrapper launches a worker process that does all the work. When the worker signals completion (via stdout or a pipe), the wrapper terminates immediately without waiting for its detached child. Any script waiting on the wrapper can now proceed, while the OS asynchronously reaps the worker's resources in the background. I had not considered this approach before, but it seems worth trying.

The mold linker uses this technique, though it provides the --no-fork flag to disable it. The wild linker follows suit. I think performing heavy-duty tasks in a child process makes it difficult for the linker's parent process to accurately track resource usage.

In contrast, lld takes a different, more hacky path:

Perhaps lld should drop the hacks in favor of a wrapper process as well. As an aside, debugging the linker would then always require --no-fork.