High-Performance DBMSs with io_uring: When and How to use it
13 points by rrampage
There is also a nice summary / quick reference post by one of the authors here: https://toziegler.github.io/2025-12-08-io-uring/
The advice here generally makes sense, but I’m not sure if this tracks:
You might want to consider io_uring if: […] Your workload is I/O-heavy (network or SSD) and latency-sensitive.
At the syscall level, io_uring offers greater throughput at the expense of latency, since it batches (and therefore delays) what would otherwise be many syscalls. How this impacts end-to-end latency should be highly contingent on how those syscalls are triggered. Does a task trigger many syscalls that are not dependent on each other? If so, batching could improve end-to-end latency. Otherwise I’d expect it to regress. But I’ve never used io_uring, since it’s banned at work for security reasons - is there something I’m missing?
On that note, I wish this covered security implications as well (maybe the paper does.)
Author here. You are spot on.
How this impacts end-to-end latency should be highly contingent on how those syscalls are triggered.
Yes, this is correct. It strongly depends on the system architecture. What we have in mind in the paper is a setting where the application is CPU-bound and most cycles are spent on I/O (which is what we mean by I/O-heavy). In such a scenario, this might improve end-to-end latency. In these cases, batching also occurs more naturally as work "queues up."
If the CPU is not the bottleneck, batching is often unnecessary and may even hurt latency, as you rightly pointed out. However, this is fairly general guidance and may not hold universally; ultimately, it is important to measure and evaluate this trade-off for the specific system under test.
Re: Security is only handled superficially in the paper, as it is a very broad topic, but you might be interested in "RingGuard: Guard io_uring with eBPF", which goes into this aspect in much more depth.
Although I haven't used io_uring in anger, I have used a very similar architecture with submission queues. What's interesting is not so much batching per se, but rather coalescing. Basically, you accumulate new requests until the previous batch of requests has finished, then submit whatever has accumulated since last time.
What's interesting about this architecture is that it does not trade any latency: you still submit immediately if nothing is pending, and you only start accumulating if there is already something pending. Basically you let your device control how quickly you send it I/O requests.
What's nice about this approach is that it gets faster under load, whereas most systems get slower under load: assuming batching helps with throughput, batch sizes automatically get bigger under load, and automatically decrease again when the load subsides. Particularly if your coalescing can be more than just submitting multiple independent requests in a single syscall.
There's obviously a lot of detail that I am glossing over right now, but that's the gist of it.