Zero heap allocation HTTP server using OxCaml
51 points by eatonphil
OxCaml is getting hard to ignore. I can’t wait to see where we will land in 2 to 3 years' time, and what the dynamic between OCaml and OxCaml will be.
What happens in OxCaml if I try to create a list or array of an unboxed record? Does it have a kind system that distinguishes unboxed types from boxed ones, and only allow generic types to be instantiated with boxed types?
Yes, one part of a type's kind is its "layout". https://oxcaml.org/documentation/unboxed-types/01-intro/
Does the compiler monomorphise based on the layouts? Or do I have to manually create list, array etc. types for every layout I want to use them with?
It kinda monomorphizes, but you have to explicitly declare when you want this mode-polymorphism using a template language (like a C++-style preprocessor that generates code for each layout). Eventually the plan is to have actual mode-polymorphism that will automatically monomorphize, but that doesn't exist yet as far as I know.
Thanks. With monomorphisation there'll inevitably be more code to generate, which will create larger binaries and longer compile times. Is this an issue for you? Or do your programs run on servers or personal computers where code size is not an issue, and maybe with a fast language server (does OCaml have that yet?) compile times are also not a big deal, so monomorphisation (at least based on layouts instead of concrete types) is not an issue for your use case?
Eh I mean it is all an issue, but we do some stuff to mitigate it. Compile times are long, but we use a shared compilation cache in dev that helps a ton with incremental rebuilds, and for doing optimized release builds we have gigantic remote build servers. But this has been true long before OxCaml -- we have a lot of code generators, for example for parsing network protocols, that lead to giant binaries and long builds. I haven't noticed any change in compilation times since we added templates -- probably shared libs are slower for cold builds, but in expectation those compilation units will all be cached. We also use a language server (OCaml has had one forever called Merlin, even before it spoke LSP).
I mean, I would love to have faster compile times (or especially faster linker times -- the shared build cache makes incremental recompilation pretty fast, such that linking is always the bottleneck). But OxCaml hasn't changed that for me personally.
The large binary size has long been an issue with using Core (which is why Base came along, and in the very old days there was even a Nano). But with OxCaml, I'm pretty happy with the roadmap of mode polymorphism; I generally think it's better to not generate useless code than to dead-code eliminate it afterwards, and that's the direction of travel with OxCaml.
I tried to implement a much simpler version of this in plain OCaml a couple of years ago, and it was quite tedious (I ended up with a small state machine written by hand...), and it only avoided allocations on the "fastpath" (HTTP 200 reply). OxCaml didn't exist back then, and you can definitely see the massive improvement that first-class support for zeroalloc brings. I don't see any of the awkwardness that I had in my own code: the parser looks very much like a regular parser, with a few extra annotations, and the compiler takes care of the rest.
Isn't this just preallocating 32k regions and reusing them? That's not exactly zero allocation, right? Either way, an interesting project.
(author) the actual processing of the HTTP request through that 32k buffer is fully stack allocated with no OCaml heap activity, so the memory profile of the server is constant per connection. It'll get more interesting as I add in the support for HTTP business logic, as those will also be stack allocated.
I'm also debating when to get rid of the 32k region and replace it with a 4k buffer, but page sizes that small seem an anachronism these days. I'll figure it out when I merge the io_uring backend.
zero-allocation in OCaml/GC means no heap allocation that could be subject to GC scans in some region of code.
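To make that definition concrete, here is a minimal sketch in plain OCaml (not OxCaml, and not the project's actual code) of parsing a request line out of a preallocated buffer while producing only immediate values. All names here (buf, slots, find_byte, parse_request_line) are hypothetical. The buffer and the result record are allocated once at startup; the per-request code only reads bytes and writes ints, so it should not create heap blocks for the GC to scan. Plain OCaml cannot check this claim for you, which is the gap the zeroalloc support mentioned earlier in the thread is meant to close.

```ocaml
(* Hypothetical sketch: all names are illustrative, not the project's API. *)

(* One buffer allocated up front and reused for every request. *)
let buf = Bytes.create 32_768

(* Reusable result slots. The fields are mutable ints, so updating
   them stores immediate values and allocates nothing. *)
type request_line = {
  mutable meth_len : int;  (* method occupies bytes [0, meth_len) *)
  mutable path_off : int;  (* offset of the first byte of the path *)
  mutable path_len : int;  (* length of the path in bytes *)
}

let slots = { meth_len = 0; path_off = 0; path_len = 0 }

(* Index of [c] within buf.[pos .. len - 1], or -1 if absent.
   Returning -1 instead of an option avoids allocating a [Some]. *)
let rec find_byte c pos len =
  if pos >= len then -1
  else if Char.equal (Bytes.get buf pos) c then pos
  else find_byte c (pos + 1) len

(* Parse "METHOD SP PATH SP VERSION" from the first [len] bytes of [buf].
   On success, fills [slots] and returns true; otherwise returns false. *)
let parse_request_line len =
  let sp1 = find_byte ' ' 0 len in
  if sp1 < 0 then false
  else begin
    let sp2 = find_byte ' ' (sp1 + 1) len in
    if sp2 < 0 then false
    else begin
      slots.meth_len <- sp1;
      slots.path_off <- sp1 + 1;
      slots.path_len <- sp2 - sp1 - 1;
      true
    end
  end
```

A caller would copy or read the incoming bytes into buf, call parse_request_line with the number of bytes received, and then use the offsets in slots to inspect the method and path in place rather than extracting substrings.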
side note: it is my belief that pre-allocating all the memory you need at the start of the program is something all programs should strive to be able to do, although it's not always possible.
This seems pretty cool, although I don't understand enough about O/OxCaml to get how the stack-allocation works. Would this be similar to Zig's FixedBufferAllocator using a stack-allocated buffer, which parsing functions are "allocating" into?