Upgrading Semgrep from OCaml 4 to OCaml 5
34 points by pabloest
I understand how it happened, but it’s unsettling to see all of this work put in just for the outcome to be “performance is only a little bit worse now”. The details about why that happened were very interesting!
I don’t think it’s a fair assessment. OCaml got much closer to making the new parallel GC runtime as fast as the original single-threaded one than anyone else.
Haskell still requires linking with the threaded runtime and defaults to the single-threaded one.
CPython added a --disable-gil build option this past October, as per PEP 703. The multi-threaded GC is not generational, and it’s not certain that it’s even safe yet, which is why it’s a compile-time option.
Semgrep hit an edge case and managed to get the memory performance back just by doing GC tweaks (which have been the staple of GC-ed language programming everywhere from the start, if anything). The post doesn’t touch on the improvements from built-in parallelism support or anything else, either.
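For readers who have not seen what a "GC tweak" looks like in OCaml, here is a minimal sketch using the standard library's Gc module. It is only an illustration: the choice of the space_overhead knob, the value 200, and the churn workload are assumptions made for the example, not the settings or code from the Semgrep post.

```ocaml
(* Hypothetical sketch: adjust one GC parameter at startup, then run an
   allocation-heavy workload and report how many major collections ran.
   The value 200 is arbitrary and chosen only for illustration. *)

(* Allocate many short-lived lists so the collector has work to do. *)
let churn () =
  let total = ref 0 in
  for i = 1 to 100_000 do
    let xs = List.init 50 (fun j -> i + j) in
    total := !total + List.length xs
  done;
  !total

let () =
  let before = Gc.get () in
  Printf.printf "initial space_overhead = %d\n" before.Gc.space_overhead;
  (* Gc.set takes a full control record, so copy the current one and
     change only the field of interest. *)
  Gc.set { before with Gc.space_overhead = 200 };
  let n = churn () in
  let stats = Gc.quick_stat () in
  Printf.printf "elements counted = %d, major collections = %d\n"
    n stats.Gc.major_collections
```

The same parameter can also be set without recompiling, through the OCAMLRUNPARAM environment variable (for example `OCAMLRUNPARAM='o=200'`). A higher space_overhead generally trades more memory for less collection work, so the right value depends on the workload.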
If anything, performance is slightly better now, not slightly worse. It was within normal noise, though, so I wasn’t comfortable making any claims to that effect in my blog post. It’s unfortunate that it was tough to get to that point, but, as dmbaturin notes, it’s remarkable that the OCaml maintainers were able to architect a multi-threaded runtime where equivalent performance is even possible.
it’s remarkable that the OCaml maintainers were able to […]
It’s all trade-offs, and I’m sure they made well-justified technical decisions that would go over my head as someone who doesn’t spend all day thinking about it. I wasn’t trying to paint them negatively; I just meant to call out that it’s interesting how a switch in focus for them resulted in more work for you.
I don’t know if you have actively followed multicore runtime development in GCed languages, so maybe I can give some context. People figured out how to make garbage collection run in parallel with the program’s code a long time ago. Multi-core runtime forks for CPython existed over a decade ago, for example. The problem was that they were always slower for single-threaded workloads than the original version, which would blow up if more than one thread was allowed to execute Python code at once.
Different projects took different stances on the issue. Python maintainers refused to merge anything that would harm single-threaded program performance. The JVM and Haskell (read: GHC) provided different GCs for different situations.
A multicore GC runtime where you get the old performance for single-threaded code in most cases without doing anything is a pretty big breakthrough by itself.
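To make the "multicore runtime" part concrete, here is a minimal sketch of what OCaml 5's built-in parallelism looks like with the standard Domain module. The function names and the two-way split are assumptions made for the example; it shows the API shape and is not a benchmark.

```ocaml
(* Hypothetical sketch: sum 1..n with half of the range computed on a
   second domain. Requires OCaml 5. *)

(* Plain sequential sum of the integers from lo to hi inclusive. *)
let sum_range lo hi =
  let acc = ref 0 in
  for i = lo to hi do
    acc := !acc + i
  done;
  !acc

let parallel_sum n =
  let mid = n / 2 in
  (* Domain.spawn runs the closure on a new domain, which the runtime
     can schedule on another core. *)
  let left = Domain.spawn (fun () -> sum_range 1 mid) in
  (* The current domain computes the other half in the meantime. *)
  let right = sum_range (mid + 1) n in
  (* Domain.join waits for the spawned domain and returns its result. *)
  Domain.join left + right

let () =
  Printf.printf "parallel = %d, sequential = %d\n"
    (parallel_sum 1_000_000) (sum_range 1 1_000_000)
```

A program that never calls Domain.spawn stays on a single domain, which is the single-threaded case the comment above is about.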
Also, the typical attitude towards articles in this genre (“we ran into a memory problem and solved it with this one weird trick GC tweaks”) in mainstream languages is positive, and no one says it “created more work”. When I see OCaml getting singled out for allegedly creating more work for developers, when the reality is that all GCed runtimes have always placed the responsibility for making performance compromises on the user, I’m inclined to get defensive about that. ;)
Fair! I definitely would not have minded if this were easier. For what it’s worth, some of the OCaml maintainers read my blog post and were similarly concerned. There is some ongoing work to improve GC pacing that could be relevant, and they are going to look into this example as part of that.
I found the paper on the garbage collector changes (linked from my blog post) to be fairly accessible, although I certainly had to read it meticulously to absorb it. It’s a fascinating paper and explains quite a lot from scratch. I recommend it if you are interested in learning more.