OxCaml, Jane Street's extension of OCaml, is now open-source
133 points by avsm
I find the modes especially interesting and believe they have very high potential. Defaulting to aliased, GC’ed variables while allowing unique and stack-allocated variables is very nice for performance, but also very ergonomic. Linearity support is also great for safely representing resources.
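For a concrete flavor, here’s a minimal sketch using the early local_ keyword syntax from Jane Street’s “Oxidizing OCaml” posts; the exact spelling may differ in current OxCaml releases:

(* [local_] marks the ref as non-escaping, so the compiler is free to
   stack-allocate it instead of putting it on the GC heap. *)
let sum_to n =
  let local_ counter = ref 0 in
  for i = 1 to n do
    counter := !counter + i
  done;
  !counter

Returning !counter is fine: the int is an immediate, so only the ref cell itself is confined to the stack region.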
OCaml is on its way to becoming a Rust, but with better ergonomics for general application development.
I wrote a few things in OCaml recently-ish after a long time without doing much of it. What struck me most was how productive I could be by not worrying about most things, thanks to the typing and GC, but also thanks to being able to assess performance later on and come back to where changes were needed. I have far less experience with Rust, but the need to care about so many more things up front makes it far less fun for me.
(maybe the feeling was the same as vibecoding, except there was no AI involved anywhere :P )
OCaml becoming a Rust is a bit of an odd phrasing. After all, Rust is already an OCaml?
Rust got algebraic data types and higher-order types from OCaml. Now OCaml takes inspiration from Rust’s borrow checking. They’re getting closer, but they are still very distinct. If you’ve written both, you know that writing in them is a very different experience.
As someone who writes in both, it seems to me that Rust is an OCaml dialect. Certainly they’re not exactly the same language, of course, or there would be no point.
It feels like Rust has a lot more going on implicitly in calls etc though right? Like trait resolution and return type polymorphism stuff alone. But it’s been a long time since I did anything with OCaml.
The control over allocation and fearless concurrency features are really interesting.
I really think OCaml is one of the more underrated languages out there. I quite like that it gives you low-level control while still being really nice as a high-level language. Polymorphic variants are some of the best ways to handle errors that I’ve seen in a language.
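For instance (a small sketch, with made-up names): each function declares only the errors it can actually produce, and the combined error type is inferred as the union, with no shared error type needed:

let parse_port s =
  match int_of_string_opt s with
  | None -> Error (`Not_an_int s)
  | Some p when p < 0 || p > 65535 -> Error (`Out_of_range p)
  | Some p -> Ok p

let parse_host s =
  if String.length s = 0 then Error `Empty_host else Ok s

(* The error type of [parse_addr] is inferred as
   [> `Empty_host | `Not_an_int of string | `Out_of_range of int ] *)
let parse_addr host port =
  match parse_host host, parse_port port with
  | Ok h, Ok p -> Ok (h, p)
  | Error e, _ -> Error e
  | _, Error e -> Error e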
yeah, ocaml occupies a very sweet spot in language design, with high level features, ergonomics and expressiveness, but also high performance compile-to-native execution. only other semi-mainstream languages I can think of in that space are D and maybe crystal. haskell and rust don’t quite have the ergonomics, zig is too low-level and go lacks the expressive power. (not sure about nim, I was perhaps unfairly turned off by its approach to capitalisation and never looked at it.)
Yeah, the problem is just that nobody wants to write ML…
This might be it, but I am not convinced, as we don’t have good implementations of ML available. Things like build systems, code formatters, standard libraries, cross-compilation, etc., are more important than the language itself, and they’ve been lacking in ML land. As an example, opam gained native Windows support only last year. F# is nice, but the CLR toolchain is weird. It’s hands down the most advanced programming environment out there, but it just doesn’t seem to take off (perhaps because the .NET Core transition had to happen, instead of that being the starting point).
I am moderately sure that if someone builds a modern ML, delivering Go/Zig/Rust levels of toolchain polish, it’ll find a lot of success in the cracks between Rust, Java, and Go.
Agree. I am working on a toy language. When having to pick between OCaml, my love, and Rust… I went with Rust. Rust has the tooling and library toolkit for my needs.
OCaml had a painful build system. Painful package management. Painful test libraries. Harder-to-use tooling overall. And simply not the libraries I need as building blocks.
But damn, did I want to do it in OCaml. It pained me to accept I needed to use Rust.
Hell, OCaml’s GC would probably have been nicer than the borrow checker for the implementation. Same with the type system and all the language’s niceties.
But the tooling and ecosystem matters so much more. It is not even a fair fight.
The CLR is quite a bit better cross-platform than I would have expected from a Microsoft product. Not that it doesn’t have its own weirdness.
F# suffers greatly from being a second-class citizen in the whole ecosystem. The tooling and Visual Studio integration are not great, navigating between C# and F# definitions is still basically broken, and it ends up being reasonable to use C# libraries, which means dealing with the discrepancies between the types/mental model/conventions at each call site. Or you skip some of the C# interop and accept worse performance and… it’s too bad.
It seems to me that if you’re on the JVM, write Java; and if you’re on the CLR, write C#. Any other good features of another language targeting that runtime are going to be integrated into the main language and that smaller team just won’t be able to out develop the main language, tooling, ecosystem, etc. Unless you do something fundamentally different - like Clojure.
All that being said, yes, if someone builds a modern ML with a polished toolchain, it’ll get adoption.
F# is nice, but the CLR toolchain is weird. It’s hands down the most advanced programming environment out there,
Could you explain a bit more? You piqued my interest.
This deserves a blog post. I don’t know a good one, though. https://blog.celes42.com/the_language_that_never_was.html has a somewhat unfortunate signal-to-noise ratio, compensated by length. But:
Polymorphic variants are some of the best ways to handle errors that I’ve seen in a language.
Have you actually tried using poly. variants for error handling? Unless they improved poly. variants since 5.0 I don’t think you can use them for error handling.
I had a thread about this on social media a while ago where I experimented with various error handling patterns using poly. variants in OCaml, and they lacked a few things that made them useless for the task.
Unfortunately I stopped using X-style social media and deleted all my posts, but you can see my “error handling expressiveness benchmark” here.
If you try implementing those examples in OCaml using poly. variants you’ll realize that you won’t be able to do a few of them.
IIRC the key feature OCaml’s poly. variants lacked was that binder types are not refined based on the handled alternatives. So if I have x : [ `Foo | `Bar | `Baz ] and do something like

match x with
| `Foo -> ...
| other -> ...

then other’s type doesn’t become [ `Bar | `Baz ], it’s still [ `Foo | `Bar | `Baz ].
That means handling some of the errors while propagating others becomes impossible to do generically. You have to handle all the alternatives and return new variants manually. Or maybe employ macros.
I solved this problem in my own language, which also has polymorphic variants. (see also follow-up posts to the linked post)
There are other issues as well, such as the lack of typing for exceptions that need to be handled (rather than propagated and logged), and the lack of a convenient way of converting exceptions to result values and result values to exceptions, … which I’m also trying to solve in my language.
I think poly. variants now refine match arms (look for ‘narrowing’ in https://dev.realworldocaml.org/variants.html#polymorphic-variants). There is a great article demonstrating how it works in practice: https://keleshev.com/composable-error-handling-in-ocaml
Don’t you have to explicitly list all of the alternatives that your new binder will hold? E.g. this line in your link:
let extended_color_to_int = function
| `RGBA (r,g,b,a) -> 256 + a + b * 6 + g * 36 + r * 216
| (`Basic _ | `RGB _ | `Gray _) as color -> color_to_int color;; <--------- HERE
A real-world use case for the `Foo handling code in my example is when you have some code that throws `Foo, and maybe others, and you only handle `Foo. So you can’t explicitly list all of the things you don’t handle, because your code is polymorphic over the errors being thrown (e.g. maybe you take a callback argument that throws `Foo and others).
My blog posts have examples of this kind of thing and more.
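A hedged sketch of the shape I mean (hypothetical names):

(* Retry on `Timeout, propagate anything else; the callback is
   polymorphic in whatever other errors it may return. *)
let rec with_retry attempts f =
  match f () with
  | Ok _ as ok -> ok
  | Error `Timeout when attempts > 1 -> with_retry (attempts - 1) f
  | Error _ as e -> e

(* Inferred type:
   val with_retry :
     int -> (unit -> ('a, [> `Timeout ] as 'e) result) -> ('a, 'e) result
   Note the result’s error type still includes `Timeout: the handled
   case can’t be subtracted, which is the limitation described above. *)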
Yes, narrowing is possible, but you need to explicitly specify the narrowed type. In practice I didn’t find it to be a problem, because usually you have an alias for the narrowed set anyway. What’s left is usually a set of related errors, in my experience. Though I’ve only used it in hobby projects, so I don’t know how well it holds up in a large-scale project.
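For example (a sketch; the alias plays the role of the explicitly specified narrowed type):

type narrow = [ `Bar | `Baz ]

let handle_foo (x : [ `Foo | narrow ]) : (unit, narrow) result =
  match x with
  | `Foo -> Ok ()                    (* handled here *)
  | #narrow as other -> Error other  (* [other] has the narrowed type *)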
I have, and it wasn’t really an issue. But I just kept it simple: if something failed I returned err `tag, bubbled the error up to the appropriate place, and I didn’t run into any issues. Any new unhandled error would be flagged as such and I could handle it.
I tried to learn OCaml quite recently and generally liked it, but what made me give up was the atrocious amount of ceremony needed to compile programs and use external packages, together with the small standard library.
What is the nature of the ceremony?
When compiling a program, you need to manually list all the source files, like in C. And as far as I could tell, to use external packages you have to create an entire project with dune, which involves several configuration files.
With Dune you don’t need to list all source files.
You can also use dune init proj myproj to get the initial scaffolding generated (see https://dune.readthedocs.io/en/stable/quick-start.html#initializing-projects), then put your .ml files under bin/, modify main.ml, and you’re ready to go.
Compiling “raw” feels very 90s, I agree, but a minimal dune project requires only two files (one for the project, plus one per directory to describe libraries and executables). The main library mechanism before dune came along was ocamlfind (aka findlib), which can still be useful (e.g. for plugins). Either way, opam is for package management (which I think dune is looking to wrap with dune pkg (?); I’m not entirely a fan of how much leverage Jane Street has over the ecosystem, but there you go.)
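For concreteness, a minimal sketch of those two files (dune comments start with ‘;’; the version in (lang dune 3.0) is just an example):

; ./dune-project
(lang dune 3.0)

; ./bin/dune
(executable
 (name main))

With bin/main.ml present, dune build and dune exec bin/main.exe are enough to get going.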
Oh, and there are several drop-in stdlib replacements you can use: batteries, containers, jane street’s core. Use this package search - some search engine results might give you older, less nice-looking versions.
The other commenter already mentioned dune but just fyi, dune is part of the OCaml Platform and is the recommended build tool for OCaml. For most projects you should just use dune and would never need to ‘manually list all the source files, like in C’.
Yeah, the small stdlib is definitely a weak point. The ceremony with dependencies wasn’t an issue for me personally, but I used Dune; maybe that makes a difference?
Man, I wish I was smart enough to work at a place like Jane Street. This is awesome work. I was planning to learn Roc but I think this weekend I will be playing with OCaml instead
I remember thinking that I wanted to work at Jane Street when I was in school, but looking at it now, working at a place whose only purpose is money, without even the pretence of providing goods or services seems like it would wind up feeling pretty nihilistic and depressing.
Liquidity providing and routing orders from brokerages are examples of real services (ones that you likely benefit from, too) that those sort of firms provide.
IMO there definitely is a level of “what does my work even do?” involved with working with money. I used to work on a fraud team and if I did my job right, a % on a dashboard would go down. 🤷
Modes are really interesting. Beyond providing type safety for concurrent accesses, I think they can also be used to guard against resource leaks (e.g. file descriptors).
Sometimes code is not as simple as with_resource-and-close-at-the-end (e.g. you may have a pool or cache of resources, and ensuring no leaks in such code is quite tricky). Previously I had to use a runtime solution for that.
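For the simple case, plain OCaml is enough; the sketch below (using the stdlib’s Fun.protect) shows the easy scoped pattern, which is precisely what pools and caches can’t use, since the resource outlives the scope:

(* Scoped acquire/release: the channel can't leak, but it also
   can't outlive [f], which rules out pooling or caching it. *)
let with_file path f =
  let ic = open_in path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> f ic)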
Do they have any benchmarks on the performance of some of these features? I mean, if OCaml could get within reasonable proximity of Rust, I’d much prefer OCaml.
With modes there’s a possible path to relaxing Rust-style Aliased xor Mutable rules without giving up memory safety. I don’t know that this is a huge priority for Jane Street. Ante (https://antelang.org/) is focused on this and doing something similar.
The gist is that in a single threaded context, some kinds of shared mutation are fine:
swap(&v[i], &v[j]) is safe. It doesn’t matter that both pointers are derived from the same allocation, or even if they are identical (i == j). Neither pointer will dangle.
append(&v, &v[i]) isn’t safe*. If len(v) == cap(v), the vector’s underlying buffer will be reallocated and the pointer to v[i] will dangle.
With modes, you can say:
func append(v: unique &vec int, n: &int)
I.e.: you can only append to a uniquely referenced vector, and because v is unique, n can’t possibly alias it.
For the above to work, the language has to enforce an invariant: you can’t assign to (i.e. change the address of) a reference unless that reference is unique.**
Depending on your language, you may have to add more rules: if you have sum types, you need to make sure that the value of a sum type can’t be reassigned if someone is holding a reference to an associated value inside a variant.
Basically, you have to have rules around any memory that can “change shape” in some way. It gets more complicated if you want data race freedom too, but I think that’s still doable. OxCaml has already put uniqueness and data race safety on separate axes.
I think this is pretty cool. I write Rust at work. It’s the right tool for what we’re doing (software that runs in space), but it doesn’t spark joy for me in the way that it does for other people. I want something less uptight and more fun. I’m not sure if modes will be more fun, but I have my fingers crossed.
*assume the compiler can reorder memory accesses in a way where the implementation of append can’t guarantee when the second argument will get dereferenced.
**you can also pick a different invariant: a reference created by dereferencing another pointer must be unique. I.e. you can’t do &v[j] while &v[i] is still alive. Then you put unique on n rather than v, and you can safely append to an aliased vector but not swap two of its elements. IIRC this is Ante’s approach.
To implement ‘swap’ safely I think you need to use Atomic.exchange. Otherwise another domain could write a value to v[i] in between you reading v[i] and replacing it with v[j], and that value would be overwritten/lost by swap, i.e. a data race.
The data race wouldn’t cause the program to crash (no dangling pointer), but it is desirable to avoid them, which is what modes are useful for.
The data race wouldn’t cause the program to crash (no dangling pointer), but it is desirable to avoid them, which is what modes are useful for.
Perhaps unboxed atomics would be useful (i.e. atomicity could be another mode). Currently, to use atomic operations you have to wrap values in Atomic.t, which adds another pointer indirection. It would be nice to be able to express that a field is atomic, the same way we can mark a field as mutable. That would avoid the boxing, although there’d be alignment constraints (which should usually be met unless you also use the data-layout annotations).
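To illustrate the indirection in today’s OCaml 5 (names are made up):

(* The atomic field is a separate heap block: reaching the int goes
   record -> Atomic.t -> value, one more hop than a [mutable] field. *)
type counter = {
  name : string;
  hits : int Atomic.t;
}

let bump c = Atomic.fetch_and_add c.hits 1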
Yeah, on its own a uniqueness mode only gives you memory safety (specifically temporal safety), not data race safety. That’s why I mentioned “single threaded” above. There’s no way to get around the need for synchronization to prevent data races.
But in OxCaml, you can safely swap uncontended values, and guarantee that a contended value will never be passed to swap (might be another mode; I haven’t looked in a while). You only need AxM if you know the data is shared.
I’m also a bit skeptical of choosing to get rid of data races at the language level. It adds a big complexity burden. In Go, the happy path (goroutines and channels), which I love, pushes you to write data race free code. That’s good enough for me, but lots of smart, reasonable people disagree.
Unboxed types, stack allocation, parallelism, SIMD, monomorphised generics for mode polymorphism (templates).. It looks like they need Rust but they’re stuck with OCaml.
There is a significant upside: you don’t have to pay for any of those when you don’t want to. Defaulting to GC is very convenient.
Fearless concurrency is nice, but how good is the OCaml GC at dealing with multiple threads nowadays? Can I utilize all 128 cores on my server box?
OCaml 5 has a multicore-enabled GC: https://ocaml.org/releases/5.0.0
Yeah, but having it is very different from linearly scaling to 128 cores.
You don’t need a special multicore-enabled runtime to scale linearly to 128 cores: just start one process per core and you automatically get perfect linear scaling. But you can’t do that if you want shared-memory parallelism, because the costs of IPC would be prohibitive. So you use a multicore runtime in your application instead, but even an efficient multicore runtime imposes some cost for communicating between threads, so you can never scale perfectly linearly with shared-memory parallelism. See https://discuss.ocaml.org/t/ocaml-5-performance/15014/9?u=yawaramin for details.
But OCaml’s runtime can scale pretty well to 128 cores; there are benchmarks that show this.
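For what it’s worth, the shared-memory side is straightforward to try with OCaml 5 domains; a minimal sketch:

(* Spawn one domain per chunk and join the partial sums. In practice
   you'd bound the number of domains to the core count, e.g. via
   Domain.recommended_domain_count (). *)
let parallel_sum chunks =
  chunks
  |> List.map (fun chunk -> Domain.spawn (fun () -> List.fold_left ( + ) 0 chunk))
  |> List.map Domain.join
  |> List.fold_left ( + ) 0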
Yeah, nobody means multiprocess; there are limited circumstances where it makes sense, but it’s mostly a coping mechanism for languages that have global locks or trouble with parallel GC. Lots of parallel algorithms do in fact show linear scaling as you increase cores; this has been a popular thing to demonstrate in perf blog posts for years.
Well, this sounds cool. Does anyone use OCaml for scripting? I might try directing LLMs to use specific languages for auxiliary tasks as a way of learning more about them.