A new chapter for the Nix language, courtesy of WebAssembly

51 points by diktomat

wink

Add a YAML parser to Nix as a builtin function. This has to be written in C++, but it does allow you to reuse any existing YAML parser library for C++. [...] So updating the YAML parser dependency could cause differences in evaluation results across Nix versions

Write a Nix plugin. [...] This means that Nix flakes using it are no longer self-contained

Wasm is a low-level binary instruction format that can be compiled from many high-level languages, including Rust, C++, and Zig.

Please explain to me how "never updating your YAML parser dependency (in source form)" is meaningfully different than "never updating your WASM binary blob compiled three years ago"?

Regarding the "dependency" part, of course it's easier to handle, but that's not the point here.

Or maybe the post is just mixing up their (orthogonal) problems and not making clear how they are deciding which poison to pick.

I'm not arguing against the feature; it seems perfectly sensible. But the written reasoning seems wrong, or at least brushes over the problem. (It also assumes the Rust wasm crate is perfect and never changes anything meaningful, not even with bugfixes.)

virchau13

The difference is that WASM is external and can be updated according to the user's needs, whereas builtin Nix functions have to commit to being stable for all users. For instance, if Nixpkgs standardized around a certain WASM blob for parsing YAML, then the Nixpkgs maintainers could easily update it by recompiling (and possibly tracking differences/regressions between the old and new blobs). On the other hand, if YAML support were built directly into Nix, it would be nigh infeasible to upgrade the YAML parser, because users would come to depend on the specific behaviour provided by the Nix interpreter.
- wink
  
  Maybe you mean the same thing polywolf posted in the other comment, which I interpret as something inherent in how Nix handles different "subsystems". That would explain it better, I guess.
  
  I still strongly dispute your framing of "this one is easy" vs "that one is hard". They're both dependencies with a deterministic mapping from input to output, until you update one of them and have to recheck all your assumptions. If a builtin can't be versioned, then I see that as a different problem, and then this is (as I said) probably a good solution, but that does not make the explanation good.
sourcemap

Please explain to me how "never updating your YAML parser dependency (in source form)" is meaningfully different than "never updating your WASM binary blob compiled three years ago"?

If you committed to never updating your C++ implementation, it would be functionally identical to never updating your wasm blob.

The key constraint is that Nix is absolutely obsessed with reproducibility. If anything in builtins has an observable difference between Nix versions, you have impurity. The bar to getting things into builtins is extremely high.

Today's Nix solves this by keeping its standard library (lib) in nixpkgs. It's pure Nix, versioned, and you can load multiple versions side-by-side. It's slow as hell and extremely hard for some classes of problems.

Loading a wasm blob means you can upgrade it independently of nix, you can pin and upgrade it in nixpkgs, and you have reproducibility guarantees that are hard (impossible?) to get in C++.
- wink
  
  That makes some sense, but without having looked at the C++ source code, it still feels a bit odd for a project that is itself written in C++. Maybe the dependencies and environment of Nix itself (not everything it uses) are pinned down well enough. But then again, if that already works, is extending it really so hard?
  
  I still don't think it's well explained at all, or at least it's been condensed to the point of being unclear.
  
  And yeah, I guess I am saying that you can't in good faith argue "if we never update it, it will be fine" as a pro for one thing but as a con for the other, unless you explicitly spell out the details (as you did, somewhat).
polywolf

I think the difference is, the YAML parser is bundled with the nix binary, whereas the WASM blob is versioned alongside the rest of the <nixpkgs> inputs. In the latter case, it is OK for a changed input to create a different output, while in the former case it's assumed that newer nix versions are backwards-compatible with older ones.

The hope, then, is that updating the WASM interpreter inside the nix binary won't change the result of executing the WASM blobs in <nixpkgs>, though, as with upgrading a YAML parser, that might break down in the general case.
matklad

Please explain to me how "never updating your YAML parser dependency (in source form)" is meaningfully different than "never updating your WASM binary blob compiled three years ago"?

Hm, this question surprises me, as the original explanation seems pretty clear to me, so I suspect something big might be missing?

I would say the difference is in "you" --- which party gets to manage the dependency?

First path is to make YAML parsing built into nix, so that nix maintainers are responsible for that. That obviously isn't great, because changing that now breaks the whole universe, and, what's worse, everyone now clamors for including their favorite language parser upstream.

The second path is to extract the thing into some kind of dynamically loaded .so library which is added to nix at runtime. In this case, the user, who uses nix to build someone else's project, is responsible for installing and managing this .so, and this is also not great, because now everyone will have their unique "works on my machine" .so, breaking reproducibility.

The third path is to make the authors of the project that needs YAML parsing manage the YAML parsing dependency. And this works great in Nix, because you can just bootstrap the whole shebang: you can add the YAML parser source code to your nix project, compile it with nix-provided cc, and feed the results back into nix.

The problem with that is mostly just performance --- your project ends up depending not only on the logic to parse YAML, but also on the logic to parse C++ code for parsing YAML. In the memorable words of Joe Armstrong, you asked for a banana, but what you got was a banana, a gorilla holding the banana, and half of the jungle. But, other than that, it is perfect -- deterministic-ish, reproducible, and managed by the party that chose to use YAML for their project, not by upstream nix maintainers, and not by downstream consumers of the project.

The next evolution is to pre-compile the YAML parser and add a dependency on the resulting binary, forgetting everything that was required to cook the binary. Which nix also makes easy, except for the pesky problem of cross-platformness --- you need one version for x86, one version for aarch64, one version for the ISA which is being designed in some bedroom right now and will become all the rage in 10 years...

So you want to compile your stuff into some cross-platform IR which you can interpret in the same way on all platforms, current and future, which is exactly what WASM does.

(There's an extra benefit of determinism: with a C++ plugin/IFD, the code could theoretically print the host's pointer size or something, making the build non-deterministic. With WASM, you can generally expect determinism even from adversarial code, but this is probably a lesser benefit than giving the right party the ability to control the version of WASM dependencies, without saddling everyone with the requirement to build rustc just to parse some YAML.)
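To make the pointer-size point concrete, here is a minimal Rust sketch (not from the post). The function names are hypothetical, and this does not model how Nix actually loads or sandboxes WASM modules; it only shows that native code can observe a host-dependent value, whereas a build for a wasm32 target always sees a 4-byte `usize`, no matter which machine runs the module.

```rust
/// Hypothetical helper: reports the pointer width the code was compiled for.
/// Native builds return a host-dependent value (8 on x86_64 and aarch64);
/// a build for a wasm32 target returns 4 on every host that runs the module.
pub fn pointer_width_bytes() -> usize {
    std::mem::size_of::<usize>()
}

/// Hypothetical "evaluation result" that leaks the pointer width into its
/// output: the kind of host dependence that breaks reproducibility.
pub fn leaky_output(input: &str) -> String {
    format!("{}:{}", input, pointer_width_bytes())
}
```

A native plugin rebuilt for a 32-bit host would return a different string from `leaky_output` than one built for x86_64, while a single wasm32 build returns the same string everywhere.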

This isn't spelled out in such great detail in the post, but I think it should be pretty clear from context?
- MatejKafka
  
  The problem with that is mostly just performance --- your project ends up depending not only on the logic to parse YAML, but also on the logic to parse C++ code for parsing YAML.
  
  Shouldn't this be solved in practice by the nixpkgs binary cache?
- wink
  
  Thanks for the post; I think it helped me phrase this better. I agree with everything you said from a certain standpoint; maybe an analogy would be quantity versus quality.
  
  You still have the same problem (which was my position), but with fewer and/or less severe drawbacks. Maybe I was nitpicking too much about the fact that you still have one (maybe nice, isolated, cross-platform, fast) dependency that can change, versus 2-3 (bad, entangled, single-platform, slow) dependencies. My point was strictly THAT you need to update it, but one way is probably easier, faster, and safer.
ghthor

I assume this mix-up really comes from a desire not to do any C++ dependency management.
- wink
  
  Which would be a perfectly valid reason, if stated clearly.
max-headroom

Finally, you could use import-from-derivation to declaratively build the Wasm module from source. But then you’re back to using import-from-derivation, which somewhat defeats the purpose!

So this whole thing is pointless unless you're okay with shipping opaque binaries.
- nrdxp
  
  security considerations in the article: trust me bro
nemith

You know, the further Determinate moves away from core Nix, the more they should just fork it. Call it Dix.
- steinuil
  
  I suspect part of the point of shipping this is that they want to be able to extend Nix for their customers without forking it too much. And this is a feature that is 100% intended for their customers; committing WASM blobs to git is probably acceptable within the walls of a company because you can reasonably expect them not to be compromised, but it isn't something I'd ever want to see in a public project.
vaibhavsagar

This is extremely cool! I'm glad there is an open PR for adding support to upstream so I might eventually be able to use it without switching to Determinate Nix.
noon

this looks absolutely horrendous.
wwfn

The post, lamenting that Nix is not a general-purpose programming language, uses Fibonacci to profile performance and show that WASM is a lot faster for numeric computation.

79.33 [nix] seconds to 0.33 [WASM] seconds ... but Nix uses much less memory using the Wasm version: 30 MB instead of 4.5 GB, a 151x reduction.

I was curious how Guix fares leaning on Guile. Much less memory usage than either WASM or Nix, but nowhere near as fast as the Rust-to-WASM example. It would be "fun" to benchmark Guile compiled to WASM (via Hoot?) in Nix.
```
 /usr/bin/time -v guile -c '(define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2))))) (display (fib 40))'
        User time (seconds): 74.92
        System time (seconds): 1.02
        Percent of CPU this job got: 130%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:58.15
        Maximum resident set size (kbytes): 12164
```
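For comparison with the Guile one-liner above, here is the same naive doubly recursive Fibonacci in Rust. This is a sketch assuming the post's Rust-to-WASM module uses the same algorithm; the function name and integer type are my own choices, and it does not show how the module is exported to or called from Nix.

```rust
/// Naive doubly recursive Fibonacci, mirroring the Guile definition:
/// (define (fib n) (if (< n 2) n (+ (fib (- n 1)) (fib (- n 2)))))
/// The exponential running time is intentional: it matches the benchmark
/// workload rather than being an efficient way to compute Fibonacci numbers.
pub fn fib(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fib(n - 1) + fib(n - 2)
    }
}
```

`fib(40)` returns 102334155, the same value the Guile command prints. Compiling this for a wasm32 target is the kind of step the post's approach relies on, but the exact export interface Nix expects is not covered here.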
Cloudef

Honestly, I feel like IFD should be the solution. If triggering an IFD causes lots of downloading/building, then that means the tool itself should be simpler and not depend on such a huge bootstrap chain, or at least the tool should be in the binary cache. I know IFDs in Nix are a bit frowned upon, but perhaps Nix itself should be changed to support IFDs better so that they don't stall everything.

I don't understand how WASM will be much different here. If a flake, for example, depends on some WASM plugin, it still needs to bootstrap it by fetching the compiler and all the deps the project may need, unless the idea is to ship binary blobs somewhere (what happened to the cache?).