"Respectful" YAML patching in Rust
7 points by e_terekhin
One of YAML’s ergonomic advantages is that a value can have an associated inline note explaining why it’s set the way it is.
This is a misunderstanding of the YAML spec, actually:
Comments are a presentation detail and must not have any effect on the serialization tree or representation graph. In particular, comments are not associated with a particular node. YAML spec v1.2.2 §3.2.3.3
The alignment of a comment in a YAML file with a particular node is purely accidental. That means any tooling that preserves comment placement is a violation of the YAML spec.
One of many reasons why it was an unwise decision to build the entire devops world on this.
This is a misreading, I think. “Comments are not associated with a node” means that it is valid to drop them, to move them, or to keep them functionally associated with a node. It does not make it an error to keep them associated with a given node: that’s a subset of “they’re not associated with a node so you can move or remove them”.
It would only be an error to require (or make a tool that requires) that other tooling retain the association.
Perhaps "violation" is too strong a word, but the implication of this is that you can't ever rely on a comment's preservation if you're processing it with tooling.
Correct, you can’t rely on it. But there’s a difference between not relying on it and a given tool choosing to provide those semantics. In principle a tool could itself guarantee, independently of the YAML spec, that it would maintain those comment associations, but you’d need that guarantee from each separate tool you ran if you wanted the comments to survive unscathed.
E.g., a tool can be nice enough to maintain the association, but it is not required to, and you would be wrong if you wrote anything that assumed such annotations would survive.
That means any tooling that preserves comment placement is a violation of the YAML spec.
I don't see why that's the case based on what you quote. If comments don't matter, then keeping them where they were surely wouldn't be a violation?
I've long wanted a library for Pog that would be able to patch JSON/YAML/TOML/XML/... config files while preserving existing formatting, and even tried to prototype a custom implementation.
The biggest missing piece for me was that mainstream parsing/serialization libraries only care about the raw data and never build an AST, so implementing a format-preserving library means building a full parser for each format from scratch, and then writing the AST manipulation tooling on top of it.
One exception is XML which has an abundance of AST-based parsers. It's interesting how many things the XML people got right, long before all the other modern formats even existed. :)
For JSON, I tried to implement a somewhat horrible hack where I run the .NET System.Text.Json parser in streaming mode, keep track of the current semantic position in the tree, and when it encounters the value token for a property that should be patched, its extent is replaced with the serialized token of the patch. This works surprisingly well, but there's a bunch of issues I didn't have the time/motivation to solve yet (e.g., detecting existing indentation in the file so patched multiline literals are correctly indented).
The libcst library in Python does a lot of similar format-preserving work. The big thing is that if you want to preserve whitespace and presentation, you have to store that stuff in your syntax tree. Kinda obvious, but still.
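As a toy illustration of "store that stuff in your syntax tree" (this is not how libcst is implemented, and the `Token`, `lex` and `render` names are invented here): a lossless lexer keeps whitespace and comments as tokens of their own, so concatenating the tokens reproduces the input byte for byte. The lexing rules are deliberately naive; any `#` starts a comment, which is not true of real YAML.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Kind {
    Whitespace,
    Comment,
    Word,
}

#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: Kind,
    text: String,
}

/// Split `src` into tokens without discarding anything:
/// whitespace runs, `#` comments (up to, not including, the newline),
/// and everything else as "words".
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut rest = src;
    while let Some(first) = rest.chars().next() {
        let (kind, len) = if first.is_whitespace() {
            let end = rest.find(|c: char| !c.is_whitespace());
            (Kind::Whitespace, end.unwrap_or(rest.len()))
        } else if first == '#' {
            (Kind::Comment, rest.find('\n').unwrap_or(rest.len()))
        } else {
            let end = rest.find(|c: char| c.is_whitespace() || c == '#');
            (Kind::Word, end.unwrap_or(rest.len()))
        };
        tokens.push(Token { kind, text: rest[..len].to_string() });
        rest = &rest[len..];
    }
    tokens
}

/// Concatenate the token texts; for unmodified tokens this is the original input.
fn render(tokens: &[Token]) -> String {
    tokens.iter().map(|t| t.text.as_str()).collect()
}
```

Editing then means changing the `text` of one `Word` token and calling `render`; the alignment spaces and the comment next to the value come back unchanged because they were never thrown away.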