"Respectful" YAML patching in Rust

7 points by e_terekhin

colonelpanic

> One of YAML’s ergonomic advantages is that a value can have an associated inline note explaining why it’s set the way it is.

This is a misunderstanding of the YAML spec, actually:

> Comments are a presentation detail and must not have any effect on the serialization tree or representation graph. In particular, comments are not associated with a particular node.
>
> (YAML spec v1.2.2, §3.2.3.3)

The alignment of a comment in a YAML file with a particular node is purely accidental. That means any tooling that preserves comment placement is a violation of the YAML spec.

One of many reasons why it was an unwise decision to build the entire devops world on this.

olliej

This is a misreading, I think. “Comments are not associated with a particular node” means that it is valid to drop them, to move them, or to keep them functionally associated with a node. It does not make it an error to keep them associated with a given node: that’s just one of the options allowed by “they’re not associated with a node, so you can move or remove them”.

It would only be an error to require (or make a tool that requires) that other tooling retain the association.
- colonelpanic
  
  Perhaps "violation" is too strong a word, but the implication is that you can't ever rely on a comment being preserved once the file is processed by tooling.
  - olliej
    
    Correct, you can’t rely on it. But there’s a difference between not relying on it and a given tool choosing to provide those semantics. In principle a tool could itself guarantee, independently of the YAML spec, that it will maintain those comment associations, but you’d need that guarantee from each separate tool you ran if you wanted the comments to survive unscathed.
    
    E.g., a tool can be nice enough to maintain the association, but it is not required to, and you would be wrong to write anything that assumes such annotations will survive.
- mplant
  
  > That means any tooling that preserves comment placement is a violation of the YAML spec.
  
  I don't see why that's the case based on what you quote. If comments don't matter, then keeping them where they were surely wouldn't be a violation?
MatejKafka

I've long wanted a library for Pog that would be able to patch JSON/YAML/TOML/XML/... config files while preserving existing formatting, and even tried to prototype a custom implementation.

The biggest missing piece for me was that mainstream parsing/serialization libraries only care about the raw data and never build an AST, so implementing a format-preserving library means building a full parser for each format from scratch, and then writing the AST manipulation tooling on top of it.

One exception is XML which has an abundance of AST-based parsers. It's interesting how many things the XML people got right, long before all the other modern formats even existed. :)

For JSON, I tried to implement a somewhat horrible hack where I run the .NET System.Text.Json parser in streaming mode and keep track of the current semantic position in the tree. When it encounters the value token for a property that should be patched, that token's extent in the source text is replaced with the serialized patch value. This works surprisingly well, but there are a bunch of issues I haven't had the time/motivation to solve yet (e.g., detecting the existing indentation in the file so patched multiline literals are correctly indented).
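
For illustration, here is a rough sketch of that extent-replacement idea in Rust, using only the standard library. It is not the .NET code described above: the function names (`patch_top_level`, `skip_value`, and so on) are made up for this example, it only patches keys of the top-level object, it compares keys without decoding escapes, and it assumes the input is already valid JSON. The point is only that everything outside the patched value's byte range is copied through untouched.

```rust
/// Returns the index just past the JSON string that starts at `start` (a `"`).
fn skip_string(b: &[u8], start: usize) -> Option<usize> {
    let mut i = start + 1;
    while i < b.len() {
        match b[i] {
            b'\\' => i += 2, // skip the backslash and the byte it escapes
            b'"' => return Some(i + 1),
            _ => i += 1,
        }
    }
    None // unterminated string
}

/// Returns the index just past the JSON value that starts at `start`.
fn skip_value(b: &[u8], start: usize) -> Option<usize> {
    match *b.get(start)? {
        b'"' => skip_string(b, start),
        b'{' | b'[' => {
            // Count brackets until the opening one is closed, ignoring
            // any brackets that appear inside strings.
            let mut depth = 0usize;
            let mut i = start;
            while i < b.len() {
                match b[i] {
                    b'"' => {
                        i = skip_string(b, i)?;
                        continue;
                    }
                    b'{' | b'[' => depth += 1,
                    b'}' | b']' => {
                        depth -= 1;
                        if depth == 0 {
                            return Some(i + 1);
                        }
                    }
                    _ => {}
                }
                i += 1;
            }
            None
        }
        _ => {
            // Number, true, false, null: runs until a delimiter or whitespace.
            let mut i = start;
            while i < b.len()
                && !matches!(b[i], b',' | b'}' | b']')
                && !b[i].is_ascii_whitespace()
            {
                i += 1;
            }
            Some(i)
        }
    }
}

fn skip_ws(b: &[u8], mut i: usize) -> usize {
    while i < b.len() && b[i].is_ascii_whitespace() {
        i += 1;
    }
    i
}

/// Replaces the value of `key` in the top-level object of `src` with
/// `new_value` (already-serialized JSON). Returns None if the key is
/// missing or the input does not look like a JSON object.
pub fn patch_top_level(src: &str, key: &str, new_value: &str) -> Option<String> {
    let b = src.as_bytes();
    let mut i = skip_ws(b, 0);
    if *b.get(i)? != b'{' {
        return None;
    }
    i += 1;
    loop {
        i = skip_ws(b, i);
        match *b.get(i)? {
            b'}' => return None, // end of object, key not found
            b',' => {
                i += 1;
                continue;
            }
            b'"' => {}
            _ => return None,
        }
        let key_end = skip_string(b, i)?;
        let found = &src[i + 1..key_end - 1] == key;
        i = skip_ws(b, key_end);
        if *b.get(i)? != b':' {
            return None;
        }
        let val_start = skip_ws(b, i + 1);
        let val_end = skip_value(b, val_start)?;
        if found {
            // Splice: original prefix + new value + original suffix.
            return Some(format!("{}{}{}", &src[..val_start], new_value, &src[val_end..]));
        }
        i = val_end;
    }
}
```

Calling `patch_top_level(src, "retries", "5")` on a file with odd spacing leaves that spacing alone and only swaps the bytes of the value. Re-indenting a multiline replacement to match the file, the open problem mentioned above, is not attempted here.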
rtpg

The LibCST library in Python does a lot of similar format-preserving work, the big thing being that if you want to preserve whitespace and presentation... you gotta store that stuff in your syntax tree. Kinda obvious but.
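
A minimal sketch of what "store that stuff in your syntax tree" can look like, in Rust to match the article. This is not LibCST's API and not a real YAML parser: the `Entry` and `Line` types and the `parse`, `set_value`, and `render` functions are hypothetical, it only understands flat `key: value  # comment` lines, it ignores quoting and nesting, and `set_value` just patches the first matching key. The idea it shows is that indentation, separator spacing, and trailing comments are kept as fields on the node, so rendering an unmodified tree gives back the input byte for byte.

```rust
/// One `key: value` line, with every piece of presentation kept as a field.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub indent: String,   // leading whitespace
    pub key: String,
    pub sep: String,      // the colon plus any whitespace after it
    pub value: String,
    pub trailing: String, // whitespace, `# comment`, and the line ending
}

#[derive(Debug, Clone, PartialEq)]
pub enum Line {
    Entry(Entry),
    Other(String), // blank lines, full-line comments, anything not understood
}

fn parse_line(raw: &str) -> Line {
    let other = || Line::Other(raw.to_string());
    // Split off the line ending so it can be stored with the trailing trivia.
    let body_len = raw.trim_end_matches(|c| c == '\n' || c == '\r').len();
    let (body, eol) = raw.split_at(body_len);
    let indent_len = body.len() - body.trim_start().len();
    let (indent, rest) = body.split_at(indent_len);
    if rest.starts_with('#') {
        return other();
    }
    let Some(colon) = rest.find(':') else {
        return other();
    };
    let (key, after_key) = rest.split_at(colon);
    let sep_len = after_key.len() - after_key[1..].trim_start().len();
    let (sep, after_sep) = after_key.split_at(sep_len);
    // Naive comment detection: a `#` at the start or after a space.
    let comment_at = if after_sep.starts_with('#') {
        Some(0)
    } else {
        after_sep.find(" #")
    };
    let value_len = after_sep[..comment_at.unwrap_or(after_sep.len())]
        .trim_end()
        .len();
    let (value, trailing) = after_sep.split_at(value_len);
    Line::Entry(Entry {
        indent: indent.to_string(),
        key: key.to_string(),
        sep: sep.to_string(),
        value: value.to_string(),
        trailing: format!("{trailing}{eol}"),
    })
}

pub fn parse(src: &str) -> Vec<Line> {
    src.split_inclusive('\n').map(parse_line).collect()
}

/// Sets the value of the first entry named `key`; returns false if none exists.
pub fn set_value(lines: &mut [Line], key: &str, value: &str) -> bool {
    for line in lines.iter_mut() {
        if let Line::Entry(e) = line {
            if e.key == key {
                e.value = value.to_string();
                return true;
            }
        }
    }
    false
}

/// Writes the tree back out, trivia included.
pub fn render(lines: &[Line]) -> String {
    let mut out = String::new();
    for line in lines {
        match line {
            Line::Entry(e) => {
                for part in [&e.indent, &e.key, &e.sep, &e.value, &e.trailing] {
                    out.push_str(part);
                }
            }
            Line::Other(s) => out.push_str(s),
        }
    }
    out
}
```

With this shape, `render(&parse(src))` equals `src`, and after `set_value(&mut doc, "retries", "5")` only the value changes while the inline comment and its alignment padding stay put. Whether the comment "belongs" to that value is the tool's own choice, not something the format promises.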