The Unreasonable Effectiveness of ProseMirror Model in Rich Text Transformation

13 points by smoores

isuffix

I've been dealing with quite a similar problem in the Typst parser recently. In our concrete syntax tree, we assign each node a unique number, called a span, that is more stable than a plain textual range across edits. This allows us to use spans as part of the input to our incremental engine.

But this means that there's currently no way for us to address text that isn't uniquely contained in a node of the tree. We want that capability to improve the fidelity of our error and warning diagnostics, so they can target text that exists within some node or continues across multiple nodes instead of just targeting the largest containing node.

My solution is sub-ranges: a range that is relative to a given span and targets text underneath it. These can use smaller indices than normal ranges and, importantly, do not require updating when unrelated text is edited. They still have to be translated to absolute ranges eventually, but so do our normal spans, so that cost is shared.
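To make the sub-range idea concrete, here is a minimal sketch in TypeScript. Every name in it (SpanId, SubRange, resolveSubRange, the nodeStarts lookup) is hypothetical and chosen for illustration; it is not how the Typst parser is actually written. It only shows why a span-relative range survives an unrelated edit unchanged.

```typescript
// Hypothetical types for illustration; none of these names come from Typst.
type SpanId = number;

interface SubRange {
  span: SpanId;  // stable id of the containing node
  start: number; // offset relative to the start of that node
  end: number;   // exclusive end, also relative to the node
}

interface AbsoluteRange {
  start: number;
  end: number;
}

// Translate a sub-range to an absolute range, given each node's
// current absolute start offset (assumed to be tracked elsewhere).
function resolveSubRange(
  nodeStarts: Map<SpanId, number>,
  sub: SubRange,
): AbsoluteRange | undefined {
  const base = nodeStarts.get(sub.span);
  if (base === undefined) return undefined; // the node no longer exists
  return { start: base + sub.start, end: base + sub.end };
}

// Before an edit, node 7 starts at offset 20.
const startsBefore = new Map<SpanId, number>([[7, 20]]);
// After inserting 5 characters earlier in the file, node 7 starts at 25.
const startsAfter = new Map<SpanId, number>([[7, 25]]);

// The diagnostic itself is never rewritten; only the lookup changes.
const diagnostic: SubRange = { span: 7, start: 3, end: 8 };
const resolvedBefore = resolveSubRange(startsBefore, diagnostic);
const resolvedAfter = resolveSubRange(startsAfter, diagnostic);
```

The stored diagnostic is identical before and after the edit; only the final translation step sees the new node offset.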

smoores

That makes sense! It's obviously a different use case, but I found ProseMirror's flat integer position system to be shockingly flexible. But it's only usable if you have the accompanying mapping system, which is what actually lets you track how positions change across transformations.
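For anyone who hasn't seen the mapping idea, here is a toy model of it in TypeScript. This is a simplified sketch, not the prosemirror-transform implementation: the Replacement type and the mapPos and mapThrough helpers are names I made up, and the (start, oldSize, newSize) triple is only modeled loosely on how ProseMirror's step maps describe a replaced range.

```typescript
// Toy model for illustration; not the real prosemirror-transform code.
// One replaced range: `oldSize` units at `start` became `newSize` units.
interface Replacement {
  start: number;
  oldSize: number;
  newSize: number;
}

// Map a position in the old document to the corresponding position in
// the new one. `assoc` decides which side a position sticks to when it
// sits exactly where content was inserted or inside replaced content.
function mapPos(pos: number, r: Replacement, assoc: -1 | 1 = 1): number {
  const end = r.start + r.oldSize;
  if (pos < r.start) return pos;                       // before the change
  if (pos > end) return pos + (r.newSize - r.oldSize); // after the change
  // On or inside the replaced range: snap to one of its new boundaries.
  const side =
    r.oldSize === 0 ? assoc : pos === r.start ? -1 : pos === end ? 1 : assoc;
  return side < 0 ? r.start : r.start + r.newSize;
}

// Map a position through a sequence of changes, applied in order.
function mapThrough(pos: number, changes: Replacement[]): number {
  return changes.reduce((p, change) => mapPos(p, change), pos);
}

const insertFive: Replacement = { start: 10, oldSize: 0, newSize: 5 };
const deleteThree: Replacement = { start: 0, oldSize: 3, newSize: 0 };

const untouched = mapPos(4, insertFive);               // 4
const shifted = mapPos(20, insertFive);                // 25
const stickLeft = mapPos(10, insertFive, -1);          // 10
const stickRight = mapPos(10, insertFive, 1);          // 15
const chained = mapThrough(20, [insertFive, deleteThree]); // 22
```

The point is that a bare integer is enough to identify a location as long as every change comes with a description like this that positions can be pushed through.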

Your sub-range solution seems a little more akin to Slate.js's Point system (which itself is actually pretty similar to ProseMirror's ResolvedPos). Both Slate and ProseMirror are specifically avoiding assigning unique ids to nodes (for a number of reasons that are specific to text editing, I think), but Slate's system has one component for identifying a node, and another for identifying an offset into that node's text.

dlants

I've spent a lot of time with ProseMirror, and it (and CodeMirror) are brilliant! Marijn did a great job with both projects, and at every place I've worked that ended up using them, I've advocated for sending contributions his way. https://marijnhaverbeke.nl/fund/

Both projects have some really nice design decisions, including the document format + unique positions you highlight here. Another is the flat, rather than nested, mark setup: <b>text<i>more</i>text</b> becomes something like [{text, b}, {more, b, i}, {text, b}]. And of course the orientation around real-time, collaborative editing.
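A rough sketch of that flattening, in TypeScript. The NestedNode and FlatText shapes and the flattenMarks function are hypothetical names for illustration and are not ProseMirror's actual classes; the sketch only shows how nested formatting collapses into a flat list of text runs that each carry their full mark set.

```typescript
// Hypothetical shapes for illustration; not ProseMirror's real node types.
type NestedNode = string | { mark: string; children: NestedNode[] };

interface FlatText {
  text: string;
  marks: string[];
}

// Walk the nested tree, carrying the marks that are active on the way
// down, and emit one flat run per piece of text.
function flattenMarks(node: NestedNode, active: string[] = []): FlatText[] {
  if (typeof node === "string") {
    return [{ text: node, marks: active }];
  }
  const marks = [...active, node.mark];
  const runs: FlatText[] = [];
  for (const child of node.children) {
    runs.push(...flattenMarks(child, marks));
  }
  return runs;
}

// <b>text<i>more</i>text</b>
const nested: NestedNode = {
  mark: "b",
  children: ["text", { mark: "i", children: ["more"] }, "text"],
};

const flat = flattenMarks(nested);
// flat has three runs: "text" with [b], "more" with [b, i], "text" with [b]
```

With the flat form, asking "is this character bold?" is a lookup on a single run instead of a walk up a chain of ancestors.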

smoores

Yeah, obviously it didn't make sense to get into it here, but I really think that Marijn just made all the right trade-offs when it comes to collab. Personally I'm a big fan of prosemirror-collab-commit, a fairly minimal extension of prosemirror-collab that supports server-side rebasing. It increases throughput quite a lot, and mitigates the starvation effect that can happen to clients on high latency connections with plain prosemirror-collab.

That's what we're using for the collab module in Pitter Patter, which is the open source project that I work on for my day job.