Lite^3, a JSON-Compatible Zero-Copy Serialization Format
17 points by andyc
17 points by andyc
Interesting, but apparently doesn't work if object keys have a hash collision, so not really production-ready.
https://lite3.io/design_and_limitations.html#hash-collisions
P.S. According to a comment by the author on HN, hash collisions are no longer an issue: https://news.ycombinator.com/item?id=46323803
I looked around in the spec and I don't understand why they don't support hash collisions. It seems they could have multiple entries in the btree with the same hash, given that they store key value pairs with the full key strings embedded within them.
I think it's maybe because of this choice, which I guess is saying they don't want to compare strings on lookups, only inserts(?):
As a result of storing hash digests instead of strings, tree traversal inside the critical path can be satisfied entirely using fixed 4-byte word comparisons, never actually requiring string comparisons except for detection of hash collisions. This design choice alone contributes to much of the runtime performance of Lite³.
OK yeah, that's good to point out:
Inserting a colliding key will not corrupt your data or have side effects. It will simply fail to insert
And then there is a table showing the collision probability with 100 entries is 0.00000115251102195, etc.
Of course, the possibility of collisions is not ideal. Therefore, this shortfall will be addressed in a future release of Lite³
The rationale for POSIX/The Single UNIX Specification indicates that the dbm functions are not required to support more than one key because the specification does not define what happens on a hash collision and does not define a hash function, meaning everything could hash to the same value and collide.
I'll port my HN comment from a few days ago:
It's a bit unfortunate that 1 doesn't seem to mention Fleece2, since Fleece is an already existing, solid format with very similar properties (JSON-compatible data format, uses offsets for sharing, allows for cheap updates, and uses binary search on arrays to achieve O(ln n) lookups in large tables or arrays, etc.)
Thanks! (Author of Fleece here.)
Fleece doesn’t use B-trees per se, although it is tree-structured. Arrays are arrays of “pointers” (relative offsets), and dictionaries/maps/objects are arrays of key-value pairs sorted by key.
The mutable forms of Fleece values are heap-allocated, but serializing back to a buffer is pretty cheap, especially if the values came from a buffer and only a few have been mutated.
CapnProto and FlexBuffers are other serialization formats with zero-overhead parsing.
If you're looking for details, the closest I could find was here, but it's really just hand-waving conceptual diagrams, not the actual format description: https://lite3.io/design_and_limitations.html#autotoc_md29