Bab: a family of hashing functions for p2p networks

26 points by dustyweb


bitshift

The spec is a little bit more clear about why a new hash function was needed:

The key feature of Bab is streaming verification of a string: as a program reads a string, it can verify at regular intervals that the prefix read so far is indeed part of a string hashing to some target digest.

Which of course is important if you have P2P programs streaming large blobs of data to each other.

jeremiahlee

Aside: the pixel font on the website is unnecessarily painful and difficult to read, especially the small italicized text.

jmillikin

Seems similar to IPFS or BitTorrent DHT -- send a tree of hashes, then the content, and the client can incrementally verify the content against the hash tree nodes.

As a minor nitpick of the terminology in their spec, this is not "content-addressed storage" as it's usually understood because the same content can have different checksums if the chunk size, hash algorithm, etc is parameterized. "The chunk_size value must be a natural number greater than zero" means a 4 GiB file has > ~4 billion possible checksum variants even if everything else is hardcoded.

If you can't compute a single storage address of some content using only the content itself then it's not "content-addressed" (it might still be useful, it's just called something else).


I'm more used to seeing this as a two-layer send where the tree is itself hashed and sent all at once, so you end up with a protocol like:

GET sha256:00002000:123...DEF
<obtain the tree that serializes to 8192 bytes with SHA256 checksum 123...DEF>
GET sha256:00100000:<hash of 1 MiB chunk 1...>
GET sha256:00100000:<hash of 1 MiB chunk 2...>
...

The advantage being that a structured encoded tree is a bit more flexible than just sending a list of checksums in some order, for example you can have chunks be different lengths and then apply content-defined chunking. And maybe some simple built-in decompression if the content has long runs of predictable bytes (i.e. object code).