SurrealDB is sacrificing data durability to make benchmarks look better

27 points by av


peterbourgon

when I ask the OS to write my [data] to the file, the OS will … [write that data to the] file cache [and will] tell me everything is done before it actually writes (ed: flushes) [that] data to the underlying storage.

For sure! And that’s because flushing through to the underlying storage is usually several orders of magnitude slower than writing to the FS cache.

this behaviour … causes us issues if we want to ensure that when we make a change to a file, the data gets to the disk and won’t disappear if we lose power, or even if the next flush errors!

The issue is that fsync doesn’t actually guarantee these things. Calling fsync guarantees that the FS asked the underlying storage to write whatever data it provided, and that the storage responded “OK” – but that doesn’t mean much! Consumer HDDs often just put fsync’d data into volatile write-back caches, which would be lost after a crash. NFS under various configs will respond OK to an fsync before the write hits any kind of disk. Many (most?) cloud storage filesystems define fsync at a hypervisor boundary, well before any physical disks get involved. And so on.
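To make the distinction concrete, here's a minimal sketch of what "asking for durability" looks like at the syscall level. The helper name `write_durably` is hypothetical; the point is that even this careful sequence only gets you the guarantees described above, i.e. the device acknowledged the write, not that the bits are on stable media:

```python
import os

def write_durably(path: str, data: bytes) -> None:
    # Hypothetical helper: write data and ask the OS to flush it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # write() alone only lands the data in the FS cache; the kernel
        # may report success long before anything reaches the device.
        os.write(fd, data)
        # fsync asks the kernel to push the data to the storage device
        # and waits for the device to acknowledge -- but, as noted above,
        # that acknowledgement may come from a volatile write-back cache.
        os.fsync(fd)
    finally:
        os.close(fd)

write_durably("record.txt", b"important record\n")
```

(On macOS, Apple's own fsync man page says you need `fcntl(fd, F_FULLFSYNC)` to even ask the drive to flush its cache, and even that is best-effort. Which is exactly the spectrum argument: each step buys you more, and none of them buys you certainty.)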

Durability is a spectrum: fsync is stronger than not-fsync, but it is in no way a guarantee of durability.