The perils of UUID primary keys in SQLite

14 points by andersmurphy

lifepillar

Nice. It would be interesting to see the numbers for rowid tables where the integer key is random rather than monotonically increasing.

One thing the article doesn't mention, but I think it's important, is that the primary index on rowid tables is a B+-tree, while for “without rowid” tables it is a B-tree. So, generally speaking, the latter is not ideal when the average record's size exceeds a certain threshold because the internal nodes of the index store full records (iirc, SQLite's manual suggests 1/20th of the page size as a rule of thumb).

andersmurphy

Thanks. I'll run those tests later today and report back/update the article.

silentbicycle

Version 7 UUIDs have a 48-bit timestamp at the head, so they aren't randomly distributed like this. That will also help with the excessive paging and rebalancing.

andersmurphy

Yes UUID7 is what the article explores.

Student

Are people really using uuids as their primary keys? What’s the benefit rather than having the uuid as an auxiliary key for when you want a uuid?

markerz
It's for scale. If you want to generate a unique ID in a distributed manner (many computers, many datacenters around the world) at a high rate (e.g., S3 uploads) you don't want to lock on a single incrementing integer. Locks are slow to sync globally.

GUID and UUIDs solve this by their construction.
- v1 and v6 encode (machine ID + timestamp), so it's kind of an auto-incrementing integer namespaced to each machine
The confusion I see many people have is assuming UUID is random. That's only true for v4, and choosing that has unfortunately costs. There are many times where you want some determinism (v3, v5, v7) because it organizes your data (for perf or collision guarantees).
andersmurphy

A random UUID (or any random values in a uniform distribution) on an auxiliary key is still going to slow down your inserts as although it's not the clustered index it's still random inserts into a B-tree for that index.