The perils of UUID primary keys in SQLite
14 points by andersmurphy
14 points by andersmurphy
Nice. It would be interesting to see the numbers for rowid tables where the integer key is random rather than monotonically increasing.
One thing the article doesn't mention, but I think it's important, is that the primary index on rowid tables is a B+-tree, while for “without rowid” tables it is a B-tree. So, generally speaking, the latter is not ideal when the average record's size exceeds a certain threshold because the internal nodes of the index store full records (iirc, SQLite's manual suggests 1/20th of the page size as a rule of thumb).
Version 7 UUIDs have a 48-bit timestamp at the head, so they aren't randomly distributed like this. That will also help with the excessive paging and rebalancing.
Are people really using uuids as their primary keys? What’s the benefit rather than having the uuid as an auxiliary key for when you want a uuid?
It's for scale. If you want to generate a unique ID in a distributed manner (many computers, many datacenters around the world) at a high rate (e.g., S3 uploads) you don't want to lock on a single incrementing integer. Locks are slow to sync globally.
GUID and UUIDs solve this by their construction.
The confusion I see many people have is assuming UUID is random. That's only true for v4, and choosing that has unfortunately costs. There are many times where you want some determinism (v3, v5, v7) because it organizes your data (for perf or collision guarantees).
A random UUID (or any random values in a uniform distribution) on an auxiliary key is still going to slow down your inserts as although it's not the clustered index it's still random inserts into a B-tree for that index.