Exploring PostgreSQL 18's new UUIDv7 support
9 points by eduard
9 points by eduard
I learnt a couple of new things with this article; I had never thought about the privacy concerns related to using UUIDv7 as an externally facing key.
Many people think the risks are limited to the German Tank Problem when discussing monotonic identifiers, which is relatively intuitive but that is arguably the least interesting risk. There is an entire category of analytical graph reconstruction techniques that exploit sparse ordered events to infer things that people assume it should not be possible to infer from that data because it violates common intuition.
That sounds interesting, can you give an example?
One of my favorite examples of something that sounds similar is Trajectory recovery from Ash: User privacy is NOT preserved in aggregated mobility data.
Btrees are kinda dumb for UUIDs, so I personally think V7 UUIDs are a neat hack. But I also use a lot of deterministic UUIDs. What I’d love is first class support for hash indexes in PG, including multi column unique/primary keys, which would obviate the problems with non sequential keys of any kind. Maybe one day!
I’m also curious about the risk of collisions if you mix V7 and random/hash UUIDs in a single key?
What I’d love is first class support for hash indexes in PG
https://www.postgresql.org/docs/current/indexes-types.html#INDEXES-TYPES-HASH ?
The problem with non-sequential keys is the random IO, and hashing does not help with that.
I’m also curious about the risk of collisions if you mix V7 and random/hash UUIDs in a single key?
It’s zero because UUIDv7 and UUIDv4 have different numbers in their version fields.
Yes - that’s why I said “first class support” - wouldn’t it be great if hash keys could be used for unique indexes? I mean, in theory hash indexes are faster than btrees.
Index queries are random regardless of the key structure. AIUI the problem with non sequential key insertion is that the btree becomes fragmented so you end up having to read more index pages when doing a lookup. Also, frequent rebalancing is needed,
AFAIK a hash index is more or less O(1) regardless of the key.
You are right about the version field though.