Re-Identification Risk vs k-Anonymity
3 points by Mbat
Many years ago I was heavily involved in research on the reconstruction of identity and behavior using anonymized entity data, ambient event data that is not associated with any entity, and sparse spatiotemporal graphs. People have no concept of the extent to which innocuous data can be used to reconstruct facts that seem like they should be impossible to know from that data.
It is deeply unintuitive. I did a famous demo circa 2012 on real public data where I placed a person at a location at a specific time with no evidence in the data that the person had ever been there. It sounds impossible on its face, but if you step someone through how these types of graphs are analytically reconstructed, it is not too difficult to understand why it works.
Many anonymity guarantees only hold if the adversary has no additional data sets. What surprises people is that the additional data can have nothing to do with the entities in question. People tend to assume that you need two different data sets that both contain the entities to do this. While that is one way to go about it, it is unnecessary with sufficiently sophisticated graph analytic capability.
The same analytical algorithms are central to physical-world observability at large geographic scales. Fun stuff. There is very little literature on it.
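To make the baseline point concrete (a k-anonymity guarantee only holds while the adversary has nothing else to go on), here is a minimal Python sketch. It is not the graph-analytic reconstruction described above, only the simpler side-information case. The table, the column names, and the side fact are all made up for illustration.

```python
from collections import Counter

def k_anonymity(rows, quasi_ids):
    """Return the size of the smallest group of rows that share
    the same values for the quasi-identifier columns."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values())

# Hypothetical released table: names removed, ZIP and age generalized.
released = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "diabetes"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
]

# On its own, every (zip, age) group has two rows, so the table is 2-anonymous.
k = k_anonymity(released, ["zip", "age"])

# The adversary knows the target's generalized ZIP and age band.
matches = [r for r in released
           if r["zip"] == "021**" and r["age"] == "30-39"]

# Assumed side information from some unrelated source:
# the target did not have the flu. The released table never said this.
remaining = [r for r in matches if r["diagnosis"] != "flu"]

print("k before side information:", k)
print("candidate rows after side information:", len(remaining))
```

The released table passes the k = 2 check, yet one extra fact that appears nowhere in it collapses the candidate set to a single row. The guarantee was a statement about the table alone, not about what an adversary already knows.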