Microdata privacy enhancing
Removal of potentially unique IDs
Basic strategy
Removing potentially unique IDs prevents microdata items from being linked across several databases
Candidate IDs
Name
National IDs (passport, identity card, etc.)
Social Security ID, Tax ID, etc.
Phone numbers
Car plate numbers
Not enough!
A study in the United States showed that 87% of the population could be uniquely identified through a linkage attack using only 3 non-unique attributes
5-digit ZIP code, gender and date of birth
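The linkage attack described above can be sketched as a join on the three non-unique attributes. This is a minimal illustration, not the method used in the study: the two tables, the field names (zip, gender, dob, diagnosis) and the link function are all hypothetical toy values invented for the example.

```python
# Hypothetical toy data: a table with direct IDs removed, and a public list with names.
medical = [
    {"zip": "02138", "gender": "F", "dob": "1945-07-31", "diagnosis": "hypertension"},
    {"zip": "02139", "gender": "M", "dob": "1962-01-15", "diagnosis": "asthma"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "gender": "F", "dob": "1945-07-31"},
    {"name": "Bob Example", "zip": "02139", "gender": "M", "dob": "1962-01-15"},
    {"name": "Carol Example", "zip": "02139", "gender": "F", "dob": "1980-03-02"},
]

QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def link(anonymous_rows, identified_rows, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for rows matching exactly one identified row."""
    # Index the identified table by its quasi-identifier tuple.
    index = {}
    for row in identified_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the subject
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link(medical, voters)
```

Even though the medical table carries no name or national ID, both of its rows are re-identified because each quasi-identifier tuple is unique in the public list.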
Noise
Basic strategy
Add noise to stored data or to the result of queries
Issues
Privacy is achieved at the cost of integrity
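As a rough sketch of the noise strategy, the code below perturbs each stored value with zero-mean random noise. The slides do not name a noise distribution; Gaussian noise, the add_noise helper, the sigma value and the salary figures are assumptions chosen only for illustration.

```python
import random
import statistics


def add_noise(values, sigma, rng):
    """Return a copy of values with zero-mean Gaussian noise added to each entry."""
    return [v + rng.gauss(0.0, sigma) for v in values]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical salaries between 20,000 and 69,000.
salaries = [20_000 + (i % 50) * 1_000 for i in range(10_000)]
noisy = add_noise(salaries, sigma=1_000.0, rng=rng)

# Individual values are no longer exact, but aggregates stay close to the truth.
mean_error = abs(statistics.mean(noisy) - statistics.mean(salaries))
```

This shows the trade-off stated above: every individual record loses integrity, while an aggregate such as the mean remains approximately usable.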
Truncate
Basic strategy
Do not provide full data, limiting precision
Issues
Privacy is achieved at the cost of integrity
Difficult to balance usability with privacy
Privacy concerns the person who provided the information, who may not be the person accessing it
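Truncation can be illustrated by releasing only a prefix of a value. The helpers below (truncate_zip, truncate_date) and their parameters are hypothetical names for this sketch, and dates are assumed to be in ISO YYYY-MM-DD form.

```python
def truncate_zip(zip_code, keep_digits):
    """Keep the first keep_digits characters and mask the rest with '*'."""
    return zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)


def truncate_date(iso_date, level):
    """Reduce an ISO date (YYYY-MM-DD) to year or year-month precision."""
    year, month, _day = iso_date.split("-")
    if level == "year":
        return year
    if level == "month":
        return f"{year}-{month}"
    raise ValueError(f"unsupported level: {level}")


masked_zip = truncate_zip("02138", 3)
birth_year = truncate_date("1945-07-31", "year")
birth_month = truncate_date("1945-07-31", "month")
```

How many digits or date fields to keep is exactly the usability versus privacy balance mentioned above; the sketch does not attempt to choose it.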
K-anonymity
Definition
No query can deliver an anonymity set with fewer than k entries
Privacy-critical attributes
(Unique) identifiers
Quasi-identifiers
When combined can produce unique tuples
Sensitive attributes
Potentially unique per subject
Disease, salary, crime committed
Implementation approaches
Suppression of quasi-identifiers
Simple to perform
Information loss
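Suppression can be sketched as replacing every quasi-identifier value with a mask. The suppress function, the field names and the sample records are hypothetical, and masking with "*" is an assumed convention.

```python
def suppress(rows, quasi_identifiers, mask="*"):
    """Return new rows with every quasi-identifier value replaced by mask."""
    return [
        {key: (mask if key in quasi_identifiers else value) for key, value in row.items()}
        for row in rows
    ]


# Hypothetical records: zip and age are quasi-identifiers, disease is sensitive.
records = [
    {"zip": "3810193", "age": 23, "disease": "flu"},
    {"zip": "3800004", "age": 41, "disease": "diabetes"},
]
suppressed = suppress(records, {"zip", "age"})
```

The operation is trivial to perform, but all rows become indistinguishable on the suppressed attributes, which is the information loss noted above.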
Generalization of quasi-identifiers
Transformation of quasi-identifiers into less specific ones
e.g. 7-digit ZIP → 4-digit ZIP
e.g. ages with 1-year granularity → 5- or 10-year granularity
Information is not completely lost
But the generalization should not encourage wrong interpretations of the data
We must ensure that there are at least k entries with equal generalized quasi-identifiers
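The generalization step and the "at least k entries" check can be sketched as follows. The generalize and is_k_anonymous functions, the field names and the sample records are hypothetical; the ZIP and age transformations follow the two examples given above (7-digit to 4-digit ZIP, 1-year to 10-year age ranges).

```python
from collections import Counter


def generalize(row):
    """Generalize quasi-identifiers: 7-digit ZIP to 4 digits, age to a 10-year range."""
    decade = (row["age"] // 10) * 10
    return {
        "zip": row["zip"][:4],
        "age": f"{decade}-{decade + 9}",
        "disease": row["disease"],  # sensitive attribute is left untouched
    }


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(size >= k for size in groups.values())


# Hypothetical records with unique identifiers already removed.
records = [
    {"zip": "3810193", "age": 23, "disease": "flu"},
    {"zip": "3810112", "age": 27, "disease": "asthma"},
    {"zip": "3800004", "age": 41, "disease": "diabetes"},
    {"zip": "3800230", "age": 45, "disease": "flu"},
]
generalized = [generalize(r) for r in records]

before = is_k_anonymous(records, ("zip", "age"), k=2)
after_k2 = is_k_anonymous(generalized, ("zip", "age"), k=2)
after_k3 = is_k_anonymous(generalized, ("zip", "age"), k=3)
```

With this toy data the raw table is not 2-anonymous, the generalized table is, and reaching 3-anonymity would require coarser generalization or suppression.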
Examples
Identifiers
Quasi-identifiers
Sensitive attributes
K-anonymity
1st step: Remove unique identifiers
2nd step: Generalization
2-anonymity possible results
3-anonymity possible results
Sensitive attribute disclosure