L-Diversity
K-anonymity is not enough!
Homogeneity attack
The attacker knows the generalized quasi-identifiers of a target
A query returns records that all share the same sensitive attribute value
The attacker gets the sensitive attribute of the target
Issue: lack of diversity in the results
Background knowledge attack
The attacker can rule out some of the returned sensitive values using known information
l-diversity: each equivalence class in a k-anonymous query result must contain at least l different values for each sensitive attribute
k-anonymity: each equivalence class has at least k records to protect against identity disclosure.
k-anonymity is vulnerable to homogeneity attacks and background knowledge attacks.
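The k-anonymity condition above can be checked by grouping records on their quasi-identifiers and looking at the smallest group. The following is a minimal sketch, not a reference implementation: the function name `is_k_anonymous`, the column names, and the table contents are assumptions made for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    (every equivalence class) appears in at least k records."""
    class_sizes = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in class_sizes.values())

# Hypothetical generalized table; the values are illustrative only.
table = [
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "3*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
]

print(is_k_anonymous(table, ["zip", "age"], k=3))  # True
print(is_k_anonymous(table, ["zip", "age"], k=4))  # False
```

Note that the check only counts records per class. It never looks at the sensitive column, which is why the attacks below still succeed on a k-anonymous table.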
Homogeneity attack
Bob is a 27-year-old man living in zip code 47678, and Bob’s record is in the table.
So Bob corresponds to one of the first three records and must have heart disease.
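Bob's case can be replayed in a few lines. This is a sketch under stated assumptions: the table below is a hypothetical stand-in for the anonymized table the example refers to, and `generalize` and `candidate_values` are illustrative helper names, not part of any library.

```python
def generalize(zip_code, age):
    """Generalize exact values the way the hypothetical table does:
    keep the first three zip digits and the decade of the age."""
    return {"zip": zip_code[:3] + "**", "age": f"{age // 10}*"}

def candidate_values(records, known, sensitive):
    """Sensitive values of all records matching what the attacker knows."""
    return {
        record[sensitive]
        for record in records
        if all(record[key] == value for key, value in known.items())
    }

# Hypothetical 3-anonymous table; the values are illustrative only.
table = [
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "2*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "3*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
]

bob = generalize("47678", 27)  # {'zip': '476**', 'age': '2*'}
print(candidate_values(table, bob, "disease"))  # {'Heart Disease'}
```

A single candidate value means the attacker learns Bob's disease without ever identifying which of the three records is his.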
Background knowledge attack
Carl is a 32-year-old man living in zip code 47622. Therefore he is in the last equivalence class in Table 2.
If you know that Carl has a low risk for heart disease, then you can conclude that Carl probably has cancer.
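The effect of background knowledge can be sketched as removing values from the equivalence class and renormalizing. The class contents below are an assumption chosen to match the example (one heart disease record, two cancer records), and `remaining_distribution` is a hypothetical helper name.

```python
from collections import Counter

def remaining_distribution(sensitive_values, ruled_out):
    """Distribution of sensitive values in an equivalence class after
    the attacker discards the values they consider implausible."""
    kept = [value for value in sensitive_values if value not in ruled_out]
    counts = Counter(kept)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical contents of Carl's equivalence class.
carl_class = ["Heart Disease", "Cancer", "Cancer"]

# Without background knowledge: Heart Disease 1/3, Cancer 2/3.
before = remaining_distribution(carl_class, ruled_out=set())

# Treating heart disease as ruled out leaves a single candidate.
after = remaining_distribution(carl_class, ruled_out={"Heart Disease"})
print(after)  # {'Cancer': 1.0}
```

Ruling a value out completely is a simplification. A low risk is not a zero risk, which is why the conclusion in the example is "probably" rather than "certainly".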
l-diversity: distribution of a sensitive attribute in each equivalence class has at least l “well represented” values to protect against attribute disclosure.
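"Well represented" can be read in more than one way. The sketch below shows two common readings: distinct l-diversity (at least l different values per class) and entropy l-diversity (the entropy of each class is at least log l). The function names, column names, and table are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def sensitive_values_by_class(records, quasi_identifiers, sensitive):
    """Group the sensitive values by equivalence class."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record[sensitive])
    return classes

def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Every class holds at least l different sensitive values."""
    classes = sensitive_values_by_class(records, quasi_identifiers, sensitive)
    return all(len(set(values)) >= l for values in classes.values())

def is_entropy_l_diverse(records, quasi_identifiers, sensitive, l):
    """Every class has entropy of at least log(l)."""
    classes = sensitive_values_by_class(records, quasi_identifiers, sensitive)
    for values in classes.values():
        total = len(values)
        entropy = -sum(
            (count / total) * math.log(count / total)
            for count in Counter(values).values()
        )
        # Small tolerance guards against floating-point rounding.
        if entropy + 1e-12 < math.log(l):
            return False
    return True

# Hypothetical table with two equivalence classes.
table = [
    {"zip": "476**", "age": "3*", "disease": "Heart Disease"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
    {"zip": "476**", "age": "3*", "disease": "Cancer"},
    {"zip": "4790*", "age": ">=40", "disease": "Flu"},
    {"zip": "4790*", "age": ">=40", "disease": "Heart Disease"},
    {"zip": "4790*", "age": ">=40", "disease": "Cancer"},
]

qi = ["zip", "age"]
print(is_distinct_l_diverse(table, qi, "disease", l=2))  # True
print(is_distinct_l_diverse(table, qi, "disease", l=3))  # False
print(is_entropy_l_diverse(table, qi, "disease", l=2))   # False
```

The first class has two distinct values but an uneven 1:2 split, so it passes the distinct check for l = 2 and fails the stricter entropy check.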
l-diversity is vulnerable to skewness attacks and similarity attacks.
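Skewness: an equivalence class can be diverse and still have a distribution of sensitive values very different from the overall one, so membership in the class alone changes what an attacker can infer
Similarity: values that are distinct but semantically close still count as diverse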
Similarity Attack:
Table 4 anonymizes Table 3. Its sensitive attributes are Salary and Disease.
If you know Bob has a low salary (3k-5k), then you know that he has a stomach-related disease.
This is because l-diversity takes into account the diversity of sensitive values in the group, but does not take into account the semantic closeness of the values.
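The gap between distinct values and distinct meanings can be made concrete by mapping each value to a broader category. The taxonomy `DISEASE_CATEGORY` and the contents of Bob's equivalence class below are hypothetical, chosen only to mirror the example.

```python
# Hypothetical taxonomy mapping each disease to a broader category.
DISEASE_CATEGORY = {
    "gastric ulcer": "stomach-related",
    "gastritis": "stomach-related",
    "stomach cancer": "stomach-related",
    "flu": "respiratory",
    "bronchitis": "respiratory",
    "pneumonia": "respiratory",
}

def distinct_values(values):
    """Number of different sensitive values, as distinct l-diversity counts them."""
    return len(set(values))

def distinct_categories(values, taxonomy):
    """Number of different semantic categories behind those values."""
    return len({taxonomy[value] for value in values})

# Hypothetical low-salary equivalence class that contains Bob.
bob_class = ["gastric ulcer", "gastritis", "stomach cancer"]

print(distinct_values(bob_class))                        # 3
print(distinct_categories(bob_class, DISEASE_CATEGORY))  # 1
```

The class is 3-diverse on the raw values, yet every value falls in the same category, so the attacker still learns that Bob has a stomach-related disease.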
Consider a table of 10,000 records about a virus that affects 1% of the population.
Skewness attack: an equivalence class with an equal number of positive and negative records satisfies 2-diversity.
Yet it gives everyone in this equivalence class a 50% chance of having the virus, which is much higher than the 1% rate in the overall population.
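The arithmetic behind the skewness attack is short enough to write out. The class size of 50 positive and 50 negative records is an assumption; the example only states that the two counts are equal.

```python
total_records = 10_000
prevalence = 0.01  # 1% of the population has the virus

# Expected number of positive records in the whole table.
positives_overall = round(total_records * prevalence)  # 100

# Hypothetical 2-diverse equivalence class with equal counts.
class_positive, class_negative = 50, 50

# Chance the attacker assigns to any member of this class.
class_risk = class_positive / (class_positive + class_negative)

# How much higher the inferred risk is than the baseline.
risk_ratio = class_risk / prevalence

print(positives_overall)       # 100
print(class_risk)              # 0.5
print(round(risk_ratio))       # 50
```

Anyone placed in this class appears about 50 times more likely to be infected than a random member of the population, even though the class meets the 2-diversity requirement.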