L-Diversity

K-anonymity is not enough!

Homogeneity attack

  • The attacker knows the generalized quasi-identifiers of a target

  • A query returns records that all share the same sensitive attribute value

  • The attacker gets the sensitive attribute of the target

  • Issue: lack of diversity in the results

Background knowledge attack

  • The attacker uses prior knowledge about the target to rule out some of the sensitive values in the query results

Solution

l-diverse k-anonymity

Each equivalence class in a k-anonymous query result must contain at least l different values for each sensitive attribute
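The definition above can be checked mechanically. The following is a minimal sketch, assuming records are plain dictionaries and using the simplest reading of l-diversity (l distinct values per group). The function names and the toy table are hypothetical and do not come from the slides.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same generalized quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    groups = equivalence_classes(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical toy table, invented for illustration only.
table = [
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "476**", "age": "2*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "flu"},
    {"zip": "479**", "age": "3*", "disease": "cancer"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))
print(is_l_diverse(table, ["zip", "age"], "disease", 2))
```

The toy table is 2-anonymous but only 1-diverse, because the first group contains a single disease value. With several sensitive attributes, `is_l_diverse` would be called once per attribute.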

l-diversity

2-anonymity 1-diversity results

2-anonymity 2-diversity results

k-anonymity and l-diversity have flaws

k-anonymity: each equivalence class has at least k records to protect against identity disclosure.

  • k-anonymity is vulnerable to homogeneity attacks and background knowledge attacks.

Attacks on k-anonymity

Homogeneity attack

  • Bob is a 27-year-old man living in zip code 47678, and Bob’s record is in the table.

  • So Bob corresponds to one of the first three records and must have heart disease.

Background knowledge attack

  • Carl is a 32-year-old man living in zip code 47622. Therefore he is in the last equivalence class in Table 2.

  • If you know that Carl has a low risk for heart disease, then you can conclude that Carl probably has cancer.
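Both attacks can be replayed in a few lines. This is only a sketch: the tables referenced above are not reproduced in these notes, so the released table below is a hypothetical stand-in chosen to be consistent with the bullets, and the helper names `matches` and `candidates` are assumptions.

```python
def matches(generalized, exact):
    """True if an exact value fits a generalized one ('*' hides one character)."""
    return len(generalized) == len(exact) and all(
        g in ("*", e) for g, e in zip(generalized, exact)
    )


def candidates(table, zip_code, age):
    """Sensitive values of all released records the target could correspond to."""
    return [
        row["disease"]
        for row in table
        if matches(row["zip"], zip_code) and matches(row["age"], str(age))
    ]


# Hypothetical 3-anonymous release (stand-in for the table the slides refer to).
released = [
    {"zip": "476**", "age": "2*", "disease": "heart disease"},
    {"zip": "476**", "age": "2*", "disease": "heart disease"},
    {"zip": "476**", "age": "2*", "disease": "heart disease"},
    {"zip": "476**", "age": "3*", "disease": "heart disease"},
    {"zip": "476**", "age": "3*", "disease": "cancer"},
    {"zip": "476**", "age": "3*", "disease": "cancer"},
]

# Homogeneity attack: every record Bob could match has the same disease.
bob = candidates(released, "47678", 27)
print(set(bob))

# Background knowledge attack: the attacker rules out heart disease for Carl.
carl = candidates(released, "47622", 32)
carl_remaining = [d for d in carl if d != "heart disease"]
print(carl_remaining)
```

In this stand-in table, Bob's group offers a single possible value, and Carl's group collapses to a single value once the attacker removes the one they already know to be unlikely.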

l-diversity: the distribution of a sensitive attribute in each equivalence class has at least l “well-represented” values, to protect against attribute disclosure.

l-diversity is vulnerable to skewness attacks and similarity attacks.

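  • Skewness: when the overall distribution is skewed, a diverse group can have a very different distribution of sensitive values from the whole table, and that difference leaks information

  • Similarity: the values in a group can be distinct yet semantically similar, which still reveals the general category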

Attacks on l-diversity

Similarity attack

  • Table 4 anonymizes Table 3. Its sensitive attributes are Salary and Disease.

  • If you know Bob has a low salary (3k-5k), then you know that he has a stomach-related disease.

  • This is because l-diversity takes into account the diversity of sensitive values in the group, but does not take into account the semantic closeness of the values.
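The gap between distinct values and semantic closeness can be shown with a small sketch. Tables 3 and 4 are not reproduced in these notes, so the equivalence class and the disease-to-category mapping below are hypothetical, invented only to illustrate the point.

```python
# Hypothetical equivalence class with low salaries (not taken from the slides).
released_class = [
    {"salary": 3000, "disease": "gastric ulcer"},
    {"salary": 4000, "disease": "gastritis"},
    {"salary": 5000, "disease": "stomach cancer"},
]

# Assumed semantic hierarchy: each disease maps to a broader category.
category = {
    "gastric ulcer": "stomach-related",
    "gastritis": "stomach-related",
    "stomach cancer": "stomach-related",
    "flu": "respiratory",
    "bronchitis": "respiratory",
}

# l-diversity only counts distinct values.
distinct_values = {row["disease"] for row in released_class}

# An attacker reasons about categories instead.
distinct_categories = {category[value] for value in distinct_values}

print(f"distinct diseases:   {len(distinct_values)}")
print(f"distinct categories: {len(distinct_categories)}")
```

The class counts as 3-diverse on Disease, yet an attacker who places Bob in it still learns that his disease is stomach-related.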

Skewness attack

  • A table holds 10,000 records about a virus that affects 1% of the population.

  • An equivalence class with an equal number of positive and negative records satisfies 2-diversity.

  • Yet everyone in that equivalence class appears to have a 50% chance of having the virus, which is much higher than the 1% rate in the real distribution.
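The arithmetic behind the skewness attack is short enough to spell out. This sketch assumes a class of 50 positive and 50 negative records. Only the 1% infection rate and the equal split come from the example, while the class size and the function name `disclosure_risk` are hypothetical.

```python
def disclosure_risk(class_positive, class_negative, population_rate):
    """Return the attacker's posterior belief and how much it grew over the prior."""
    posterior = class_positive / (class_positive + class_negative)
    return posterior, posterior / population_rate


population_rate = 0.01  # the virus affects 1% of the population

# Assumed class size: 50 positive and 50 negative records (2-diverse).
posterior, gain = disclosure_risk(50, 50, population_rate)

print(f"prior belief:     {population_rate:.0%}")
print(f"posterior belief: {posterior:.0%}")
print(f"increase factor:  {gain:.0f}x")
```

Any class with an equal split gives the same posterior, so the attacker's belief rises by a factor of about 50 whatever the class size.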
