Microdata privacy enhancing

Removal of potentially unique IDs

Basic strategy

  • Removing potentially unique IDs prevents microdata items from being linked across several databases

Candidate IDs

  • Name

  • National IDs (passport, identity card, etc.)

  • Social Security ID, Tax ID, etc.

  • Phone numbers

  • Car plate numbers

Not enough!

  • A study in the United States showed that 87% of the population could be uniquely identified through a linking attack using only 3 non-unique attributes

    • 5-digit ZIP code, gender and date of birth
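A minimal sketch of such a linking attack, assuming two made-up datasets: a release with names removed and a public list that still carries names. All records, field names and the `link` helper are hypothetical and serve only to illustrate the join on the three attributes.

```python
# Hypothetical release: direct identifiers removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "gender": "F", "birth": "1945-07-31", "diagnosis": "hypertension"},
    {"zip": "02139", "gender": "M", "birth": "1962-01-15", "diagnosis": "asthma"},
]
# Hypothetical public list (for example a voter roll) that includes names.
public = [
    {"name": "Alice Example", "zip": "02138", "gender": "F", "birth": "1945-07-31"},
    {"name": "Bob Example", "zip": "02139", "gender": "M", "birth": "1962-01-15"},
    {"name": "Carol Example", "zip": "02139", "gender": "F", "birth": "1980-03-02"},
]

QUASI_IDS = ("zip", "gender", "birth")


def link(anonymized, identified, keys=QUASI_IDS):
    """Return (name, record) pairs whose key tuple matches exactly one person."""
    index = {}
    for person in identified:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    matches = []
    for record in anonymized:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((candidates[0]["name"], record))
    return matches


reidentified = link(medical, public)
for name, record in reidentified:
    print(name, "->", record["diagnosis"])
```

Removing the name column did not help here because each (ZIP, gender, date of birth) tuple is unique in the public list.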


Basic strategy

  • Add noise to stored data or to the result of queries


  • Privacy is achieved at the cost of integrity
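As an illustration of adding noise to the result of a query, the sketch below perturbs a count with zero-mean Laplace noise. The choice of Laplace noise, the `scale` value and the `noisy_count` helper are assumptions made for this example; the slides do not prescribe a particular distribution.

```python
import random


def noisy_count(values, predicate, scale=2.0, rng=random):
    """Count matching values, then add zero-mean Laplace noise of the given scale."""
    true_count = sum(1 for v in values if predicate(v))
    # The difference of two independent exponential samples is Laplace-distributed.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


ages = [23, 35, 41, 52, 67]  # made-up data; the true count of ages above 40 is 3
print(noisy_count(ages, lambda a: a > 40))
```

Each answer is off by a random amount, which is the integrity cost mentioned above; averaged over many queries the noise cancels out, so a real deployment would also need to limit repeated queries.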


Basic strategy

  • Do not provide full data, limiting precision


  • Privacy is achieved at the cost of integrity

  • Difficult to balance usability with privacy

    • Privacy concerns the subject who provided the information, who may not be the user accessing it
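Limiting precision can be sketched as rounding values down to a coarser step before releasing them. The `coarsen` helper, the salary figures and the step of 1000 are hypothetical choices for this example.

```python
def coarsen(value, step):
    """Round a number down to a multiple of step, hiding the finer detail."""
    return (value // step) * step


salaries = [31250, 31980, 47400]  # made-up exact values
released = [coarsen(s, 1000) for s in salaries]
print(released)  # [31000, 31000, 47000]
```

The first two subjects become indistinguishable by salary, but any analysis that needs the exact figures is no longer possible, which is the usability trade-off noted above.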



  • k-anonymity: no query can deliver an anonymity set with fewer than k entries
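One way to read this rule is as a guard in front of the query interface: a result is released only if at least k records match. The sketch below assumes that reading; `guarded_query` and the sample rows are hypothetical.

```python
def guarded_query(rows, predicate, k):
    """Return the matching rows, or None if fewer than k rows match."""
    matches = [r for r in rows if predicate(r)]
    if len(matches) < k:
        return None  # refusing avoids exposing a too-small anonymity set
    return matches


rows = [
    {"zip": "3810", "age": 23},
    {"zip": "3810", "age": 27},
    {"zip": "3800", "age": 41},
]
print(guarded_query(rows, lambda r: r["zip"] == "3810", k=2))  # two rows, released
print(guarded_query(rows, lambda r: r["zip"] == "3800", k=2))  # None, only one match
```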

Privacy-critical attributes

  • (Unique) identifiers

  • Quasi-identifiers

    • When combined, they can produce unique tuples

  • Sensitive attributes

    • Potentially unique per subject

    • Disease, salary, crime committed

Implementation approaches

Suppression of quasi-identifiers

  • Simple to perform

  • Information loss
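Suppression can be sketched as replacing the value of every quasi-identifier with a placeholder. The `suppress` helper, the `"*"` marker and the sample row are assumptions made for this example.

```python
def suppress(rows, quasi_ids):
    """Return copies of the rows with every quasi-identifier value masked."""
    return [
        {key: ("*" if key in quasi_ids else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{"zip": "3810193", "age": 23, "disease": "flu"}]
suppressed = suppress(rows, {"zip", "age"})
print(suppressed)  # [{'zip': '*', 'age': '*', 'disease': 'flu'}]
```

The operation is a one-liner, but nothing about location or age survives it, which is the information loss mentioned above.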

Generalization of quasi-identifiers

  • Transformation of quasi-identifiers into less specific ones

    • e.g. 7-digit ZIP → 4-digit ZIP

    • e.g. ages w/ 1 year granularity → 5 or 10 year granularity

  • Information is not completely lost

    • But the generalization should not lead to wrong interpretations of the data

  • We must ensure that there are at least k entries with equal generalized quasi-identifiers
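The two generalizations above and the k-entries check can be sketched as follows. The rows, the field names and the helpers `generalize` and `is_k_anonymous` are hypothetical; the ZIP is cut from 7 to 4 digits and ages are mapped to 10-year ranges.

```python
from collections import Counter


def generalize(row):
    """Keep the first 4 ZIP digits and map the age to a 10-year range."""
    low = (row["age"] // 10) * 10
    return {"zip": row["zip"][:4], "age": f"{low}-{low + 9}", "disease": row["disease"]}


def is_k_anonymous(rows, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())


raw = [
    {"zip": "3810193", "age": 23, "disease": "flu"},
    {"zip": "3810420", "age": 27, "disease": "asthma"},
    {"zip": "3800101", "age": 41, "disease": "flu"},
    {"zip": "3800555", "age": 45, "disease": "diabetes"},
]
generalized = [generalize(r) for r in raw]

print(is_k_anonymous(raw, ("zip", "age"), 2))          # False
print(is_k_anonymous(generalized, ("zip", "age"), 2))  # True
print(is_k_anonymous(generalized, ("zip", "age"), 3))  # False
```

If the check fails for the desired k, the quasi-identifiers need a coarser generalization or some rows must be suppressed.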



Example table: attributes grouped into quasi-identifiers and sensitive attributes


1st step: Remove unique identifiers

2nd step: Generalization

2-anonymity possible results

3-anonymity possible results

Sensitive attribute disclosure
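The example for this slide is not preserved in the text. One common form of sensitive attribute disclosure, assumed here, is a k-anonymous group in which every entry has the same sensitive value: knowing that a subject belongs to the group is then enough to learn that value. The sketch below looks for such groups; `homogeneous_groups` and the rows are hypothetical.

```python
from collections import defaultdict


def homogeneous_groups(rows, quasi_ids, sensitive):
    """Map each quasi-identifier group with a single sensitive value to that value."""
    values = defaultdict(set)
    for r in rows:
        values[tuple(r[q] for q in quasi_ids)].add(r[sensitive])
    return {key: next(iter(vals)) for key, vals in values.items() if len(vals) == 1}


table = [  # made-up 2-anonymous table
    {"zip": "3810", "age": "20-29", "disease": "flu"},
    {"zip": "3810", "age": "20-29", "disease": "flu"},
    {"zip": "3800", "age": "40-49", "disease": "flu"},
    {"zip": "3800", "age": "40-49", "disease": "diabetes"},
]
leaks = homogeneous_groups(table, ("zip", "age"), "disease")
print(leaks)  # {('3810', '20-29'): 'flu'}
```

The table satisfies 2-anonymity, yet anyone known to be in the first group is revealed to have the flu.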
