GDPR: Pseudonymization or Anonymization

Maybe you have noticed it: privacy is an issue. A bit strange, since only 18 days remain until the new EU General Data Protection Regulation (GDPR) becomes fully enforceable throughout the European Union.

So by the end of May 2018, all organizations that process personal data of people in the EU must comply with the GDPR. Determining how to handle the GDPR is not straightforward when it comes to data masking. A question that matters for compliance is whether you should use:

Anonymization or
Pseudonymization

to mask personal data in your IT landscape.

According to the GDPR (Article 4(5)), ‘pseudonymization’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person. In practice, pseudonymization substitutes identifiable data with a consistent value that can be reversed by whoever holds that additional information. The weakness is that the personal data is still there; it is only harder to reach for someone who does not know the pseudonymization rules that were used.
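To make the definition concrete, here is a minimal Python sketch of one possible pseudonymization scheme. It is an illustration only, not a method prescribed by the GDPR: the `Pseudonymizer` class, the field names, and the choice of a keyed HMAC with a lookup table are all assumptions made for this example. The key and the lookup table together play the role of the "additional information" that must be kept separately.

```python
import hashlib
import hmac
import secrets


class Pseudonymizer:
    """Hypothetical helper: swaps an identifier for a consistent token."""

    def __init__(self, key: bytes):
        # The key and the lookup table are the "additional information".
        # In a real system both would live apart from the masked data.
        self._key = key
        self._lookup = {}  # token -> original value

    def pseudonymize(self, value: str) -> str:
        # Keyed hash: the same input always yields the same token.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        token = digest.hexdigest()[:16]
        self._lookup[token] = value
        return token

    def reidentify(self, token: str) -> str:
        # Reversal is possible for anyone who holds the lookup table.
        return self._lookup[token]


pseudonymizer = Pseudonymizer(secrets.token_bytes(32))
record = {"name": "Alice Example", "city": "Utrecht", "diagnosis": "asthma"}
masked = {**record, "name": pseudonymizer.pseudonymize(record["name"])}
```

The masked record no longer shows the name, but `pseudonymizer.reidentify(masked["name"])` returns it again. That reversibility is exactly why the result is still personal data.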

Pseudonymization of personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their GDPR obligations. But in general, pseudonymization is a weak way to protect data privacy. It replaces only the identity of the data subject, so anyone who holds the additional information can re-identify that person. A better approach to protecting private data is data anonymization.

Data anonymization is the process of removing or irreversibly transforming personally identifiable information in data sets, so that the people the data describe remain anonymous. Encryption alone does not qualify, since anyone who holds the key can restore the original values. Real anonymization is irreversible and permanently destroys any way of identifying the data subject.
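As a contrast, the sketch below shows the basic idea of anonymization in Python: direct identifiers are dropped, a quasi-identifier is generalized, and no key or mapping is kept. The field names, the `anonymize` function, and the ten-year age bands are assumptions chosen for illustration. Dropping fields like this does not by itself guarantee anonymity, because the remaining attributes can sometimes be combined to single out a person.

```python
# Hypothetical field names that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(record: dict) -> dict:
    # Drop direct identifiers; nothing is stored that could restore them.
    result = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Coarsen the exact age so it is less useful for singling someone out.
    if "age" in result:
        result["age"] = age_band(result["age"])
    return result


record = {
    "name": "Alice Example",
    "email": "alice@example.org",
    "age": 37,
    "diagnosis": "asthma",
}
anonymous = anonymize(record)
# anonymous == {'age': '30-39', 'diagnosis': 'asthma'}
```

Unlike the pseudonymized record, there is no function that maps `anonymous` back to the original person, because the identifying values were never retained.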

Using pseudonymization introduces a large number of risks that are not present when using anonymization. However, in some use cases pseudonymization is the only option. Use it with care, since the technical and organizational risks involved are significant.

So the principle to state in your privacy-by-design solution architecture should be:

Statement: The use of anonymization is preferred when dealing with private data.
Rationale: Easier to comply with the GDPR, since truly anonymous data is no longer personal data (Recital 26).
Implication(s): Data pseudonymization should be considered only for business processes where data anonymization is not possible.

This blog post will be added (after a rewrite) as an extension to the ‘Open Reference Architecture for Security and Privacy’. We are working on a renewed version. Please join us!