Metadata Anonymisation

Maybe you have noticed it. Privacy is an issue. A bit strange since there are only 13 days left until the new EU General Data Protection Regulation (GDPR) will become fully enforceable throughout the European Union.

So before end of May 2018 all organizations that process data of EU citizens must comply with this General Data Protection Regulation. Determining how to design and improve your systems to meet the GDPR is not straightforward. A complicating factor is that in many documents you store and process private data can be hidden without you knowing. When you want to be compliant with the GDPR you should know how to deal with metadata hidden in documents.

What is metadata?
Metadata consists of information that characterizes data (e.g. Word documents, pictures, music files, etc). In essence, metadata answers who, what, when, where, why, and how about every facet of the data that is being characterized. Metadata within a file can contain a lot of privacy related data.  Office documents like pdf or MSOffice automatically add author, company information and revision information(e.g. who changed what)  to documents and spreadsheets. Under the GDPR you are not by default allowed to disclose this metadata information on the web.

When you distribute information or publish information on the Internet you must check if metadata in document is still present and if you are allowed to expose this information. In most cases you do not want the metadata exposed. To solve this problem a lot of tools exist that claim to strip all metadata for you from Office Documents (MSWord, PDF, etc). However in most cases these tools work far from perfect and give you a false feeling of security. For examples, images embedded inside PDF documents may not be cleaned and images also contain metadata information.

If you really want to be sure that no metadata is present in documents convert documents to a format that do not contain any metadata: E.g. plain-text document. But be aware and very careful: every format can be watermarked, so also even plain text documents! E.g. by using white space steganography.  Steganography is the science of concealing messages in other messages. In this digital age with a lot of companies and governments following your communication, steganography provides still a good way for hiding messages.

To meet the GDPR you should be aware of the risk of exposing personal information by metadata in documents. So make use of metada anonymisation.

This blog post will be added (after rewrite) as an extension on the ‘Open Reference Architecture for Security and Privacy‘. We are working on an renewed version. Please join us!