The GDPR impact on machine learning

Maybe you have noticed it: privacy is an issue. That is a bit strange, since there are only 7 days left until the new EU General Data Protection Regulation (GDPR) becomes fully enforceable throughout the European Union.

So before the end of May 2018 all organizations that process personal data of people in the EU must comply with this General Data Protection Regulation. Determining how to design and improve your systems to meet the GDPR is not straightforward. If you are thinking of using new machine learning technologies, you could now face an extra challenge.

Since we live in a digital world, your digital traces are everywhere, and most of the time we are fully unaware of them. In most western countries, mass surveillance with digital cameras generates a wealth of data that can be fed to machine learning algorithms. The purpose can be noble, such as detecting diseases from camera images, but every nasty use case you can think of is of course also under development.

The applicability of machine learning is hindered if you follow the GDPR guidelines. Machine learning raises serious privacy concerns because it uses massive amounts of data, most of which contain personal information.

It is a common belief that personal information is needed for experimenting with machine learning before you can create good and meaningful applications, for example health applications, travel applications, eCommerce and of course marketing applications. Machine learning models can be fed massive amounts of personal data for training in order to make good, meaningful predictions in the end. The belief that personal data is needed for machine learning creates a tension between developers and privacy-aware consumers. Developers want the ability to create innovative new products and services and need to experiment, while consumers and GDPR regulators are concerned about the privacy risks involved.

There is a solution under development: secure machine learning (SecureML), which should address some of the privacy concerns. But SecureML is still an obscure and unpaved road. It works on encrypted data, which has many consequences for data preparation, data cleaning and the way machine learning models can be trained. Machine learning itself is already a black box when it comes to privacy concerns and the traceability of the original data sources. With SecureML it will be even harder to validate the output of trained models and to understand exactly how algorithms and tools work.
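
To give a feeling for what computing on protected data can look like, here is a minimal sketch of additive secret sharing, one common building block of privacy-preserving computation. It is only an illustration and not the SecureML approach itself. The names `share`, `reconstruct` and `secure_sum`, the modulus and the sample values are all assumptions made for this example.

```python
import secrets

# Public modulus for the arithmetic; an illustrative choice, not a recommendation.
MODULUS = 2**61 - 1


def share(value, n_parties=2):
    """Split a non-negative integer into n shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Combine all shares to recover the hidden value."""
    return sum(shares) % MODULUS


def secure_sum(values, n_parties=2):
    """Sum private values while each party only ever sees random-looking shares."""
    per_party = [[] for _ in range(n_parties)]
    for v in values:
        # The data owner splits the value; party i receives only the i-th share.
        for i, s in enumerate(share(v, n_parties)):
            per_party[i].append(s)
    # Each party adds up its own shares locally, without seeing any raw value.
    partial_sums = [sum(p) % MODULUS for p in per_party]
    # Only the combined partial sums reveal the total.
    return reconstruct(partial_sums)


ages = [34, 51, 47]  # hypothetical personal data
total = secure_sum(ages)
```

No single party can read an individual age from its shares, yet the total comes out correctly. The same property is what makes data cleaning and model validation harder: nobody can simply look at the records.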

The GDPR does not prohibit the use of machine learning. But when you use personal data, you will face a serious challenge explaining to DPOs (Data Protection Officers) and consumers what you actually do with the data and how you comply with the GDPR.

And do not forget: what do you do with your trained machine learning model when a user asks for their personal data to be erased? Under the GDPR this is a legal right, outlined in Article 17 (‘right to be forgotten’).
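
One naive way to handle such a request is to delete the user's records and retrain the model from scratch on what is left. The sketch below shows that idea with a deliberately trivial "model" (an average). The `Record` type, the function names and the sample data are hypothetical, and whether retraining is sufficient for compliance is a legal question this example does not answer.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    user_id: str
    spend: float  # hypothetical feature derived from personal data


def train(records):
    """Toy 'model': the average spend, plus a note of whose data it was trained on."""
    return {
        "mean_spend": mean(r.spend for r in records),
        "trained_on": {r.user_id for r in records},
    }


def erase_user(records, user_id):
    """Drop every record belonging to the user who asked to be erased."""
    return [r for r in records if r.user_id != user_id]


records = [Record("u1", 10.0), Record("u2", 50.0), Record("u3", 30.0)]
model = train(records)

# User u2 asks for erasure: remove the data, then retrain so the
# model no longer depends on that user's records.
records = erase_user(records, "u2")
retrained = train(records)
```

Keeping track of which users contributed to which model version (here the `trained_on` set) is what makes it possible to tell which models are affected by a request. For large models, retraining from scratch on every request can be very expensive, which is exactly why this question deserves attention early in the design.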

This blog post will be added (after a rewrite) as an extension to the ‘Open Reference Architecture for Security and Privacy’. We are working on a renewed version. Please join us!