July 25, 2019

Your ‘Anonymous’ Data Shouldn’t Be Able to Re-Identify You—But It Can

by Samra Anees

Nature Communications recently published a paper titled “Estimating the success of re-identifications in incomplete datasets using generative models.” The study developed a machine-learning model and found that 99% of Americans could be re-identified from anonymous datasets.


Machine learning exposed that anonymous data can still be used to identify individuals.

This blog explores how and why data anonymization isn’t enough to protect consumer privacy.

The Model Sheds Light On Current Anonymization Practices

The three researchers who conducted the study aimed to create a system that would estimate the likelihood that a person could be re-identified from an anonymous dataset that includes demographic information about people.

The researchers succeeded in building this model and found that over 99% of Americans could be re-identified from any given dataset using just 15 demographic attributes (such as gender, age, marital status, and date of birth).
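To get a feel for why a handful of attributes is enough, here is a minimal sketch that counts how many records become unique as more attributes are combined. It is not the generative model from the paper; it is a simple uniqueness count, and the records, column order, and the `unique_fraction` helper are all invented for illustration.

```python
from collections import Counter

# Hypothetical toy records: (gender, birth_year, zip_code, marital_status).
# These rows are made up for illustration; they are not from the study.
RECORDS = [
    ("F", 1985, "10001", "married"),
    ("F", 1985, "10001", "single"),
    ("M", 1985, "10001", "married"),
    ("M", 1990, "10002", "single"),
    ("M", 1990, "10002", "single"),
    ("F", 1990, "10002", "married"),
    ("F", 1972, "10003", "married"),
    ("M", 1972, "10003", "married"),
]


def unique_fraction(records, columns):
    """Return the share of records whose values in `columns` occur exactly once."""
    keys = [tuple(row[i] for i in columns) for row in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)


# Add one attribute at a time and watch the unique share grow.
for n in range(1, 5):
    print(n, "attribute(s):", unique_fraction(RECORDS, list(range(n))))
```

In this toy table, gender alone singles out nobody, while all four attributes together make 75% of the records unique. A unique record with no name attached can still be matched to a named person by anyone who knows those same attributes from another source.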

The results raise the question of whether current data anonymization practices are strict enough to comply with modern data protection laws such as the GDPR and the CCPA.

Increased Focus on Data Protection Law—GDPR and CCPA

The GDPR was implemented to give EU citizens more consistent protection of their consumer and personal data. Its requirements govern the processing and movement of data in a way that protects citizens and their data. Among them are needing a subject’s consent to process their data and requiring certain companies to appoint a data protection officer. The regulation also pushes companies to anonymize or pseudonymize the data they collect so that the privacy of the subject is preserved.

Inspired by the GDPR, the CCPA aims to give Californians more rights over their data. It requires companies to disclose to consumers what data they hold on them, why and how it was collected, and what they are doing with it. Consumers also have the right to demand that their data be deleted or not sold (the “opt-out” or “Do Not Sell” rule). The law takes effect on January 1, 2020. Other U.S. states have taken inspiration from the CCPA and begun implementing their own versions as well.

As we have seen too often, massive amounts of personal and consumer data are too readily available to tech giants, and much of it is collected and sold to third parties every day. As this becomes increasingly apparent, data protection laws are gaining traction as people step up and try to regain rights over their data and their privacy.

Bottom Line

This study on data anonymization is yet another example of the irresponsible data collection practices taking place today. Given the scale of data collection and the number of databases that hold information about people, it feels as though nothing is private anymore. Data privacy laws will start to combat this, but realizing after the fact that consumer information was not protected is no longer enough; data collectors need to consider individuals’ privacy before that data can be misused. Data collection is important when it is intended for positive impact, but trading away citizens’ privacy and anonymity can no longer be acceptable, especially when people are not even aware of how their information can be used or linked back to them.


