'Anonymous' data might not be so anonymous, study shows
The researchers’ model suggests that more than 99 percent of Americans could be correctly re-identified from any dataset using 15 demographic attributes.
By Nicholas Wells, CNBC and Leslie Picker, CNBC
July 23, 2019, 12:20 PM PDT
We’ve all done it: When signing up for an account
online, we’ve clicked “I agree” to have our data sold to third parties. It will
be anonymized, we’re assured, and only a small percentage of data will be made
available to others.
But how sure can we be that our personal data can’t be traced back to us? That’s the central question a team of researchers at Université catholique de Louvain in Belgium and Imperial College London set out to answer.
The conclusion is — “not very.”
Using machine learning, the researchers developed a system to
estimate the likelihood that a specific person could be re-identified from an
anonymized data set containing demographic characteristics. The researchers’
model suggests that more than 99 percent of Americans could be correctly
re-identified from any dataset using 15 demographic attributes, including age,
gender and marital status.
“While there might be a lot of people who are in their thirties,
male and living in New York City, far fewer of them were also born on January
5, are driving a red sports car and live with two kids (both girls) and one
dog,” said Luc Rocher, a PhD candidate at Université catholique de Louvain and
the study’s lead author. Personal data can be used for research, illicit
activities and even investing, as CNBC has previously reported.
Their paper, “Estimating the success of re-identifications in incomplete datasets using generative models,” was published in the journal Nature Communications. Their findings suggest that commonly used anonymization techniques, such as adding noise to data or releasing only a sample of it, may not be enough to satisfy data privacy laws such as the European Union’s GDPR and California’s Consumer Privacy Act.
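To make those two techniques concrete, here is a minimal sketch, assuming hypothetical records made of gender, birth date and ZIP code. The hypothetical `release` helper keeps a random fraction of rows (sampling) and shifts each birth date by a few random days (noise); `unique_share` then measures how many released rows still have a one-of-a-kind combination. The population is randomly generated, and the sampling fraction and noise range are illustrative assumptions, not values from the study. The sketch only shows what the two operations do; it does not reproduce the researchers’ analysis.

```python
import random
from collections import Counter
from datetime import date, timedelta


def release(records, fraction, max_shift_days, seed=0):
    """Hypothetical anonymization step: sample rows, then jitter birth dates."""
    rng = random.Random(seed)
    sampled = rng.sample(records, k=max(1, int(len(records) * fraction)))
    released = []
    for gender, birth, zip_code in sampled:
        # Shift each birth date by a random number of days in [-max, +max].
        shift = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        released.append((gender, birth + shift, zip_code))
    return released


def unique_share(rows):
    """Share of rows whose full (gender, birth date, ZIP) tuple occurs once."""
    counts = Counter(rows)
    return sum(1 for r in rows if counts[r] == 1) / len(rows)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic population: random genders, birth dates over ~60 years, 5 ZIP codes.
    population = [
        (rng.choice("MF"),
         date(1940, 1, 1) + timedelta(days=rng.randrange(60 * 365)),
         rng.choice(["10001", "10002", "10003", "10004", "10005"]))
        for _ in range(10_000)
    ]
    out = release(population, fraction=0.1, max_shift_days=3)
    print(len(out), f"{unique_share(out):.2%} of released rows are unique")
```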
The results “question whether current de-identification
practices satisfy the anonymization standards of modern data protection laws
such as GDPR and CCPA,” the researchers wrote.
As part of their research, the trio published an online tool that shows people how likely they are to be re-identified based on just three common demographic characteristics: gender, birth date and ZIP code. On average, people have an 83 percent chance of being re-identified from those three data points, the researchers said.
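A back-of-the-envelope calculation suggests why those three fields go so far. Assume, purely for illustration, that every combination of gender and birth date over an 80-year span is equally likely and that residents of a ZIP code are independent of one another. With N possible combinations and n residents, the chance that a given resident shares their gender and birth date with nobody else in the ZIP code is (1 - 1/N)^(n - 1). The hypothetical `p_unique` function below computes this; it is a simplified model, not the researchers’ method, and its numbers are not the study’s estimate.

```python
def p_unique(residents: int, years: int = 80) -> float:
    """Probability that one resident's (gender, birth date) is unique in a ZIP code.

    Simplifying assumptions: all (gender, birth date) combinations are equally
    likely, residents are independent, and leap days are ignored.
    """
    combinations = 2 * 365 * years
    # Each of the other residents must avoid this resident's combination.
    return (1 - 1 / combinations) ** (residents - 1)


if __name__ == "__main__":
    for residents in (1_000, 10_000, 50_000):
        print(f"{residents:>6} residents: {p_unique(residents):.0%} chance of being unique")
```

Under these assumptions the chance of standing out falls as a ZIP code gets more populous, but it stays high for the many ZIP codes with only a few thousand residents.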
“The goal of anonymization is so
we can use data to benefit society,” said Yves-Alexandre de Montjoye, one of
the researchers. “This is extremely important but should not and does not have
to happen at the expense of people’s privacy.”