Anonymization of data does not guarantee your complete anonymity

Tomcat

There is a view that the “anonymized” information many companies love to collect and use will not actually protect a person from de-anonymization if the data leaks online or is used in someone else’s interests. Cloud4Y looks at whether this is true.

Last fall, Adblock Plus creator Wladimir Palant analyzed the products Avast Online Security, AVG Online Security, Avast SafePrice and AVG SafePrice and concluded that Avast uses its popular antivirus software to collect and subsequently sell user data. The uproar quickly died down, because Avast CEO Ondrej Vlcek assured users that the collected data was anonymized as much as possible, that is, stripped of any connection to the identity of a specific person.

“Our company does not allow advertisers or any third party to obtain, through Avast, any data that would allow them to target a specific individual,” he said.

However, a study conducted by students at Harvard University shows that depersonalizing collected information is far from a guarantee against “de-anonymization,” that is, disclosure of a person’s identity based on the data available in a database. The students built a tool that combs through huge consumer data sets that have become publicly available as a result of negligence, hacking or other kinds of leaks.

The program was fed all the databases that had leaked online since 2015, including MyHeritage account data and Equifax and Experian user data. Although many of these databases contain “anonymized” information, the students say that identifying real users was not that difficult.

The principle of operation is quite simple. The program takes a list of personally identifying information (a person's email address or name) and then scans all the leaked databases for records that match it. If matches are found, the students obtain more information about the person. Sometimes that information is enough to identify them unambiguously.
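The lookup described above can be illustrated with a minimal sketch. This is not the students' tool, whose code is not shown in the article; the function name `build_profiles`, the `email` key and the toy records are all hypothetical and exist only to show how matching one identifier across several leaks merges scattered fields into one profile.

```python
from collections import defaultdict


def build_profiles(seeds, datasets, key="email"):
    """Collect every field found for each seed identifier across all datasets.

    `seeds` is a list of known identifiers (here: email addresses).
    `datasets` maps a hypothetical leak name to a list of record dicts.
    """
    wanted = {s.strip().lower() for s in seeds}
    profiles = defaultdict(dict)
    for records in datasets.values():
        for record in records:
            ident = str(record.get(key, "")).strip().lower()
            if ident in wanted:
                for field, value in record.items():
                    if field != key:
                        # Keep the first value seen for each field.
                        profiles[ident].setdefault(field, value)
    return dict(profiles)


# Invented toy data: neither leak alone says much about the person.
leak_a = [{"email": "alice@example.com", "username": "alice84", "zip": "02138"}]
leak_b = [
    {"email": "Alice@example.com", "birth_year": 1984, "city": "Cambridge"},
    {"email": "bob@example.com", "birth_year": 1990, "city": "Boston"},
]

profiles = build_profiles(
    ["alice@example.com"], {"leak_a": leak_a, "leak_b": leak_b}
)
print(profiles)
```

In this toy case the two leaks together yield a username, ZIP code, birth year and city for a single email address, although each source held only part of that picture.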

Collecting the pieces of your personality

An individual leak is like a puzzle piece. On its own it is not particularly useful, but when multiple leaks are compiled into a single database, they can give a surprisingly clear picture of who we are. People may forget about these leaks, but hackers can use the data long afterwards. They just need to put together a few more pieces of the puzzle.

Imagine that one company stores only usernames, passwords, email addresses and other basic account information, while another stores your browsing and search history or your location data. On its own, each set will not identify you, but combined they can reveal numerous personal details that even your closest friends and family may not know.

The goal of the students’ research is to show that such data collection, no matter how depersonalized, still poses a potential threat to users. A data set from one source can easily be linked to another through a value that is present in both sets. In other words, you should not assume that your personal information is safe just because the company collecting and storing it assures you that it is completely anonymized.

There is other evidence for this. In one British study, for example, scientists used machine learning to build a model that could correctly identify 99.98% of Americans in any anonymized data set using only 15 characteristics. Another study, conducted by researchers at the Massachusetts Institute of Technology, showed that users can be identified 90% of the time from only four basic parameters.
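Why a handful of characteristics is enough can be seen with a simple count. The sketch below is not the machine-learning model from the studies mentioned above; it only measures, on invented toy records, what share of people become unique as more attributes are combined. The function name `unique_fraction` and the attribute names are assumptions made for illustration.

```python
from collections import Counter


def unique_fraction(records, attributes):
    """Share of records whose combination of `attributes` occurs exactly once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


# Invented toy data with no names or account identifiers at all.
people = [
    {"zip": "02138", "birth_year": 1984, "gender": "F"},
    {"zip": "02138", "birth_year": 1984, "gender": "M"},
    {"zip": "02138", "birth_year": 1990, "gender": "F"},
    {"zip": "02139", "birth_year": 1984, "gender": "F"},
]

by_zip = unique_fraction(people, ["zip"])
by_zip_year = unique_fraction(people, ["zip", "birth_year"])
by_all = unique_fraction(people, ["zip", "birth_year", "gender"])
print(by_zip, by_zip_year, by_all)  # 0.25 0.5 1.0
```

Each extra attribute splits the crowd into smaller groups, so in this toy set three harmless-looking fields already single out every person.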

It turns out that individual information leaks are quite painful, but taken together they become a real nightmare.

The problem is not just with companies

But companies alone should not be blamed. Despite numerous scandals involving leaks of confidential data, which have become an almost weekly occurrence, the public greatly underestimates the impact of these leaks and hacks on personal security and therefore ignores basic security measures. For instance, after analyzing one of the data sets produced by the program, the Harvard students found that of the 96,000 passwords in it, only 26,000 were unique.

That is, people are simply too lazy to come up with something complex and fall back on template passwords. Here, for example, is a recent publication on Habr on this topic. The leading passwords are “12345” and “123456”. With protection like that, no technology can save you from being hacked. It is difficult to protect a person's data if they make no effort to do so themselves.
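For scale, 26,000 unique passwords out of 96,000 is roughly 27%. The kind of count behind such a figure can be sketched as follows; this is an illustration on an invented sample, not the students' analysis, and the function name `password_stats` is hypothetical. Here "distinct" means the number of different password strings, which is one possible reading of "unique" in the text above.

```python
from collections import Counter


def password_stats(passwords, top=2):
    """Summarize how often passwords repeat in a list."""
    counts = Counter(passwords)
    return {
        "total": len(passwords),
        "distinct": len(counts),
        "distinct_share": len(counts) / len(passwords),
        "most_common": counts.most_common(top),
    }


# Invented sample; a real leak would contain many thousands of entries.
sample = [
    "12345", "123456", "12345", "qwerty",
    "123456", "12345", "Tr0ub4dor&3", "letmein",
]

stats = password_stats(sample)
print(stats)
```

The lower the distinct share, the more accounts can be opened by trying a short list of the most common passwords.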

There is a nuance: in Russia there are the “Methodological recommendations for the application of Roskomnadzor Order No. 996 of September 5, 2013, ‘On approval of requirements and methods for anonymization of personal data’” (approved by Roskomnadzor on December 13, 2013). These recommendations make it possible to achieve a truly high level of anonymization, provided you do not skimp on the procedure by merely replacing full names with IDs (everyone probably remembers how the names of the children of the former Prosecutor General of Russia, Artyom and Igor Chaika, miraculously turned into the codes LSDU3 and YFYAU9 in Rosreestr).

What else is there to add? So much has already been said about the importance of using unique passwords that there is no point in repeating it. And companies will continue to collect data, reassuring us with promises to make everything as anonymous as possible. But, as you can see, these promises cannot always be trusted.