Quote Of The Year

Timeless Quotes - Sadly The Late Paul Shetler - "It's not Your Health Record, it's a Government Record Of Your Health Information"

or

H. L. Mencken - "For every complex problem there is an answer that is clear, simple, and wrong."

Wednesday, July 31, 2019

Are You Confident That When They Hand Over Your Health Information In Anonymised Form It Really Is Anonymised?

Some articles that appeared this week might shake your confidence.
First we have:

'Anonymised' data can still be used to identify you, scientists show

By Gina Kolata
July 24, 2019 — 11.26am
Your medical records might be used for scientific research. But don't worry, you're told; personally identifying data was removed.
Information about you gathered by a government bureau might be made public. But don't worry; it, too, has been "anonymised."
This week, scientists showed that all this information may not be as anonymous as promised. The investigators developed a method to re-identify individuals from just bits of what were supposed to be anonymous data.
Data can be anonymised in various ways, but keeping it useful often means leaving it open to reconstruction.
In most of the world, anonymous data is not considered personal data; the information can be shared and sold without violating privacy laws. Market researchers are willing to pay brokers for a huge array of data, from dating preferences to political leanings, household purchases to streaming favourites.
Even anonymised data sets often include scores of so-called attributes: characteristics about an individual or household. Anonymised consumer data sold by Experian, the credit bureau, to Alteryx, a marketing firm, included 120 million Americans and 248 attributes per household.
Scientists at Imperial College London and Université Catholique de Louvain, in Belgium, reported in the journal Nature Communications that they had devised a computer algorithm that can identify 99.98 per cent of Americans from almost any available data set with as few as 15 attributes, such as gender, postal code or marital status.
Even more surprising, the scientists posted their software code online for anyone to use. That decision was difficult, says Yves-Alexandre de Montjoye, a computer scientist at Imperial College London and lead author of the new paper.
Ordinarily, when scientists discover a security flaw, they alert the vendor or government agency hosting the data. But there are mountains of anonymised data circulating worldwide, all of it at risk, de Montjoye says.
So the choice was whether to keep mum, he said, or to publish the method so that data vendors can secure future data sets and prevent individuals from being re-identified.
"This is very hard," de Montjoye says. "You have to cross your fingers that you did it properly, because once it is out there, you are never going to get it back."
Some experts agreed with the tactic. "It's always a dilemma," says Yaniv Erlich, chief scientific officer at MyHeritage, a consumer genealogy service, and a well-known data privacy researcher.
"Should we publish or not? The consensus so far is to disclose. That is how you advance the field: publish the code, publish the finding."
This is not the first time that anonymised data has been shown to be not so anonymous after all. In 2016, individuals were identified from the web-browsing histories of 3 million Germans, data that had been purchased from a vendor. Geneticists have shown that individuals can be identified in supposedly anonymous DNA databases.
Very quickly, with a few bits of information, everyone is unique
Yaniv Erlich
……
New York Times
More here:
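The core idea in the Nature Communications paper is that a handful of attributes is usually enough to single one person out of a large dataset. The toy Python sketch below is my own illustration of that point, not the authors' code; every column name and value is made up. It simply shows how the pool of matching records collapses as an attacker combines attributes such as gender, postcode and birth year.

# A minimal sketch (not the researchers' method): count how many records in a
# toy "anonymised" dataset match what an attacker already knows about someone.
# All attribute names and values are invented for illustration.

records = [
    {"gender": "F", "postcode": "2000", "birth_year": 1984, "marital": "single"},
    {"gender": "F", "postcode": "2000", "birth_year": 1984, "marital": "married"},
    {"gender": "M", "postcode": "3121", "birth_year": 1990, "marital": "single"},
    {"gender": "F", "postcode": "2010", "birth_year": 1984, "marital": "single"},
    # ... imagine millions more rows ...
]

def candidates(dataset, known, attrs):
    """Return the records matching everything the attacker already knows."""
    return [r for r in dataset if all(r[a] == known[a] for a in attrs)]

target = records[0]
# Knowing only gender still leaves several candidates...
print(len(candidates(records, target, ["gender"])))                      # 3
# ...but each extra attribute shrinks the candidate set towards one.
print(len(candidates(records, target, ["gender", "postcode"])))          # 2
print(len(candidates(records, target,
                     ["gender", "postcode", "birth_year", "marital"])))  # 1 -> unique

Once the candidate set is down to a single record, any "anonymised" medical or purchasing data attached to that record now belongs to an identified person.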
There was lots of coverage of the same Nature article here:

Easily re-identified 'anonymised' data threatens privacy

By Juha Saarinen on Jul 25, 2019 6:38AM

Gaussian copula could trip GDPR trap.

Researchers have once again shown that sensitive data, supposedly anonymised so as not to reveal its subjects, can be re-constituted with relative ease.
Data scientists from London's Imperial College and the Université Catholique de Louvain in Belgium had a crack at estimating the likelihood of a specific person being correctly re-identified in even heavily incomplete, anonymised datasets.
Their Gaussian copula-based method turned out to be very accurate.
"Using our model, we find that 99.98 percent of Americans would be correctly re-identified in any dataset using 15 demographic attributes.
"Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymisation set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model," the researchers wrote in the Nature Communications scientific journal.
Lots more here:
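The researchers' actual model fits a Gaussian copula to estimate, for a given set of attributes, the chance that a record which looks unique in a released sample is also unique in the whole population and is therefore correctly re-identified. The sketch below asks the same question by crude brute force over an entirely synthetic population; it is an illustration only and is not the copula model from the paper.

# A simplified, brute-force version of the question the paper asks: if a record
# is unique in a small released sample, how often is it also unique in the full
# population (and hence correctly re-identifiable)?
# The population, attributes and sampling rate here are assumptions for illustration.
import random
from collections import Counter

random.seed(0)

# Synthetic population: each person is a tuple of a few categorical attributes.
population = [
    (random.choice("MF"),            # gender
     random.randrange(200),          # postcode bucket
     random.randrange(1940, 2005),   # birth year
     random.choice(["single", "married", "other"]))
    for _ in range(100_000)
]
population_counts = Counter(population)

# Release a heavily sampled "anonymised" extract (1 per cent of the population).
sample = random.sample(population, 1_000)
sample_counts = Counter(sample)

sample_unique = [p for p in sample if sample_counts[p] == 1]
also_population_unique = [p for p in sample_unique if population_counts[p] == 1]

print(f"{len(sample_unique)} of {len(sample)} sampled records are unique in the sample")
print(f"{len(also_population_unique)} of those are also unique in the whole population,")
print("so a match on just these attributes would point to exactly one person.")

Even in a toy run like this, with only four coarse attributes and a 1 per cent sample, a sizeable fraction of sample-unique records turn out to be unique in the whole population; that is the intuition behind the paper's far stronger 99.98 per cent figure once 15 attributes are available.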
To me, it is really up to those handing over the ‘anonymised’ data to prove, to a reasonable level, that your identity is protected. It seems in 2019 this is a pretty big ask indeed.
For myself I am going to tick the no-sharing box for now!
David.
