Wednesday, December 27, 2017
I Am Not Sure Why This Has All Of A Sudden Become News Again But It Sure Has!
This appeared last week:
Using publicly known information, a team of researchers from the University of Melbourne have claimed to re-identify seven prominent Australians in an open medical dataset.
The dataset containing historic longitudinal medical billing records of one-tenth of all Australians, approximately 2.9 million people, has been found to be re-identifiable by a team from the University of Melbourne, with information such as child births and professional sportspeople undergoing surgery to fix injuries often made public.
The team, consisting of Dr Chris Culnane, Dr Benjamin Rubinstein, and Dr Vanessa Teague, warned that they expect similar results with other data held by the government, such as Census data, tax records, mental health records, penal data, and Centrelink data.
"We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the record with known information about the individual such as medical procedures and year of birth," Dr Culnane said.
"This shows the surprising ease with which de-identification can fail, highlighting the risky balance between data sharing and privacy."
Although the released dataset has a two-week perturbation of dates of medical events, the team said increasing the perturbation would not make much difference, since in the case of re-identifying older mothers or unique procedures, for instance, there are very few data points to obscure.
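As a rough illustration of the linking process the team describes, here is a minimal Python sketch. It is not the researchers' method or code. The records, field names, and the `candidates` helper are all hypothetical and the data is made up. It only shows how a year of birth plus a few publicly known events can narrow a de-identified dataset to a single record, even when recorded dates are allowed to be up to two weeks off.

```python
from datetime import date

# Assumption: the article mentions a two-week perturbation of event dates.
PERTURBATION_DAYS = 14

# Fabricated toy records; "pseudonym" stands in for an encrypted patient ID.
records = [
    {"pseudonym": "p1", "year_of_birth": 1970,
     "events": [("childbirth", date(2010, 3, 9)), ("childbirth", date(2012, 7, 21))]},
    {"pseudonym": "p2", "year_of_birth": 1970,
     "events": [("childbirth", date(2010, 3, 2))]},
    {"pseudonym": "p3", "year_of_birth": 1985,
     "events": [("knee_surgery", date(2011, 5, 30))]},
]


def event_matches(recorded, known, tolerance_days=PERTURBATION_DAYS):
    """True if the procedure is the same and the dates differ by at most the tolerance."""
    recorded_proc, recorded_date = recorded
    known_proc, known_date = known
    return (recorded_proc == known_proc
            and abs((recorded_date - known_date).days) <= tolerance_days)


def candidates(records, year_of_birth, known_events):
    """Return pseudonyms of records consistent with every publicly known fact."""
    return [
        r["pseudonym"]
        for r in records
        if r["year_of_birth"] == year_of_birth
        and all(any(event_matches(e, k) for e in r["events"]) for k in known_events)
    ]


# One known childbirth leaves two candidates; a second one leaves only one.
one_fact = candidates(records, 1970, [("childbirth", date(2010, 3, 5))])
two_facts = candidates(records, 1970, [("childbirth", date(2010, 3, 5)),
                                       ("childbirth", date(2012, 7, 20))])
```

In this toy data the single known event matches two records, while adding the second known event leaves exactly one. Widening the date tolerance does little to help once the combination of events is itself rare.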
"Overall, including the re-identifications from childbirths, sports injuries, and single surgeries, we devised 43 queries and found seven unique matches," the team said.
With additional data such as credit card history, it is increasingly easy to create fingerprints of people, and this could be happening within private medical insurers without people knowing, the team said.
"A private health insurer (for example) could efficiently track the medical records of past customers through the decades of data, or derive extra information they didn't know about from current customers," they said. "This would be a clear breach of privacy that would possibly never be reported, even though the data could lead to detrimental decisions for the individual in the future."
The team warned that the problem with releasing datasets with personal information is that it could be used far off into the future with additional information from other sources to re-identify people.
"Data about people should be much more carefully considered," the team wrote. "It is very unlikely that even the most well-informed and well-intentioned set of guidelines on de-identification can guarantee privacy protections appropriate for sensitive data such as the MBS/PBS 10 percent sample while retaining the usefulness of the data."
"Taking advantage of the benefits of big data without seriously compromising privacy is one of the most difficult engineering challenges of our time."
The team said the Department of Health was notified of the problems with the dataset in December 2016.
In September 2016, the same dataset was found by the University of Melbourne team to not be encrypting supplier codes properly. The dataset was subsequently pulled down by the Department of Health.
"Leaving out some of the algorithmic details didn't keep the data secure -- if we can reverse-engineer the details in a few days, then there is a risk that others could do so too," the team said at the time.
"Security through obscurity doesn't work -- keeping the algorithm secret wouldn't have made the encryption secure, it just would have taken longer for security researchers to identify the problem.
"It is much better for such problems to be found and addressed than to remain unnoticed."
As a result of the issues found, in October last year, the Australian government proposed changes to the Privacy Act that would criminalise the intentional re-identification and disclosure of de-identified Commonwealth datasets, reverse the onus of proof, and be applied retrospectively from September 29, 2016.
There is also coverage here:
and the definitive blog here:
To me the major relevance of all this is that it appears at the same time as the Government is finalising its view on the Secondary Use of myHR data.
This mess-up must surely up the ante as to the protections applied to any data release and also ensure there is a significant improvement in the competence of those handling such data.
I hope this is indeed the case – I for one will be watching closely.
Posted by Dr David G More MB PhD at Wednesday, December 27, 2017