Wednesday, October 25, 2017
It Seems Data Anonymisation Is Really Hard To Get Right While Making Available Any Data That Is Useful.
This appeared last week:
By Alexander J Martin 2 Oct 2015 at 08:34
Researchers from Harvard University have published a paper claiming a 100 per cent success rate in de-anonymising patients from their supposedly anonymised healthcare data in South Korea.
The study, which bears the ronseal title of "De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data", was published this week in Technology Science.
The study comprised two de-anonymisation experiments on prescription data from deceased South Koreans, which included encrypted national identifiers - Resident Registration Numbers (RRNs).
The researchers found significant vulnerabilities in the anonymisation process which is applied to identifiers contained within prescription data, data which is often sold to multinational health companies.
The RRNs, similar to Blighty's National Insurance numbers, are unique 13-digit codes which represent demographic information.
The researchers found that "weakly encrypted RRNs" may be vulnerable to de-anonymisation: both experiments were 100 per cent successful and revealed all 23,163 of the unencrypted RRNs.
Each experiment was conducted independently of the other, and the two validated each other, as the boffins were able to match the same RRNs to the same patients in both. Both are detailed in the journal article's methodology section.
The Harvard experiments used decedent information; however, they demonstrated "how others could associate an actual RRN for a living patient with his sensitive medical information."
The researchers noted a civil suit involving multinational IMS Health, regarding its access to that sensitive information. IMS Health claims to possess over 10PB of "unique healthcare data", and is one of the largest vendors of physician prescribing data worldwide.
Confirmation of the suit is provided through an internal IMS Health document (PDF, pg. 14) which alleges an "affiliate collected plaintiffs' personal information without the necessary consent in violation of applicable privacy laws and transferred such information to IMS Korea for sale to customers."
"If IMS Health did receive these kinds of data, then our study exposes the real-world realities of imperfect anonymization. Further research is necessary to propose alternatives to lawsuits."
Lots more here:
Until researchers of this calibre are able to say that the Department of Health / Human Services actually know what they are doing, and that the data being released cannot be ‘reverse engineered’, I would stay well away. The South Koreans are hardly technical ‘nit-wits’!
I far prefer the situation where known researchers are given controlled access to proper data-sets under properly regulated and ethically approved conditions, do the work they need to do, then destroy or hand back the original data and publish only consolidated summary conclusions.
This way the risks are much lower, I believe.
Posted by Dr David G More MB PhD at Wednesday, October 25, 2017