Quotes Of The Year

Paul Shetler - "It's not your health record, it's a government record of your health information."


H. L. Mencken - "For every complex problem there is an answer that is clear, simple, and wrong."

Wednesday, October 25, 2017

It Seems Data Anonymisation Is Really Hard To Get Right While Making Available Any Data That Is Useful.

This appeared last week:

Has somebody shared your 'anonymised' health data? Bad news

Harvard boffins unmask 100% of 'encrypted' S Korean records

By Alexander J Martin
Researchers from Harvard University have published a paper claiming a 100 per cent success rate in de-anonymising patients from their supposedly anonymised healthcare data in South Korea.
The study, which bears the ronseal title of "De-anonymizing South Korean Resident Registration Numbers Shared in Prescription Data", was published this week in Technology Science.
Two de-anonymisation experiments were conducted in the study on prescription data from deceased South Koreans, with encrypted national identifiers - Resident Registration Numbers (RRNs) - included.
The researchers found significant vulnerabilities in the anonymisation process which is applied to identifiers contained within prescription data, data which is often sold to multinational health companies.
The RRNs, similar to Blighty's National Insurance numbers, are unique 13-digit codes which encode demographic information.
The researchers found that these "weakly encrypted RRNs" are vulnerable to de-anonymisation: both experiments were 100 per cent successful, revealing all 23,163 of the underlying unencrypted RRNs.
The experiments were conducted independently of each other and provided complementary validation, as the boffins were able to match the same RRNs to the same patients in both. Both are detailed in the journal article's methodology section.
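The article doesn't spell out the encryption scheme the Koreans used, but the underlying weakness is easy to illustrate: any deterministic scheme applied to a small, structured keyspace can be inverted by exhaustive search. Here is a minimal Python sketch, assuming a hypothetical unsalted hash as the "encryption" and a toy RRN-like identifier (birth date + sex digit + serial) - the format and names are illustrative, not the actual Korean scheme:

```python
import hashlib
from datetime import date, timedelta

# Hypothetical pseudonymisation: an unsalted hash of the identifier.
# Any deterministic function of a small keyspace has the same flaw.
def pseudonymise(rrn: str) -> str:
    return hashlib.sha256(rrn.encode()).hexdigest()

# Toy RRN-like identifier: YYMMDD + sex digit + 3-digit serial.
# (Real RRNs have 13 digits, but the keyspace is similarly enumerable.)
def all_candidates(year: int):
    d = date(year, 1, 1)
    while d.year == year:
        for sex in "12":
            for serial in range(1000):
                yield f"{d:%y%m%d}{sex}{serial:03d}"
        d += timedelta(days=1)

# The attacker precomputes a dictionary over the whole keyspace,
# then inverts any "encrypted" value with a single lookup.
def build_dictionary(year: int) -> dict:
    return {pseudonymise(c): c for c in all_candidates(year)}

table = build_dictionary(1950)
victim = "5003141042"  # hypothetical: born 1950-03-14, male, serial 042
assert table[pseudonymise(victim)] == victim  # recovered, no key needed
```

The point is that "encrypted" is meaningless when the plaintext space is small and structured enough to enumerate - which is exactly what a national ID format guarantees.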

Scandalous slurpage

The Harvard experiment used decedent information; however, the experiments demonstrated "how others could associate an actual RRN for a living patient with his sensitive medical information."
The researchers noted a civil suit involving multinational IMS Health regarding its access to that sensitive information. IMS Health claims to possess over 10PB of "unique healthcare data" and is one of the largest vendors of physician prescribing data worldwide.
Confirmation of the suit is provided through an internal IMS Health document (PDF, pg. 14) which alleges an "affiliate collected plaintiffs' personal information without the necessary consent in violation of applicable privacy laws and transferred such information to IMS Korea for sale to customers."
If IMS Health did receive these kinds of data, then our study exposes the real-world realities of imperfect anonymization. Further research is necessary to propose alternatives to lawsuits.
Lots more here:
Until researchers of this calibre are able to say that the Department of Health / Human Services actually know what they are doing, and that the data being released cannot be 'reverse engineered', I would stay well away. The South Koreans are hardly technical 'nitwits'!
I far prefer the situation where known researchers are given controlled access to proper data-sets under properly regulated and ethically approved conditions, do the work they need to do, and then destroy or hand back the original data, publishing only consolidated summary conclusions.
This way the risks are much lower, I believe.
Comments welcome!


Anonymous said...

You make very valid points David and there are alternatives to just freely giving access to raw data. Meanwhile I see history repeating itself - http://theconversation.com/foi-reveals-cynical-logic-that-compromises-nhs-data-privacy-24750

Peter said...

When I worked at Telstra, they had an interesting way of providing data for testing that could not be used for any other purpose. Basically, certain values were shuffled between records, so addresses, names, services, accounts etc. were real but didn't match the other values in any specific record. Of course, there was some logic involved to make sure things correlated properly - the count of services matched, for instance.
I don't think the technique is suitable for health records because in medicine everything is inter-connected - and often it is a combination of values that matters rather than any specific one. However, it does show simple anonymisation is not the only way to render the data "safe" for external use.
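The shuffling Peter describes can be sketched in a few lines of Python. This is a hypothetical illustration only - the correlation logic he mentions (e.g. keeping service counts consistent) is omitted, and the record fields are invented:

```python
import random

def shuffle_columns(records, columns, seed=42):
    """Shuffle each named column independently across records, so every
    value is real but no record keeps its original combination."""
    rng = random.Random(seed)           # fixed seed for repeatability
    out = [dict(r) for r in records]    # leave the originals untouched
    for col in columns:
        values = [r[col] for r in out]
        rng.shuffle(values)
        for rec, v in zip(out, values):
            rec[col] = v
    return out

customers = [
    {"name": "Ann", "address": "1 High St", "service": "ADSL"},
    {"name": "Bob", "address": "2 Low Rd",  "service": "Fibre"},
    {"name": "Cho", "address": "3 Mid Ln",  "service": "Mobile"},
]
masked = shuffle_columns(customers, ["name", "address", "service"])
# Every individual value still appears somewhere in the masked set,
# so the data "looks real" for testing, but the combinations are broken.
assert sorted(r["name"] for r in masked) == ["Ann", "Bob", "Cho"]
```

Note that column-wise shuffling preserves each column's value distribution exactly, which is what makes the output realistic for testing while destroying the linkage within any one record.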

Anonymous said...

Another way to avoid identification is to allow only pre-programmed queries that output aggregated statistics. Aggregated statistics (large groups) and no line data mean no ability for data matching. Specialised queries have to go through ethics and be done by the data holder (MyEHR). No matter what, no line data leaves the building.
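The aggregate-only approach above is usually paired with small-cell suppression, so that no group is small enough to single anyone out. A minimal sketch, assuming a hypothetical threshold k and invented field names:

```python
from collections import Counter

def aggregate_count(rows, group_key, k=5):
    """Pre-programmed query: counts per group, suppressing any cell
    smaller than k so small groups cannot be singled out."""
    counts = Counter(row[group_key] for row in rows)
    return {group: n for group, n in counts.items() if n >= k}

# Illustrative line data that never leaves the data holder:
rows = [{"postcode": "2000"}] * 12 + [{"postcode": "2600"}] * 3
stats = aggregate_count(rows, "postcode", k=5)
assert stats == {"2000": 12}  # the 3-person cell is suppressed
```

Only `stats` - the aggregated, suppressed result - would ever be released; the row-level data stays with the holder.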

Anonymous said...

Even aggregated queries at population level require ethics clearance if conducted for research purposes. The controls on researchers are very tight. Not so for industry and government, which don't have ethics committees ...