Tuesday, October 10, 2017

A Very Long Read That Only Makes One More Concerned About The Safety And Reliability Of Critical Systems – In Healthcare and Elsewhere.

This appeared a little while ago.

The Coming Software Apocalypse

A small group of programmers wants to change how we code—before catastrophe strikes.
There were six hours during the night of April 10, 2014, when the entire population of Washington State had no 911 service. People who called for help got a busy signal. One Seattle woman dialed 911 at least 37 times while a stranger was trying to break into her house. When he finally crawled into her living room through a window, she picked up a kitchen knife. The man fled.
The 911 outage, at the time the largest ever reported, was traced to software running on a server in Englewood, Colorado. Operated by a systems provider named Intrado, the server kept a running counter of how many calls it had routed to 911 dispatchers around the country. Intrado programmers had set a threshold for how high the counter could go. They picked a number in the millions.
Shortly before midnight on April 10, the counter exceeded that number, resulting in chaos. Because the counter was used to generate a unique identifier for each call, new calls were rejected. And because the programmers hadn’t anticipated the problem, they hadn’t created alarms to call attention to it. Nobody knew what was happening. Dispatch centers in Washington, California, Florida, the Carolinas, and Minnesota, serving 11 million Americans, struggled to make sense of reports that callers were getting busy signals. It took until morning to realize that Intrado’s software in Englewood was responsible, and that the fix was to change a single number.
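The failure mode is easy to reproduce in miniature. What follows is a hypothetical sketch, not Intrado's code (which is not public): a router hands out sequential call IDs up to a hard-coded cap, `max_id`, and once the cap is reached it rejects every new call. The optional `alert` callback is an illustrative assumption standing in for the alarm that was never built; leaving it as `None` models what actually happened.

```python
from typing import Callable, Optional


class CallRouter:
    """Hypothetical call router that numbers calls with a capped counter."""

    def __init__(self, max_id: int, alert: Optional[Callable[[str], None]] = None):
        self.max_id = max_id  # hard-coded threshold (Intrado's was "in the millions")
        self.next_id = 0      # running count of calls routed so far
        self.alert = alert    # None models the missing alarm

    def route(self, caller: str) -> Optional[int]:
        """Return a unique call ID, or None if the call is rejected."""
        if self.next_id >= self.max_id:
            # Counter exhausted: no new unique IDs, so every call is refused.
            if self.alert is not None:
                self.alert(f"call ID space exhausted at {self.max_id}; rejected {caller}")
            return None  # the caller hears a busy signal
        call_id = self.next_id
        self.next_id += 1
        return call_id


if __name__ == "__main__":
    silent = CallRouter(max_id=3)
    print([silent.route(c) for c in ["a", "b", "c", "d", "e"]])  # [0, 1, 2, None, None]

    log = []
    noisy = CallRouter(max_id=1, alert=log.append)
    noisy.route("first")
    noisy.route("second")
    print(log)  # one alert message, for the rejected "second" call
```

With `alert=None`, the cap is hit invisibly; wiring in even a trivial callback (here just `log.append`) is the cheapest form of the missing alarm. Raising `max_id`, the equivalent of "changing a single number", restores service but only postpones the same failure.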
Not long ago, emergency calls were handled locally. Outages were small and easily diagnosed and fixed. The rise of cellphones and the promise of new capabilities—what if you could text 911? or send videos to the dispatcher?—drove the development of a more complex system that relied on the internet. For the first time, there could be such a thing as a national 911 outage. There have now been four in as many years.
It’s been said that software is “eating the world.” More and more, critical systems that were once controlled mechanically, or by people, are coming to depend on code. This was perhaps never clearer than in the summer of 2015, when on a single day, United Airlines grounded its fleet because of a problem with its departure-management system; trading was suspended on the New York Stock Exchange after an upgrade; the front page of The Wall Street Journal’s website crashed; and Seattle’s 911 system went down again, this time because a different router failed. The simultaneous failure of so many software systems smelled at first of a coordinated cyberattack. Almost more frightening was the realization, late in the day, that it was just a coincidence.
“When we had electromechanical systems, we used to be able to test them exhaustively,” says Nancy Leveson, a professor of aeronautics and astronautics at the Massachusetts Institute of Technology who has been studying software safety for 35 years. She became known for her report on the Therac-25, a radiation-therapy machine that killed six patients because of a software error. “We used to be able to think through all the things it could do, all the states it could get into.” The electromechanical interlockings that controlled train movements at railroad crossings, for instance, only had so many configurations; a few sheets of paper could describe the whole system, and you could run physical trains against each configuration to see how it would behave. Once you’d built and tested it, you knew exactly what you were dealing with.
Software is different. Just by editing the text in a file somewhere, the same hunk of silicon can become an autopilot or an inventory-control system. This flexibility is software’s miracle, and its curse. Because it can be changed cheaply, software is constantly changed; and because it’s unmoored from anything physical—a program that is a thousand times more complex than another takes up the same actual space—it tends to grow without bound. “The problem,” Leveson wrote in a book, “is that we are attempting to build systems that are beyond our ability to intellectually manage.”
Our standard framework for thinking about engineering failures—reflected, for instance, in regulations for medical devices—was developed shortly after World War II, before the advent of software, for electromechanical systems. The idea was that you make something reliable by making its parts reliable (say, you build your engine to withstand 40,000 takeoff-and-landing cycles) and by planning for the breakdown of those parts (you have two engines). But software doesn’t break. Intrado’s faulty threshold is not like the faulty rivet that leads to the crash of an airliner. The software did exactly what it was told to do. In fact it did it perfectly. The reason it failed is that it was told to do the wrong thing. Software failures are failures of understanding, and of imagination. Intrado actually had a backup router, which, had it been switched to automatically, would have restored 911 service almost immediately. But, as described in a report to the FCC, “the situation occurred at a point in the application logic that was not designed to perform any automated corrective actions.”
This is the trouble with making things out of code, as opposed to something physical. “The complexity,” as Leveson puts it, “is invisible to the eye.”
The attempts now underway to change how we make software all seem to start with the same premise: Code is too hard to think about. Before trying to understand the attempts themselves, then, it’s worth understanding why this might be: what it is about code that makes it so foreign to the mind, and so unlike anything that came before it.
Technological progress used to change the way the world looked—you could watch the roads getting paved; you could see the skylines rise. Today you can hardly tell when something is remade, because so often it is remade by code. When you press your foot down on your car’s accelerator, for instance, you’re no longer controlling anything directly; there’s no mechanical link from the pedal to the throttle. Instead, you’re issuing a command to a piece of software that decides how much air to give the engine. The car is a computer you can sit inside of. The steering wheel and pedals might as well be keyboard keys.
Vastly more here:
What to say? For myself, never having developed a system to fly a plane or drive a car, much of this was new and fascinating. The more I read, the more I wondered just how far we are from ‘peak complexity’ in software, and what happens next.
Surely we are going to have to rely increasingly on AI to evaluate and test virtually all critical systems. As I read, I was reminded of the old programming truth that every bug-free program is trivial and every non-trivial program has bugs!
The next ten years are going to be very interesting as the issues raised in this article are faced and solved.
A great and worthwhile read!
David.

20 comments:

  1. There is a lack of appreciation of how hard it is to write reliable software, how complex the interactions are, and how unreliable the output becomes if you don't standardize your inputs. NEHTA, and now the ADHA, seem to have no appreciation of how important high-quality, standards-based inputs are, and consequently our eHealth is unreliable and a disaster waiting to happen. There is NO testing or compliance required for the millions of critical health data messages that are sent around every day, and in fact, if they were compliant, many endpoints would fail.

    I ask the question: would you fly in a plane if the ADHA had overseen the development of the software that controlled it? If not, then we are allowing, no, forcing, 24 million people to take joy flights on a plane that has had no compliance testing of its data inputs or of its ability to process correct inputs. We need people in charge of eHealth who deeply understand software, medicine and risk. Unfortunately, we have gone without this for so long that the software-development aspects of medical software are often not even front and center; it's marketing that gets all the attention.

    Next time you or a relative is awaiting the result of a critical test, just think about how compliant all the links in the chain that make that result appear actually are. But it'll be right, mostly...

  2. The problem here is that complexity isn't going to go away; all we can do is move it around. In the end, it's deck chairs on the Titanic. Using AI to review your software is definitely more of the same: against what requirements does it review your implementation? The requirements, and capturing them correctly, are the real source of complexity and confusion.

    The mainstream software industry is moving away from heavyweight assurance processes towards highly reactive processes. I expect that the pendulum will swing backwards and forwards, but agile programming appears to have won, at least for now.

    Health is more of the same - complexity isn't going away.

    Andrew: please don't continue with the broken analogy of aeroplanes, unless you want an industry controlled by something like the FAA. On the whole, airline-industry analogies are wrong, and we don't want the healthcare industry to work like that.

  3. We do need a lot more focus on software quality in healthcare, however. Would you like aviation software to be controlled by DOHA/NEHTA/ADHA processes? Rigorous expectations with respect to standards compliance are the governance we require, and that requires more investment in software quality.

  4. Grahame, agile should not be an excuse to trade off quality; the quality of the product is paramount. You can trade off time, cost and features, but not quality. Programmers are, and should remain, only part of the answer. Agile is a discipline, not an excuse to cut corners, and it is a dangerous thing in the wrong hands. A surgeon trusts his tools because behind them is a trusted discipline and quality process, resulting in a subconscious acknowledgement that a scalpel is sterile and sharp, purpose-crafted for a specific job.

    I certainly would not want my appendix taken out using a minimum viable product.

  5. I did not say anything about quality. Of course quality and safety are paramount.

    But I do not think that the way the airline industry achieves this is appropriate for healthcare: ignore potential risks until someone dies, then identify the cause, and legislate the cause out of existence. And if acting has any chance of failure that is not managed out of existence by following the book, then do not act. Healthcare requires a more nuanced approach, since the way we handle risk is fundamentally different.

    I agree that what we have is not what we'd like. But I don't know how to get a better approach across the industry. Anonymous might claim that you shouldn't trade off quality, but the market clearly does, as anyone who participates in it knows. And we also know that government attempts to change the market in this regard have generally made the situation worse, in spite of the best motivations and following recommendations.

  6. I'm not sure what recommendations they received, but I have only seen "bribes to implement xyz" and attempts to implement their own national program(s).

    IMHO the rule should be: if you allow data to leave your system, it must comply with these standards, and you must reliably accept standards-compliant data inbound.

    If we had compliant data that could be reliably received, then you could build all sorts of systems on top of that.

    There have been NO attempts to ensure compliant data even when AHML existed and was able to do this job. It withered because no one bothered to achieve compliance, partly because compliant data didn't work with existing systems: they needed HL7V2 with specific errors and would not process compliant data correctly.

    Healthcare needs compliant data flows; what happens inside the system is up to the user to judge.

  7. I agree with Andrew’s sentiments. Nationally and globally, we have invested huge amounts of money in eHealth over the years. Organisations would soon adapt and adopt standards and specifications if we turned the money off. Instead, those holding our money bend over and subsidise laziness, and then brand those setting these standards and practices as blockers, naysayers, academic dreamers, etc.

    What do we end up with? A broken standards landscape and an ever-depleting pool of highly skilled people willing to be involved. That will very quickly leave a vacuum to be filled by generic administrators and government personnel, which will only further fuel the demise of our national institutes.

    Yes, a new approach to conformance and compliance might need to be invested in, but that is not going to happen while the ADHA is what it currently is.

  8. Andrew: "There have been NO attempts to ensure compliant data even when AHML existed and was able to do this job" - yes, that was precisely my point, and that this was true under a wide set of conditions (except that it's not quite true - there were *some* attempts (including Andrew, yes), but they were few and didn't achieve critical mass). We're (nearly) all guilty on this.

  9. Are standards bodies for health in Australia all that important anymore? Sure, we need some variants around specific terminologies for medicines, but on the whole most big-end systems are either international or marketed to international as well as national markets. The newly appointed COO of the ADHA has been at the forefront of arguing that they are no longer a necessary investment, as she did when leading the demise of CCA requirements. Why not just join in with international efforts? There are standards developments that prove online communities are as powerful as, if not more powerful than, the old-world ways.

  10. "Why not just join in with international efforts? There are standards developments that prove online communities are as powerful as, if not more powerful than, the old-world ways" - well, FHIR is an example of that, presumably.

    It's important to understand that FHIR (and related efforts) are open online communities, and that's a key feature of what they are. But they are also formal standards efforts, and that underpins what they are - in the end, they will be formal standards. Having a way to scale governance - which is really what the standards process is - matters. And really, the standards community is just adapting its social media processes to a new technology. It's not deeper or newer than that.

    So, yes, standards bodies still matter, but (a) they better be good at social media, and (b) they better be good at good governance. Standards Australia failed on both counts, and so we have no current accepted process here in Australia.

    And that matters; we can't just piggy-back on the international standards process as long as we have a notion of 'an Australian system' - a set of behaviours that we control for our own benefit. Obviously we need to collaborate/cooperate internationally, but we can't simply adopt the international consensus - particularly in health, where the consensus is that countries have to finish the standards process themselves. So yes, standards bodies for health in Australia are that important. We just don't have any in e-health right now. We have some candidates (e.g. HL7 Australia, and the Agency), but none has demonstrated the ability to meet the requirements and create the necessary trust.

  11. Are the eHealth standards development organisations in Australia necessary? That is a valid question. Perhaps we should also ask whether they are in a position to contribute as intended and, more importantly, now that government runs a solution and seems keen on dictating winners, whether they will be able to contribute and remain respected as a process.

    I hope standards can rebuild locally; I guess the next year will tell. Hopefully they can retain a perception of neutrality and not become a pawn of the ADHA.

  12. Why are Standards an important investment? It is probably a sad reflection of where digital health has been led that this is even a widespread discussion. Sure, it is always good to revisit and validate purposes and value, but standards touch every part of our lives and arguably contribute more to the creation and sustainability of civilisation than any other factor.

    What would the landscape look like if we stopped investing in the people, processes and technology needed to do the hard and labour-intensive work of creating and maintaining standards (much of it for little or no financial gain or recognition outside a small group)?

    1. Products might not work as expected
    2. They may be of inferior quality
    3. They may be incompatible with other products – in fact they may not even connect with them
    4. In extreme cases, non-standardised products may be dangerous
    5. Customers would be restricted to one proprietary solution or supplier
    6. Industry would be obliged to invent its own individual solutions to even the simplest needs, with limited opportunity to compete with others.

    I guess we don’t have to look far to see a real life reference for the above.

    The ADHA has a long way to go to repair the damage it and its predecessor, the eHealth branch, have done. It will take time to become trusted and will require a different set of investments from its current skills base.

    HL7 Aus has faltered of late; perhaps the elections may see a return to a leadership-driven entity rather than what we have. Someone will need to sort SA out; many standards are becoming outdated and reaching end of life.

    Perhaps, through the vacuum of intellectual prowess created by the ADHA and its pursuit of a national PDF store, a community will form where the challenges of electronic health capabilities can be discussed free from commercial and vested interests and manipulation. We need those conversations if we are to begin actually advancing eHealth rather than continuously rebranding old projects and calling them new.

  13. It is a sad day when people wonder if we really need standards to deal with eHealth. I guess when the day arrives that computers can fully understand text, we might not need them, but you would still have to find a way to transmit the text, with patient identification, authentication and security, so you would still need some sort of standards.

    The secure messaging project has exposed how much work is required to really support standards well, but there appears to be an attitude that we will just dumb down the messages to meet a promised deadline rather than a realization that we have a problem that needs attention. Considering there are at least 30 different PMS vendors out there and the project has only sort of engaged 2 of them, it's hardly going to have a big effect on interoperability.

    The realization that the low-level "techo" stuff needs to work has not really bubbled up to the higher levels of the ADHA, and never did with NEHTA. Until it does, we will continue to waste money on visions of castles in the air, with no foundations in sight. The "go Mobile, mHealth" push is all well and good, but providers need to have the data, and ways to manage the data, first, or that will just be added noise.

    Unsexy, non-"announceable" work is what a grown-up organization would be doing. It would be nice if they just allocated 5% of the budget to progressing compliance with existing standards and to some basics, like medication and patient history summaries suitable for provider-to-provider use and some clinical models and terminology for common procedures/reports.

    Instead, it seems they are saying we don't need standards; they have even disrupted standards work, as those nasty people involved in standards kept pointing out flaws in "our" plans.

  14. Hard to disagree, Andrew.

    I guess when the day arrives that computers can fully understand text,

    Hmm, that would take a set of standards to happen, wouldn’t it?

    The ADHA and its groupies can continue to fool themselves, but they cannot fool the sciences, and reality is not far away now.

  15. Another way to view it might be: do standards need the ADHA? They simply operate the MyHR; the policy aspects of Government, as well as other entities, could fill the void nicely.

  16. I recently got a copy of my "health" record from my GP. It is full of standards:

    .html
    .pdf
    .tif
    .jpeg
    .jpg
    .imag

    and documents with multiple pages have multiple images.

    But not a single "medical" standard.

    And my name is embedded in all these documents which will make de-identification pretty much impossible.

    And it's nowhere near complete.

    We have a very, very long way to go. Pity we are currently heading in the wrong direction.

  17. All that makes one pretty nervous with the Government planning - apparently - to make myHR material available to Big Pharma....what a nightmare!

    David.

  18. So long as it’s not:

    V.21
    V.27ter
    ITU-T T.30
    ITU-T T.4
    ITU-T T.6

    I doubt the ADOHA CEO or COO even know what they are.

  19. David, they are planning what? I don’t recall that being pointed out during my registration.

  20. See the Blog of 8 October 2017 regarding the 'Secondary Use Of MyHR Data Consultation'. It's all there and is also being reported in the press.

    David.
