This appeared last week and, coming from a real expert, would seem to suggest that the answer is no!
Yes, we weren’t prepared for the Optus outage, but we should have been
Telecommunications expert
November 8, 2023, 3.44pm
The Optus outage cannot be considered a “rare occasion”. Over the past few years, we have witnessed several major outages across the telco networks, making it imperative for us to prepare ourselves for such events. But we have ignored the warnings.
Today, more than 99 per cent of telecoms traffic comprises data. Virtually every organisation and nearly all Australians rely on data services through their phones and fixed-line connections. As we’ve observed, an outage of this magnitude can cause significant disruptions in the economy and people’s private lives. In this case, even the triple-zero emergency service on landlines was disconnected.
These outages are of national interest, and thus, we require national solutions to mitigate the considerable fallout from such events.
What occurred at Optus was likely to have been a software problem. While such issues occur more frequently, most systems recover in seconds or minutes, resulting in minimal disruption. However, in some cases, as appears to have happened this time, a critical fault during a software update can cascade through the computer systems that underpin the network’s operation.
Unravelling, fixing and bringing all these different systems back online can take hours, and sometimes even days. Moreover, not all systems are likely to come back online simultaneously; they need to be restarted one by one, further extending the recovery time.
In the end, this is an infrastructure problem.
There are essentially two long-term solutions. The first pertains to the individual networks of the operators. It is unacceptable for there to be a single point of failure in a network that can bring down an entire country, or as seen before, the entire east coast. With more than 100 years of telecoms experience and a wealth of engineering knowledge and skills, networks can be designed to eliminate single points of failure.
In the event of a disruption, traffic should be rerouted through other network systems. In other words, there should be duplicated, unconnected systems, whereby one can take over from the other in emergencies.
The second solution involves the combined telecoms infrastructure in Australia. In case of an emergency, there should be a “gateway” facility connecting the networks, allowing them to take over traffic from one another. In the case of mobile networks, I have advocated for this for more than 20 years: this solution is called roaming.
After government pressure, an announcement was finally made last week that roaming via mobile networks is now possible in emergencies, such as bushfires or floods. It’s technically feasible, and we should explore its use in other emergency scenarios, such as the one we’ve experienced on Wednesday. So, for example, if you’re an Optus customer and the Optus network is down, your phone finds the Telstra network.
The reason for the delay in implementing this in Australia is the resistance from telco companies. They view the size of their networks as a competitive advantage and question why they should allow others to use their network.
The problem is that these networks aren’t merely commercial operations; they are vital infrastructure for our society and economy. Protecting the national interest in the face of serious network failures is paramount. Implementing such solutions requires the government’s commitment and the regulatory authority’s influence.
However, there is also a responsibility for users, both organisations and individuals, to acknowledge that such disruptions will happen, and they should assess their vulnerability. For example, if a company’s sales or financial systems shut down, or its transport systems don’t work, or its emergency operations fail, it must consider the need for its own solutions.
…..
Paul Budde is a leading telecommunications management and business consultant.
Here is the link to the full article:
https://www.smh.com.au/business/consumer-affairs/yes-we-weren-t-prepared-for-the-optus-outage-but-we-should-have-been-20231108-p5eiid.html
Paul really says it all. In essence, the story is that it is possible to get to 100% uptime, but you either have to pay for it with pretty much full redundant provisioning, or you can have the government put in place forced sharing between networks in the event of outages. It really is silly that an outage can persist while excess capacity exists just a switch away!
There are many problems the Government can't fix, but for this one there is a simple regulatory fix that would give basically 100% uptime for everybody. The Government should just get on with it and legislate emergency sharing when needed!
How many decades will it take to actually do it, do you reckon?
David.