Gregory (Scotland Yard detective): “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the dog in the night-time.”
Gregory: “The dog did nothing in the night-time.”
Holmes: “That was the curious incident.”
Arthur Conan Doyle, “Silver Blaze,” from “The Memoirs of Sherlock Holmes”
I work in the area of reliability and security. It is not nearly as much of a dismal science as Economics is supposed to be. But there are times when I feel we do focus on the failures to the exclusion of the success stories. Failures happen due to naturally occurring bugs or successful security attacks. And we learn from them, and should continue to learn from them, about how to design better and more secure computing systems. So doubtless they have enormous value.
But what about learning from the success stories as well? I mean the systems that stood tall even as software and hardware glitched all around them, even as their users threw new requests at them, and even as attacks accelerated and grew more sophisticated. Can we learn principles from these that we can emulate as we design new systems? I believe the answer is yes, though we may have to look under the carpet, because systems that continue to work securely and reliably are the analog of the dog that did not bark. We have to bring out the Sherlock in us to infer principles for resilient system design from them.
I will look at principles we can distill from two remarkably resilient Cyber-Physical Systems (CPS). The first is the power grid, here in the US and in faraway Ukraine. The second is the tsunami detection system in the Indian Ocean and some other sea-facing geographies. I will cover the first case study in this article and follow that up with a quick second article.
The US Power Grid
The power grid in the US is a remarkably diverse and complex system. It is perhaps better described as a “system of systems”. There are many stakeholders, such as the power generators, utilities, regulators, transmission operators, and so on. It is quite awe-inspiring if one stops to think about everything that needs to be in place so that flipping the switch reliably turns on the light bulb. And the grid rarely goes down, in the sense of a large-scale systemic outage. There have been many thoughtful analyses of the resilience of the power grid and of the further challenges to securing it. The National Academies report from 2021 titled “The Future of Electric Power in the United States” is a particularly comprehensive and forward-looking one. A few prime patterns for resilience come to the fore from studying these analyses.
Principles of Resilience
First, it strikes me that this is a rare example of a public-private partnership where incentives are well aligned. Some parts of the U.S. wholesale electricity market are traditionally regulated (gray areas), meaning that utilities are responsible for the end-to-end workflow culminating in electricity to consumers. They own the generation, transmission, and distribution systems all the way to the end consumers. Other parts of the wholesale market (the Northeast, Midwest, Texas, and California) are “restructured competitive markets”. These markets are run by independent system operators (ISOs), which use competitive market mechanisms to let independent power producers and non-utility generators trade power.
Information and communication technologies (ICT) have been embraced by the stakeholders in the US electric grid, to varying extents. In cybersecurity, a traditional bugaboo has been the difficulty of sharing information among the concerned parties. This sector seems to have gotten it right. Electric grid owners and operators have agreements that facilitate sharing information about threats and defenses before, during, and after incursions occur, and many peer organizations have agreements for mutual aid in the event of an attack. The North American Electric Reliability Corporation (NERC) encompasses six Regional Entities as shown in the map below. NERC has a dedicated unit for information sharing and analysis, called prosaically the Electricity Information Sharing and Analysis Center (E-ISAC). As an example, it organized the Grid Security Exercise (GridEx), a grid security and emergency response exercise, in 2019. It was two days of simulated physical and cyber attacks to train folks to respond to events that affect the reliable operation of the grid. And then, for the pointy-haired bosses … errr … executives, they arranged a hands-on exercise, no small feat.

Off to Ukraine and its Power Grid
Now let’s turn our sights to a place far, far away: Ukraine and its electricity grid. Since the war with Russia started in 2022, its grid has weathered the relentless attacks relatively well. By relatively well, I mean that large parts of the country have not been plunged into darkness for hours each day. What’s behind this feat? For one, they learned their lessons from earlier failures. There was a sophisticated and successful malware attack against the Ukrainian power grid by suspected Russian hackers in December 2015, and it plunged the western part of the country into darkness for as long as six hours. And as if to prove that the lesson had not yet sunk in, another cyber attack crept through the chinks in 2016, this time bringing Kiev’s power grid down. Then they took the lessons to heart and made their cyber defenses much stronger. The evidence is that no rampant malware has run amok on the grid since, and I am sure that is not for lack of effort on the attackers’ part.
The other key ingredient has been the decision (and the wherewithal) to operate the grid in an island mode. As NREL (US government’s National Renewable Energy Laboratory) puts it:
Electric grids are delicately interconnected systems in which the supply of available energy and the use of that energy must be maintained in constant balance. Synchronizing one grid to another requires a precise match of the frequency, phase, and voltage of electric current. Failure to do so could result in grid collapse (a blackout) of both power systems and possibly require weeks of repair to make them functional.
“Ukraine Fights To Build More Resilient, Renewable Energy System in Midst of War,” NREL, July 27, 2023.

Ukraine learned to operate its grid in an island mode, which, as the name suggests, means disconnecting from the other grids, and most importantly from the Russian grid. This is the modern equivalent of raising the drawbridge over the moat. Then, when things became stable, relatively speaking, they synchronized their grid to the European grid. There is an important principle here for any CPS: the ability to disconnect parts of a networked system and reconnect them efficiently, while maintaining some desired properties all the way through.
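To make the reconnection step concrete, here is a minimal sketch of the kind of check the NREL quote implies: before tying an islanded grid to a neighbor, compare frequency, phase, and voltage on both sides. Everything in it is an assumption made for illustration. The names GridState, SyncTolerance, and sync_violations are hypothetical, and the tolerance values are placeholders, not figures from any grid code or from Ukraine's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridState:
    """Snapshot of one side of the tie line (hypothetical model)."""
    frequency_hz: float
    phase_deg: float
    voltage_kv: float


@dataclass(frozen=True)
class SyncTolerance:
    """Illustrative limits only; real limits come from the operators."""
    max_freq_diff_hz: float = 0.1
    max_phase_diff_deg: float = 10.0
    max_voltage_diff_pct: float = 5.0


def phase_difference_deg(a: float, b: float) -> float:
    """Smallest absolute angle between two phases, in the range [0, 180]."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def sync_violations(island: GridState, neighbor: GridState,
                    tol: SyncTolerance = SyncTolerance()) -> list[str]:
    """Return the quantities that are too far apart to reconnect safely."""
    violations = []
    if abs(island.frequency_hz - neighbor.frequency_hz) > tol.max_freq_diff_hz:
        violations.append("frequency")
    if phase_difference_deg(island.phase_deg, neighbor.phase_deg) > tol.max_phase_diff_deg:
        violations.append("phase")
    # Voltage mismatch is measured relative to the neighbor's voltage.
    voltage_diff_pct = (abs(island.voltage_kv - neighbor.voltage_kv)
                        / neighbor.voltage_kv * 100.0)
    if voltage_diff_pct > tol.max_voltage_diff_pct:
        violations.append("voltage")
    return violations


def may_reconnect(island: GridState, neighbor: GridState,
                  tol: SyncTolerance = SyncTolerance()) -> bool:
    """Reconnect only when all three quantities are within tolerance."""
    return not sync_violations(island, neighbor, tol)


if __name__ == "__main__":
    neighbor = GridState(frequency_hz=50.0, phase_deg=3.0, voltage_kv=400.0)
    island = GridState(frequency_hz=50.02, phase_deg=355.0, voltage_kv=398.0)
    print(may_reconnect(island, neighbor))
```

The check returns the list of offending quantities rather than a bare yes or no, on the assumption that an operator would want to know what to correct before trying again. Note that the phase comparison wraps around 360 degrees, so 355 and 3 degrees count as 8 degrees apart.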
Third, the US has been helping Ukraine for a while to build up its renewable energy infrastructure. Renewable energy sources, as is well advertised, are everywhere. So unlike, say, diesel generators, we do not need to transport fuel to a particular location to keep the lights on. And thus, almost by definition, this makes for a more resilient infrastructure. This has been done through the wonderful USAID program of ours. USAID’s Energy Security Project (ESP) in Ukraine, coupled with its Energy Sector Transparency (EST) activity, is a great example of taking a technology solution (the former) together with a policy angle (the latter) to make measurable progress.
To Sum Up
Here we have seen the case of a resilient Cyber-Physical System (CPS), the electric power grid, in two geographies: here at home and in far-off, battle-torn Ukraine. There are some key principles that we can distill for designing, implementing, and operating resilient CPS in general. On the technology side, the ability to disconnect and reconnect different elements of the network quickly is a prime one. On the operational side, the importance of learning lessons from failures and of holding “dress rehearsals” for failures is borne out. On the policy side, it is clear that the public and private sectors need to engage together, and that policy needs to go hand in hand with the technology piece. We should be cautious not to overextend the principles from the power grid to every other CPS. This is because the power grid is more heavily regulated than many other CPS, and its application scenarios, while large, can still be bounded within a reasonably sized envelope. There are many other CPS that are far more of a Wild West, where these principles will have to be modified before we can apply them.