Reliability and Risk
The Challenge of Managing Interconnected Infrastructures
Emery Roe and Paul R. Schulman


Contents and Abstracts
1 The Infrastructure Society
chapter abstract

This chapter introduces the book's argument about the central role that critical infrastructures have in society and for individual well-being. These infrastructures provide clean water, communications, transportation, electricity, flood protection, financial services, and major emergency response, among other contemporary essentials. The capabilities of humans to manage these complex and increasingly connected infrastructure systems are being stretched to their limits. Yet society insists that these systems remain in continuous, predictable operation at high levels of dependability and safety. This chapter reviews the literature on the pivotal role of infrastructures in present-day societies, discusses the major properties of these infrastructures, looks at the overarching priority that they be managed safely and continuously, and considers the risks from infrastructures being more and more interconnected and seemingly more and more vulnerable to large cross-system failure.

2 The Interinfrastructure Challenge
chapter abstract

A great deal of policy and debate over interconnected infrastructures focuses on a subset of the major patterns and ways in which two or more infrastructures are interconnected. Cascading interinfrastructure failure, in which the failure of one infrastructure triggers failure in another that is spatially near or otherwise functionally dependent on it, has received considerable attention. However, at least five other types of interconnectivity in interconnected critical infrastructure systems (ICISs) are of policy and management relevance; this chapter introduces them and draws implications for subsequent chapters. All six types are illustrated from our case study and through the secondary literature.

3 High Reliability in Critical Infrastructures
chapter abstract

This chapter describes the main concepts and features of high-reliability management in critical infrastructures. It gives special attention to the precluded-events standard for reliability (i.e., certain events must never happen); the centrality of control rooms and their properties for high-reliability management in real time; the key role that reliability professionals have in this management, especially when confronting the inevitable surprise of unexpected events in a system of poor design, technology, and regulation; and the reliability-relevant stages of infrastructure operations (normal, disrupted, restored, failed, recovered, and new normal), with special attention to, and a case study of, the intensive interorganizational requirements for the recovery of major infrastructures that have failed.

4 A Framework for ICIS Reliability Management
chapter abstract

This chapter and the next two set out the framework of the book, focusing on reliability management of ICISs. It begins by demonstrating how the dominant conceptualizations of interconnectivity fall short, both in their almost-exclusive attention to design and technology solutions to interinfrastructure failure and in their lack of attention to the management dimension necessary for real-time reliability. The chapter lays out the building blocks of a framework for reliable (and safe) operations at the ICIS level: the design-management continuum for reliability management, simple models and definitions of systems of one or more infrastructures, the pivotal concept of control variables shared by infrastructures, types of system resilience and their definition within an ICIS, four basic types of interconnectivity configurations and their shift points, the specific dimensions of interconnectivity, and management of latent interconnectivity. Examples are drawn throughout from the case study.

5 A Framework for ICIS Risk Management
chapter abstract

In the book's framework, risks follow from reliability: the standard of reliability chosen, the special skills of reliability professionals to manage the way they do (including managing the risks that come from managing reliably), and the special features of the control rooms these professionals work in and with. This chapter focuses on the risk side of reliability management and elaborates on the previous chapter's framework building blocks for reliability management at the ICIS level: unpredictabilities that must be managed for reliability purposes (risk, uncertainty, ambiguity, and unstudied conditions), the control room's comfort zone for these unpredictabilities, manifest versus latent risks and the implications of their difference for managing across four performance modes in the control room, and the importance of these building blocks and their implications for ICISs, including increased calls for coordination, innovation, and efficiency. Examples from the case study are used throughout.

6 Our Framework in a Comparative Analytic Perspective
chapter abstract

Current models of ICISs often assume a cat's-cradle of interconnectivity, in which everything is connected to everything else. This chapter calls that assumption into question from the perspective of both the framework presented and the empirical evidence from managing reliability and risk at the ICIS level. A review of the literature shows that infrastructures are typically managed so as to prevent interinfrastructural cascades, and there are far fewer cascades than current models would lead us to expect. The case study also supports that finding, and the chapter gives many examples of both positive and negative instances of interinfrastructural connectivity. One important implication from both the primary and the secondary research is that there is a fundamental difference between system-failure and system-normal operations in ICISs in terms of time and scale.

7 The Full Cycle of Infrastructure Operations
chapter abstract

This chapter expands the discussion of how time and scale interact with risk when managing infrastructures for reliability. The whole cycle of infrastructure operations ranges from normal to disrupted, restored, failed, recovered, and new normal. Risks vary by the stage of the cycle, and each stage is managed for reliability differently. Thus, a disruption in one infrastructure of an ICIS requires not only zooming down to determine root causes but also zooming up to determine its impact on the entire infrastructure as a system and zooming across to determine how these impacts affect infrastructures interconnected with it. Two examples—the 2010 San Bruno gas explosion and the major nexus of infrastructure on an island in the Delta—illustrate how risk analysis is to be undertaken in the ICIS setting.

8 Managing Interconnected Control Variables: A Case of Electricity and Water
chapter abstract

This chapter presents a detailed case study of control variables shared across critical infrastructures, the framework's core concept, and what this implies for reliability and risk management at the ICIS level. The input-output interconnectivity between water flows at the Banks pumps of the State Water Project near Tracy and electricity flows from the transmission grid to power those pumps is examined. Using a unique multiyear dataset and statistical analysis, the chapter shows how changes in electricity flows to the Banks pumps, an extremely important element in the State Water Project, affect changes in water flows through those pumps and what the implications are for resilience in each infrastructure's operations.

9 Interinfrastructural Innovation and Its Control Room Impacts: A Case Study of CAISO and MRTU
chapter abstract

Control rooms occupy center stage in the book's framework for ICIS reliability and risk because of their unique organizational niche and special features and capacities when it comes to ensuring high reliability in critical services. This chapter presents a case study of the Market Redesign and Technology Upgrade (MRTU), a software innovation. This technological change affected real-time reliability in the control room of California's electricity transmission manager, the California Independent System Operator (CAISO). The chapter brings together the concepts of performance modes, comfort zones, the whole cycle of operations, and precursor resilience.

10 Interconnected Infrastructure Systems as a Complex Policy Problem
chapter abstract

The book's findings and framework call for a rethinking of critical infrastructures as a policy problem in two major respects. First, this chapter reconsiders the many criticisms of leadership and regulation with respect to infrastructure performance. Second, the chapter takes up another policy issue that the framework itself highlights: is there an ICIS (a system of infrastructure systems) to lead, evaluate, and regulate?

11 Toward Multiple Reliability Standards for Interconnected Infrastructure Systems
chapter abstract

The book's framework and analysis demonstrate that the high reliability of individual infrastructures is increasingly at its limits in an interconnected setting. Loss of service in one infrastructure cannot be precluded by another infrastructure that depends on it, which means that the standards for reliability within an ICIS setting are no longer those of only high reliability and its precluded-events standard. Other standards of reliability are also at work—where the focus is on avoided, inevitable, or compensable events—and this chapter discusses them. The types of risks to manage arise from the standards of reliability being followed (deciding the trade-offs among risks does not necessarily lead to reliability). Policy makers, legislators, regulators, and the public must better understand the implications of the real choices being made.