
The United States Needs to Stress Test Critical Infrastructure for Different AI Adoption Scenarios
While AI may help protect critical infrastructure, there is a growing need to stress test that infrastructure under different AI adoption scenarios so that hallucinations and errors in model output do not translate into operational failures.
The U.S. Department of Homeland Security (DHS) defines critical infrastructure sectors as those considered essential to the safe functioning of society, such as water and power supply. Critical infrastructure stakeholders have been early adopters of AI, using it to improve decision making as they deal with large amounts of data and complex operations. But while AI can improve operations at the level of an individual company, its impact at the systemic level is not necessarily positive: it could lead to suboptimal outcomes and new risks.
The power grid, for example, is of particular concern, as operational failures can cascade to affect other dependent networks, such as water. For this reason, the U.S. government should ask DHS, in coordination with stakeholders, to stress test the performance of critical infrastructure against metrics of reliability and resilience across a wide range of AI adoption scenarios. Testing how robustly complex systems might respond to significant developments is not a new idea: extreme weather stress tests on banks may provide a template for stress tests of AI adoption within critical infrastructure sectors.
The United States should be proactive in implementing AI stress tests in critical infrastructure. Even sophisticated frontier models can hallucinate and, despite their human-like responses, do not exhibit human-like reasoning. Without appropriate data and training, models can be unprepared for real situations and provide incorrect information. Taking steps to prevent these vulnerabilities in critical infrastructure sectors is important because even narrow AI applications, such as helping identify a fault or determining when a tree grows too close to a power line, will ultimately inform actions which could have cascading consequences.
This becomes more complicated as AI agents begin to act on networks or provide decision-support inputs, as in multi-agent systems. Multi-agent systems are useful for distributed, complex systems (e.g., a smart grid), but in a world where multiple AI implementations coexist with different goals or priorities, coordination is not guaranteed. Because agents tend to act strategically and in their own interest, it should not be surprising when they run into the same social dilemmas that humans do, as the toy example below illustrates.
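To see the problem in miniature, consider a deliberately simplified sketch in Python with hypothetical payoff numbers: two AI agents each manage flexible load and independently decide whether to shift consumption into the cheapest hour. If both shift, they create a new peak that raises costs for both, yet self-interested best responses lead each agent to shift anyway, which is the structure of a classic social dilemma.

```python
# Illustrative only: a toy payoff matrix (hypothetical numbers) showing how two
# self-interested grid agents can land in a prisoner's-dilemma-style outcome.
from itertools import product

ACTIONS = ("hold", "shift_to_cheap_hour")

# COST[(my_action, other_action)] = my operating cost (lower is better).
# If both agents shift load into the same cheap hour, they create a new peak
# that raises costs for everyone -- worse than if both had held steady.
COST = {
    ("hold", "hold"): 4,
    ("hold", "shift_to_cheap_hour"): 6,
    ("shift_to_cheap_hour", "hold"): 2,
    ("shift_to_cheap_hour", "shift_to_cheap_hour"): 5,
}

def best_response(other_action: str) -> str:
    """Return the action that minimizes my cost, given the other agent's action."""
    return min(ACTIONS, key=lambda a: COST[(a, other_action)])

def nash_equilibria():
    """Profiles where neither agent can cut its own cost by deviating unilaterally."""
    return [
        (a1, a2)
        for a1, a2 in product(ACTIONS, repeat=2)
        if best_response(a2) == a1 and best_response(a1) == a2
    ]

if __name__ == "__main__":
    for a1, a2 in nash_equilibria():
        total = COST[(a1, a2)] + COST[(a2, a1)]
        print(f"equilibrium: {a1} / {a2}, total cost {total}")
    print("cooperative total cost:", COST[("hold", "hold")] * 2)
```

In this toy case the only stable outcome is the one in which both agents shift, even though both would be better off holding steady; stress tests would look for analogous dynamics at realistic scale.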
Additionally, AI agents will not be adopted or used by humans at the same time or in the same way. How humans interact with AI agents in a complex system, and the coordination challenges that follow, is an area of active research. When humans must contend with a complex, networked, fast-moving system in which other agents' decisions are made faster than they can reasonably be understood, those are precisely the conditions under which accidents become a normal occurrence.
Critical infrastructure operators are familiar with stress testing against external variables like weather, and they commonly develop scenarios based on historical data to assess performance metrics. AI deployment on critical infrastructure is another such variable, but one characterized by deep uncertainty about its technical and social impact, given its novelty.
DHS, in coordination with all stakeholders, should develop scenarios and metrics to evaluate potential paths of interaction between AI and infrastructure. Specifically, we recommend that DHS and stakeholders follow the risk management framework developed by the National Institute of Standards and Technology (NIST) to govern, map, measure, and manage AI-related risks. Stress tests should focus on the following types of risks, with a simplified illustration after the list:
- Risks relating to the AI models themselves (e.g., model drift).
- Risks relating to multi-agent systems, where collections of AI agents interact directly and indirectly.
- Risks relating to human-agent systems, where humans must coexist and cooperate with AI agents in a complex system.
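As a concrete, if deliberately simplified, illustration of what such a scenario comparison might report, the Python sketch below (all numbers hypothetical) computes a single reliability metric, hours of unserved load over a simulated year, under a baseline forecast, a well-calibrated AI forecast, and an AI forecast degraded by model drift. A real stress test would use operator data, validated system models, and far richer metrics.

```python
# A minimal sketch of what a scenario-based stress test might compute.
# All numbers are hypothetical; a real test would use operator data and models.
import random

HOURS = 24 * 365          # one simulated year, hourly resolution
CAPACITY_MW = 1_000       # assumed firm capacity available for dispatch
RESERVE_FRACTION = 0.10   # operator schedules demand forecast plus this margin

def simulate_unserved_hours(forecast_bias: float, forecast_noise: float, seed: int = 0) -> int:
    """Count hours where scheduled supply falls short of actual demand.

    forecast_bias  -- systematic under-forecast (e.g., from model drift), as a fraction
    forecast_noise -- random forecast error, as a fraction of demand
    """
    rng = random.Random(seed)
    unserved = 0
    for _ in range(HOURS):
        demand = rng.uniform(500, 950)                       # actual demand, MW
        forecast = demand * (1 - forecast_bias) * (1 + rng.gauss(0, forecast_noise))
        scheduled = min(CAPACITY_MW, forecast * (1 + RESERVE_FRACTION))
        if scheduled < demand:
            unserved += 1
    return unserved

if __name__ == "__main__":
    scenarios = {
        "baseline (human forecast)":    dict(forecast_bias=0.00, forecast_noise=0.08),
        "AI forecast, well calibrated": dict(forecast_bias=0.00, forecast_noise=0.03),
        "AI forecast with model drift": dict(forecast_bias=0.06, forecast_noise=0.03),
    }
    for name, params in scenarios.items():
        print(f"{name}: {simulate_unserved_hours(**params)} hours of unserved load")
```

The point of the sketch is the shape of the output: one reliability metric compared across adoption scenarios, so that the benefit of a well-calibrated model and the cost of an undetected drift are visible side by side.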
Stress tests should be designed with local environments in mind. The power grids in California, Texas, and the Midwest, for instance, are all very different, and so the solutions will be different. Understanding the risks and benefits of particular AI applications is the first step toward establishing practices and procedures to guide AI employment within each stakeholder community. Tailored stress tests that examine how AI adoption may improve or hinder efficiency, quality of service, or robustness can not only inform the best methods to mitigate identified risks but also increase trust in those AI applications by making the analysis transparent. As in the banking sector, these stress tests are intended to build confidence amid uncertainty, and the time to proactively build that trust is now.
Ismael Arciniegas Rueda is a senior economist at RAND and a professor of policy analysis at Pardee RAND Graduate School.
Daniel Tapia is a political scientist at RAND.