An SSO certificate expired. A carrier dropped calls. Inside a few minutes, tens of thousands of customer interactions were failing across an enterprise contact center, and a war room formed to put it out. Nothing about that says “research.” No discussion guide, no recruited participants, no consent form. Just the incident and a clock.
I treated it as the best research session I'd had in months.
We're mostly trained to see incidents as interruptions, things to get through so we can get back to the real work of planned studies. I've come around to the opposite view. An outage is an honest look at how people actually figure things out under pressure, and it's honest in a way a scheduled session never manages, because nobody is performing for you. Nobody is recalling what they think they'd do. They're doing it, right now, with real money on the line, and every move shows you what they look for, what they can't find, and which questions occur to them first.
So while the room worked the problem, I worked the room. I ran it as in-situ research, mapping what had happened, what we actually knew, and what we were choosing to do, following the path the operators really took through the failure instead of the tidy one a runbook imagines.
What stuck with me was how much of the response got assembled by hand. The hard part wasn't replacing a certificate or routing around a carrier. It was that nobody could quickly see the whole picture of what was happening across the system as it moved, which failures were causes and which were just downstream noise. People were stitching that together from fragments, under the worst possible conditions.
That's not a line in a post-mortem. That's a product requirement. The gap the outage put on display is the kind an AI Copilot exists to close, so the incident didn't end in a doc that gets archived. It ended in Copilot requirements, grounded in about the strongest evidence I could ask for: how good operators behave when the system breaks and the thing they need to see isn't there.
It lined up with something I'd hit on a routing-diagnostics project, where the real problem also turned out to be the missing explanation rather than missing data. The outage didn't just echo that. It made the argument for me.
The habit I kept is simple, and it changed how I work. Stop treating the floor's worst day as something to survive. It's something to study. The best research isn't always the research you planned. Sometimes it's being in the room when things break, and having the discipline, in all the noise, to keep listening for the pattern nobody has time to say out loud.
Filed by Erin Naylor — June 9, 2026