I recently gave this 5-minute lightning talk at AAMAS 2026 in beautiful Paphos, Cyprus, representing my team at Microsoft Research.
The idea behind the talk is simple: as we start building AI systems where multiple agents work together, our old safety tests just aren’t enough anymore. When someone tries to trick one of these systems with a malicious prompt, the system often fails, but it’s hard to know why. This presentation walks through the common reasons these multi-agent systems break down and introduces DHARMA, a tool we built that helps us pinpoint exactly where the failure happened. Scroll down and the deck keeps pace with the notes.
You can also download the full deck (PDF).