

Pathfinder: Self-Improving Agent Trace Analysis via Adversarial Self-Play and Code Execution

Dhruv Atreja (Unaffiliated)

Engineering & Operations; Evaluation & Benchmarking

Summary

A schema-agnostic trace analysis system that uses adversarial self-play and executable code search to debug agent failures, outperforming RAG by about 35 points in detection accuracy.

Description

LLM agents achieve remarkable results on complex tasks, yet debugging their failures remains manual and time-consuming. We introduce Pathfinder, a schema-agnostic trace analysis system that frames agent debugging as multi-hop search over structured and unstructured text. Unlike retrieval-augmented generation (RAG) approaches, Pathfinder uses executable code (SQL queries and bash pipelines) as its primary search mechanism, enabling precise filtering and aggregation that embeddings cannot express. We train Pathfinder via self-play between an Injector role, which introduces realistic deficiencies into agent code, and a Detector role, which analyzes the resulting traces. We also contribute a taxonomy of 50 agent failure types derived from real production bugs. In our five-seed evaluation, Pathfinder achieves 87.2% ± 1.8% detection accuracy on injected deficiencies and 78.4% ± 2.3% on real-world bugs from held-out commits, outperforming RAG by 35.4 points. Self-play training provides an additional 18.8 points, and the learned patterns transfer across agent architectures.
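To make the idea of executable code as a search mechanism concrete, here is a minimal sketch of a two-hop SQL search over agent traces. It is not Pathfinder's implementation: the `spans` table, its columns, and the sample rows are all hypothetical, invented only to show the kind of filtering and aggregation that a similarity lookup over embeddings cannot express directly.

```python
import sqlite3

# Hypothetical trace store: the table name, columns, and rows below are
# assumptions made for illustration, not Pathfinder's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE spans ("
    "run_id TEXT, step INTEGER, tool TEXT, status TEXT, latency_ms INTEGER)"
)
rows = [
    ("run-a", 1, "search", "ok", 120),
    ("run-a", 2, "fetch", "error", 3000),
    ("run-a", 3, "fetch", "error", 3100),
    ("run-b", 1, "search", "ok", 110),
    ("run-b", 2, "fetch", "ok", 240),
    ("run-c", 1, "fetch", "error", 2900),
]
conn.executemany("INSERT INTO spans VALUES (?, ?, ?, ?, ?)", rows)

# Hop 1: aggregate over all traces to find which tool fails most often.
failure_counts = conn.execute(
    """
    SELECT tool, COUNT(*) AS failures
    FROM spans
    WHERE status = 'error'
    GROUP BY tool
    ORDER BY failures DESC
    """
).fetchall()

# Hop 2: use the result of hop 1 to narrow down to the affected runs.
worst_tool = failure_counts[0][0]
affected_runs = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT run_id FROM spans "
        "WHERE tool = ? AND status = 'error' ORDER BY run_id",
        (worst_tool,),
    )
]

print(failure_counts)
print(affected_runs)
```

Each query returns an exact count or an exact set of runs, and the second query is built from the output of the first, which is what "multi-hop" means in this sketch. A real system would presumably generate such queries from the trace schema it encounters rather than hard-coding them.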

ACM CAIS 2026 Sponsors