How to Steer Your Multi-Agent System: Human-LLM Collaborative Planning
Zeyu He (Penn State University), Hannah Kim (Megagon Labs), Dan Zhang (Megagon Labs), Estevam Hruschka (Megagon Labs)
Architectural Patterns & Composition · Evaluation & Benchmarking
A user study and prototype showing that humans can effectively supervise multi-agent plans at the process level—inspecting, steering, and refining intermediate reasoning—rather than only verifying final outputs. The work characterizes hybrid human-AI planning patterns and identifies the effort-control-risk trade-offs that determine when process-level supervision is worth the cost.
Presentation
Talk
Paper Session 4: Agent Memory & Planning
Thursday, May 28 · 10:00 AM – 10:10 AM
Bayshore Ballroom
Poster
Thursday, May 28 · 4:30 PM – 6:00 PM
Carmel
Abstract
In orchestrated multi-agent systems, humans often struggle to manage plans due to their complexity and limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a design space for human-LLM co-planning interactions along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low- vs. high-level edits). We realize this design space in AMBIPOM, a prototype supporting process-level supervision through both semantic and structural interactions. Through a user study, we characterize how users navigate this space, revealing hybrid workflows and effort-control-risk trade-offs; through a controlled benchmark, we analyze how LLMs revise plans under varying scope and revision strategies. Our findings yield design insights for more transparent, controllable, and effective human-AI co-planning. We release code and data at https://github.com/megagonlabs/ambipom.
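The three-axis design space described in the abstract can be pictured as a small data structure. The sketch below is illustrative only: the names `Mode`, `Scope`, `Level`, `Interaction`, and `design_space` are hypothetical and are not taken from the AMBIPOM code base. It assumes each axis is binary, as the abstract states, which gives eight possible interaction types.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical names for the three axes listed in the abstract.
class Mode(Enum):
    SEMANTIC = "semantic"      # e.g. natural-language feedback on a plan
    STRUCTURAL = "structural"  # e.g. direct edits to the plan's structure


class Scope(Enum):
    GLOBAL = "global"      # the edit applies to the whole plan
    TARGETED = "targeted"  # the edit applies to a selected part of the plan


class Level(Enum):
    LOW = "low"    # low-level edit
    HIGH = "high"  # high-level edit


@dataclass(frozen=True)
class Interaction:
    """One point in the (mode, scope, level) design space."""
    mode: Mode
    scope: Scope
    level: Level

    def label(self) -> str:
        return f"{self.mode.value}/{self.scope.value}/{self.level.value}"


def design_space() -> list[Interaction]:
    """Enumerate every combination of the three binary axes."""
    return [Interaction(m, s, l) for m, s, l in product(Mode, Scope, Level)]


if __name__ == "__main__":
    for interaction in design_space():
        print(interaction.label())
```

Modeling each axis as an enum keeps the combinations explicit, so a study log or benchmark condition could be tagged with a single `Interaction` value. The comments on each member are an interpretation of the axis names, not definitions from the paper.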