fastWorkflow: Closing the Performance Gap Between Small and Frontier Language Models for Conversational Agents

Sanchit Satija (Radiant Logic), Aditya Bhatt (Radiant Logic), Priyanshu Jani (Radiant Logic), Dhar Rawal (Radiant Logic)

Architectural Patterns & Composition

fastWorkflow is a dual-mode agentic framework that closes the performance gap between small and frontier language models by addressing a five-dimensional taxonomy of SLM failure modes: NLU, tool management, planning, agentic reasoning, and context management. It enables smaller, lower-cost, privacy-preserving models to reach near-frontier task success rates on agentic benchmarks.

Presentation

Talk

Paper Session 1: Agent Design

Wednesday, May 27 · 11:55 AM – 12:05 PM

Bayshore Ballroom

Poster

Wednesday, May 27 · 5:15 PM – 6:45 PM

Carmel / Monterey

Abstract

Large language models are increasingly deployed in conversational agents that assist humans with complex, multi-step tasks, yet their deployment at scale is constrained by high inference costs, latency, and data privacy concerns. Small language models (SLMs) offer compelling operational advantages but exhibit systematic failure modes in agentic settings, particularly in conversational workflows: domains where tasks are solved interactively by a human and an LLM through structured tool invocation. Despite growing SLM deployment, these agentic failure modes remain poorly characterized and inadequately addressed. We present an empirically grounded taxonomy that categorizes SLM failures across five dimensions (natural language understanding failures, tool management failures, task decomposition and sequencing failures, agentic reasoning failures, and context management failures) and quantify their prevalence on the 𝜏-bench benchmark. Guided by this taxonomy, we introduce fastWorkflow, a dual-mode agentic architectural framework implementing a cascaded NLU pipeline for intent detection and structured parameter extraction with validation, hierarchical context organization that reduces the effective action space, explicit task planning with dependency-aware decomposition, and adaptive context management, among other targeted mitigations. On 𝜏-bench, GPT-OSS-20B augmented with fastWorkflow achieves 83.47% Pass^1 on the Retail domain and 78% on Airline, surpassing all frontier models evaluated on the 𝜏-bench leaderboard, including Claude Sonnet 4 (80.5% Retail, 60.0% Airline) and Claude Opus 4.1 (82.4% Retail, 56.0% Airline), while operating at ∼22× lower inference cost. Even Mistral-7B-Instruct with fastWorkflow matches Claude Sonnet 4 on Airline at 60%. Ablation studies confirm that the cascaded NLU pipeline is the most impactful component: removing it causes performance to collapse by 58 points on Retail and 68 points on Airline.
Our findings demonstrate that architectural separation of concerns, offloading error-prone operations to structured subsystems while preserving LLM flexibility for planning and recovery, can close the performance gap between small and frontier models in conversational workflow tasks, shifting the cost-performance Pareto frontier for production deployment in domains involving multi-turn, tool-augmented human-LLM collaboration.
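
To make the idea of a cascaded NLU pipeline concrete, the following is a minimal Python sketch under stated assumptions; it is not the fastWorkflow API or the authors' implementation. The intent registry, the order ID format, the stage names, and the `understand` function are all hypothetical. It illustrates only the general pattern the abstract describes: cheap, deterministic stages try to resolve the intent first, a language model is consulted only as a fallback, and extracted parameters are validated before any tool would be invoked.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional

# Hypothetical intent registry: intent name -> example utterances.
INTENTS = {
    "cancel_order": ["cancel my order", "cancel order"],
    "track_order": ["where is my order", "track my order"],
}

# Assumed order ID format (W followed by exactly 7 digits), for illustration only.
ORDER_ID = re.compile(r"#?W(\d{7})\b")


@dataclass
class Result:
    intent: Optional[str]
    params: dict
    stage: str                  # which cascade stage resolved the intent
    error: Optional[str] = None


def exact_stage(text: str) -> Optional[str]:
    """Stage 1: substring match against known example utterances."""
    for intent, examples in INTENTS.items():
        if any(example in text for example in examples):
            return intent
    return None


def fuzzy_stage(text: str, threshold: float = 0.6) -> Optional[str]:
    """Stage 2: best string-similarity match, accepted only above a threshold."""
    best, best_score = None, 0.0
    for intent, examples in INTENTS.items():
        for example in examples:
            score = SequenceMatcher(None, text, example).ratio()
            if score > best_score:
                best, best_score = intent, score
    return best if best_score >= threshold else None


def understand(utterance: str,
               llm_fallback: Optional[Callable[[str], Optional[str]]] = None) -> Result:
    """Run the cascade, then extract and validate parameters."""
    text = utterance.lower().strip()
    for stage_name, stage in (("exact", exact_stage), ("fuzzy", fuzzy_stage)):
        intent = stage(text)
        if intent:
            break
    else:
        # Stage 3: only now would a language model be consulted.
        intent = llm_fallback(text) if llm_fallback else None
        stage_name = "llm" if intent else "unresolved"

    if intent is None:
        return Result(None, {}, stage_name, "intent unclear; ask the user to rephrase")

    # Structured parameter extraction with validation: reject instead of guessing.
    match = ORDER_ID.search(utterance)
    if match is None:
        return Result(intent, {}, stage_name, "missing or malformed order_id")
    return Result(intent, {"order_id": match.group(0).lstrip("#")}, stage_name)


if __name__ == "__main__":
    print(understand("Please cancel my order #W1234567"))
    print(understand("track my ordr"))
```

In this sketch, a validation failure returns an explicit error that the agent can turn into a clarifying question, rather than passing a malformed argument to a tool. That is one plausible reading of "offloading error-prone operations to structured subsystems"; the thresholds and stage ordering here are arbitrary choices.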

ACM CAIS 2026 Sponsors