fastWorkflow: Closing the Agentic Performance Gap Between Small and Frontier Language Models
Sanchit Satija (Radiant Logic), Aditya Bhatt (Radiant Logic), Priyanshu Jani (Radiant Logic), Dhar Rawal (Radiant Logic)
Architectural Patterns & Composition
Abstract
Large language models power increasingly capable autonomous agents, yet their deployment at scale is constrained by high inference costs, latency, and data privacy concerns. Small language models (SLMs) offer compelling operational advantages but exhibit systematic failure modes in agentic settings. Despite growing SLM deployment, these agentic failure modes remain poorly characterized and inadequately addressed. We present an empirically grounded taxonomy that categorizes SLM failures across five dimensions (natural language understanding, tool management, planning, agentic reasoning, and context management) and quantifies their prevalence on the τ-bench benchmark. Guided by this taxonomy, we introduce fastWorkflow, a dual-mode agentic framework implementing a cascaded NLU pipeline for intent detection and structured parameter extraction with validation, hierarchical context organization that reduces the effective action space, explicit task planning with dependency-aware decomposition, and adaptive context management, among other targeted mitigations. On τ-bench, GPT-OSS-20B augmented with fastWorkflow achieves 83.47% Pass^1 on the Retail domain and 78.00% on Airline, surpassing all frontier models evaluated on τ-bench, including Claude Opus 4.1 (82.40% Retail, 56.00% Airline) and Claude Sonnet 4 (80.50% Retail, 60.00% Airline). Even Mistral-7B-Instruct with fastWorkflow matches Claude Sonnet 4 on Airline at 60.00% while operating at 15–75× lower inference cost. Ablation studies confirm that the cascaded NLU pipeline is the most impactful component: removing it causes a 58-point performance collapse.
Our findings demonstrate that architectural separation of concerns (offloading error-prone operations to structured subsystems while preserving LLM flexibility for planning and recovery) can close the performance gap between small and frontier models on agentic tasks, shifting the cost-performance Pareto frontier for production agent deployment.