Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators
Anupama Sridhar, Alexander Johansen (Stanford University)
System Optimization & Efficiency · Architectural Patterns & Composition
Spectral Koopman Attention (SKA) is a module for state-space models that eliminates the 'memory cliff'—where retrieval accuracy collapses for long sequences—while maintaining constant memory usage with no KV cache. It fits a spectral linear system to key-value history in closed form, enabling reliable long-range fact retrieval for extended agentic traces on commodity hardware.
Presentation
Talk
Paper Session 3: Systems Efficiency
Wednesday, May 27 · 3:50 PM – 4:00 PM
Bayshore Ballroom
Poster
Wednesday, May 27 · 5:15 PM – 6:45 PM
Carmel / Monterey
Abstract
Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA), a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from O(r^2) streaming state, where r is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (∼3%) across all gap lengths and KV-pair counts, while at the 50M-parameter scale SKA-augmented models achieve 100% retrieval accuracy on every configuration tested, including distractor gaps of 4,096 tokens with 32 KV pairs. Across five additional transfer benchmarks, including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.
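To make the constant-memory idea in the abstract concrete, here is a minimal sketch of streaming ridge regression over key-value pairs. It is not the authors' SKA implementation. It assumes an identity feature map (plain linear ridge regression rather than a kernel), values that live in the same r-dimensional space as keys, and a hypothetical class name `StreamingRidgeMemory`. It omits the learned power-iterated filter and any distractor handling. It only illustrates how two r × r sufficient statistics can replace a growing KV cache.

```python
import numpy as np


class StreamingRidgeMemory:
    """Toy associative memory with O(r^2) state (illustrative, not SKA itself)."""

    def __init__(self, rank: int, ridge: float = 1e-8):
        # Sufficient statistics; their size never depends on sequence length.
        self.gram = np.zeros((rank, rank))   # running sum of k k^T
        self.cross = np.zeros((rank, rank))  # running sum of v k^T
        self.ridge = ridge                   # assumed regularization strength

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        # Rank-one updates: one (key, value) pair is absorbed, nothing is cached.
        self.gram += np.outer(key, key)
        self.cross += np.outer(value, key)

    def read(self, query: np.ndarray) -> np.ndarray:
        # Closed-form ridge readout: v_hat = C (G + lambda I)^{-1} q.
        rank = self.gram.shape[0]
        weights = np.linalg.solve(self.gram + self.ridge * np.eye(rank), query)
        return self.cross @ weights


rng = np.random.default_rng(0)
rank, num_pairs = 64, 32  # 32 KV pairs, fewer than the rank

# Random unit-norm keys and random values (synthetic data for the demo).
keys = rng.standard_normal((num_pairs, rank))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.standard_normal((num_pairs, rank))

memory = StreamingRidgeMemory(rank)
for k, v in zip(keys, values):
    memory.write(k, v)

# Query with a stored key and compare against the stored value.
recalled = memory.read(keys[5])
max_error = float(np.max(np.abs(recalled - values[5])))
```

In this toy setting the state stays at two 64 × 64 matrices no matter how many pairs are written, and recall is near-exact because the 32 keys are linearly independent. Once the number of distinct keys exceeds the rank, or unrelated tokens are written into the same statistics, this plain ridge readout degrades, so the sketch should not be read as reproducing the reported benchmark behavior.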