

Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators

Anupama Sridhar (), Alexander Johansen (Stanford University)

System Optimization & Efficiency · Architectural Patterns & Composition

Spectral Koopman Attention (SKA) is a module for state-space models that eliminates the 'memory cliff'—where retrieval accuracy collapses for long sequences—while maintaining constant memory usage with no KV cache. It fits a spectral linear system to key-value history in closed form, enabling reliable long-range fact retrieval for extended agentic traces on commodity hardware.

Presentation

Talk

Paper Session 3: Systems Efficiency

Wednesday, May 27 · 3:50 PM – 4:00 PM

Bayshore Ballroom

Poster

Wednesday, May 27 · 5:15 PM – 6:45 PM

Carmel / Monterey

Abstract

Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA), a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from O(r^2) streaming state, where r is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (∼3%) across all gap lengths and KV-pair counts, while at the 50M-parameter scale SKA-augmented models achieve 100% retrieval accuracy on every configuration tested, including distractor gaps of 4,096 tokens with 32 KV pairs. Across five additional transfer benchmarks, including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.
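
The constant-memory claim in the abstract can be illustrated with a minimal sketch. This is not the authors' SKA implementation. It assumes plain linear ridge regression (no kernel feature map, no Koopman operator, no power-iterated filter), and the class name `StreamingRidgeMemory`, its methods, and the choice of value dimension equal to the rank r are all hypothetical. The point it shows is only that the sufficient statistics of a ridge fit are two r × r matrices, so the stored state does not grow with the number of tokens seen.

```python
import numpy as np


class StreamingRidgeMemory:
    """Hypothetical constant-memory associative store (illustration only)."""

    def __init__(self, rank: int, ridge: float = 1e-6):
        self.ridge = ridge
        # Sufficient statistics: both stay r x r however many pairs are seen.
        self.gram = np.zeros((rank, rank))   # running sum of k k^T
        self.cross = np.zeros((rank, rank))  # running sum of k v^T

    def update(self, key: np.ndarray, value: np.ndarray) -> None:
        """Fold one key-value pair into the running statistics."""
        self.gram += np.outer(key, key)
        self.cross += np.outer(key, value)

    def read(self, query: np.ndarray) -> np.ndarray:
        """Retrieve with the closed-form ridge solution W = (G + lambda I)^-1 C."""
        rank = self.gram.shape[0]
        weights = np.linalg.solve(self.gram + self.ridge * np.eye(rank), self.cross)
        return query @ weights


rng = np.random.default_rng(0)
rank = 8
keys = np.eye(rank)                      # orthonormal keys, assumed for a clean demo
values = rng.normal(size=(rank, rank))   # one value vector per key

memory = StreamingRidgeMemory(rank)
for k, v in zip(keys, values):
    memory.update(k, v)

recalled = memory.read(keys[3])
max_error = float(np.max(np.abs(recalled - values[3])))
```

In this toy setting the keys are orthonormal and the ridge term is small, so `recalled` matches `values[3]` up to a tiny shrinkage. With correlated keys, or more stored pairs than the rank, recall from this simple sketch would only be approximate.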

ACM CAIS 2026 Sponsors