Constant-Memory Retrieval via Koopman Operator Estimation for Mamba-3
Alexander Johansen (Stanford University), Anupama Sridhar
System Optimization & Efficiency · Architectural Patterns & Composition
Abstract
Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a \emph{memory cliff}: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce \textbf{Spectral Koopman Attention (SKA)}, a module that augments SSM layers with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory, requiring neither a KV cache nor attention layers. SKA fits a spectral linear system to the key--value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from $O(r^{2})$ streaming state, where $r$ is a small projection rank. On a Tool Trace benchmark requiring retrieval of high-entropy identifiers after distractor gaps of up to $8{,}192$ tokens, a bare Mamba-3 SSM fails to exceed chance accuracy (${\sim}2\text{--}8\%$), while SKA-augmented models achieve near-perfect retrieval accuracy. Ablations confirm that the spectral operator, not the masking strategy, drives the retrieval gain.
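To make the $O(r^{2})$ memory claim concrete, the sketch below shows one way such sufficient statistics could be accumulated in a single streaming pass: keys and values are compressed through a fixed rank-$r$ projection (an assumption made here for illustration), two $r \times r$ statistics are updated per token, and a linear operator is recovered by the closed-form ridge solution. The function name, the random-projection choice, and the hyperparameters are hypothetical and not taken from the paper; this is a minimal ridge-regression sketch, not SKA's spectral operator or retrieval filter.

```python
import numpy as np

def streaming_ridge_operator(keys, values, r=16, lam=1e-3, seed=0):
    """Illustrative sketch (not the paper's method): fit a linear map from
    projected keys to projected values by ridge regression, using only two
    r x r sufficient statistics -- memory independent of sequence length T."""
    rng = np.random.default_rng(seed)
    d = keys.shape[1]
    P = rng.standard_normal((d, r)) / np.sqrt(d)  # fixed rank-r projection (assumption)
    A = np.zeros((r, r))  # Gram statistic:  sum_t kp_t kp_t^T
    B = np.zeros((r, r))  # cross statistic: sum_t kp_t vp_t^T
    for k, v in zip(keys, values):   # stream one token at a time
        kp, vp = P.T @ k, P.T @ v    # rank-r projections of key and value
        A += np.outer(kp, kp)
        B += np.outer(kp, vp)
    # closed-form ridge solution: W = (A + lam I)^{-1} B
    W = np.linalg.solve(A + lam * np.eye(r), B)
    return P, W

# usage: after the pass, a query key q retrieves vhat = W.T @ (P.T @ q)
T, d = 1024, 64
keys = np.random.default_rng(1).standard_normal((T, d))
values = keys @ np.random.default_rng(2).standard_normal((d, d))  # synthetic linear targets
P, W = streaming_ridge_operator(keys, values, r=16)
```

The design point the sketch isolates is that `A` and `B` are the only state carried across tokens, so the memory footprint is $O(r^{2})$ regardless of how many tokens are streamed.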