
Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell (Megagon Labs), Dan Zhang (Megagon Labs), Hannah Kim (Megagon Labs), Tom Mitchell (Megagon Labs), Estevam Hruschka (Megagon Labs)

Architectural Patterns & Composition

A memory-augmented agent framework that enables LLM agents to learn new classification functions from labeled examples at inference time, without any parameter updates. It uses LLM-generated episodic critiques of specific past mistakes and distills them into reusable semantic task-level guidance, outperforming few-shot prompting and matching fine-tuned baselines on diverse tasks.

Presentation

Talk

Paper Session 4: Agent Memory & Planning

Thursday, May 28 · 9:30 AM – 9:40 AM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel


Abstract

We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. Whereas conventional approaches such as fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages LLM-generated critiques grounded in labeled data. Our framework uses episodic memory to store instance-level critiques that capture specific past experiences, and semantic memory to distill these critiques into reusable, task-level guidance. Across a diverse set of tasks and models, our best-performing self-critique strategy (utilizing both memory types) yields an average improvement of 8.1 percentage points (pp) over the zero-shot baseline and 4.6 pp over a RAG-based baseline that relies only on labels. However, improvements vary substantially across models and domains. To explain this variation, we introduce suggestibility, a novel metric capturing how receptive a model is to external reasoning provided in context. We use suggestibility to illuminate when and why memory augmentation succeeds or falls short. Beyond accuracy gains, we find that pre-computed critiques substantially reduce inference-time computation for reasoning models, cutting thinking tokens by an average of 31.95% across all datasets by substituting for reasoning that the model would otherwise perform independently. Our findings highlight the conditions under which memory-driven, reflective learning can serve as a lightweight, interpretable, and efficient strategy for improving LLM adaptability.
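The two memory types described in the abstract can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the names `ReflectiveMemory`, `Episode`, `toy_critique`, and `toy_distill` are hypothetical, the two callables stand in for LLM calls, and the word-overlap retriever is a placeholder for whatever retrieval the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """One labeled example plus the critique written about it."""
    text: str
    gold: str
    critique: str


@dataclass
class ReflectiveMemory:
    # Both callables are hypothetical hooks standing in for LLM calls.
    critique_fn: Callable[[str, str, str], str]
    distill_fn: Callable[[List[str]], str]
    episodes: List[Episode] = field(default_factory=list)  # episodic memory
    guidance: str = ""                                      # semantic memory

    def observe(self, text: str, predicted: str, gold: str) -> None:
        """Store an instance-level critique grounded in the gold label."""
        critique = self.critique_fn(text, predicted, gold)
        self.episodes.append(Episode(text, gold, critique))

    def distill(self) -> None:
        """Compress all stored critiques into task-level guidance."""
        self.guidance = self.distill_fn([e.critique for e in self.episodes])

    def retrieve(self, query: str, k: int = 2) -> List[Episode]:
        """Rank episodes by word overlap with the query (a toy retriever)."""
        words = set(query.lower().split())
        return sorted(
            self.episodes,
            key=lambda e: len(words & set(e.text.lower().split())),
            reverse=True,
        )[:k]

    def build_prompt(self, query: str, k: int = 2) -> str:
        """Assemble guidance, retrieved critiques, and the new input."""
        lines = [f"Guidance: {self.guidance}"]
        for e in self.retrieve(query, k):
            lines.append(f"Example: {e.text} -> {e.gold}. Critique: {e.critique}")
        lines.append(f"Input: {query}")
        return "\n".join(lines)


def toy_critique(text: str, predicted: str, gold: str) -> str:
    # Deterministic stand-in for an LLM-written critique.
    verdict = "correct" if predicted == gold else "wrong"
    return f"Predicted {predicted}, which was {verdict}; the label is {gold}."


def toy_distill(critiques: List[str]) -> str:
    # Deterministic stand-in for LLM distillation into task-level guidance.
    mistakes = sum("wrong" in c for c in critiques)
    return f"{mistakes} of {len(critiques)} past predictions were wrong; recheck similar cases."


memory = ReflectiveMemory(toy_critique, toy_distill)
memory.observe("refund not received after return", predicted="shipping", gold="billing")
memory.observe("package arrived late", predicted="shipping", gold="shipping")
memory.distill()
prompt = memory.build_prompt("where is my refund", k=1)
print(prompt)
```

In this sketch no model parameters change: everything the agent learns lives in the stored critiques and the distilled guidance, which are placed in the prompt at inference time.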
