

Learning from Supervision with Semantic and Episodic Memory: A Reflective Approach to Agent Adaptation

Jackson Hassell (Megagon Labs), Dan Zhang (Megagon Labs), Hannah Kim (Megagon Labs), Tom Mitchell (Megagon Labs), Estevam Hruschka (Megagon Labs)

Architectural Patterns & Composition

A memory-augmented agent framework that enables LLM agents to learn new classification functions from labeled examples at inference time, without any parameter updates. It uses LLM-generated episodic critiques of specific past mistakes and distills them into reusable semantic task-level guidance, outperforming few-shot prompting and matching fine-tuned baselines on diverse tasks.

Presentation

Talk

Paper Session 4: Agent Memory & Planning

Thursday, May 28 · 9:30 AM – 9:40 AM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel

Abstract

We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often costly, inflexible, and opaque, we propose a memory-augmented framework that leverages LLM-generated critiques grounded in labeled data. Our framework uses episodic memory to store instance-level critiques, which capture specific past experiences, and semantic memory to distill these into reusable, task-level guidance. Across a diverse set of tasks and models, our best-performing self-critique strategy (utilizing both memory types) yields an average improvement of 8.1 percentage points over the zero-shot baseline, and 4.6 percentage points over a RAG-based baseline that relies only on labels. However, improvements vary substantially across models and domains. To explain this variation, we introduce suggestibility, a novel metric capturing how receptive a model is to external reasoning provided in context. We use suggestibility to illuminate when and why memory augmentation succeeds or falls short. Beyond accuracy gains, we find that pre-computed critiques substantially reduce inference-time computation for reasoning models, cutting thinking tokens by an average of 31.95% across all datasets by substituting for reasoning that the model would otherwise perform independently. Our findings highlight the conditions under which memory-driven, reflective learning can serve as a lightweight, interpretable, and efficient strategy for improving LLM adaptability.
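The two memory types described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the class and function names (`Episode`, `ReflectiveMemory`, `critique_fn`, `summarize_fn`), the token-overlap retriever, and the prompt layout are all assumptions made for illustration, and the LLM calls are left as caller-supplied functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Episode:
    """One instance-level record: an input, its gold label, and a critique."""
    text: str
    label: str
    critique: str


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class ReflectiveMemory:
    """Hypothetical container holding episodic and semantic memory."""
    episodic: List[Episode] = field(default_factory=list)
    semantic: List[str] = field(default_factory=list)

    def add_episode(self, text: str, label: str,
                    critique_fn: Callable[[str, str], str]) -> None:
        # critique_fn is assumed to wrap an LLM call that critiques one
        # labeled example; here it is any function (text, label) -> str.
        self.episodic.append(Episode(text, label, critique_fn(text, label)))

    def distill(self, summarize_fn: Callable[[List[str]], List[str]]) -> None:
        # summarize_fn is assumed to wrap an LLM call that condenses many
        # instance-level critiques into task-level guidance.
        self.semantic = summarize_fn([e.critique for e in self.episodic])

    def retrieve(self, query: str, k: int = 2) -> List[Episode]:
        # Return the k stored episodes most similar to the query.
        ranked = sorted(self.episodic,
                        key=lambda e: jaccard(query, e.text),
                        reverse=True)
        return ranked[:k]

    def build_prompt(self, query: str, k: int = 2) -> str:
        # Combine task-level guidance with retrieved critiques in one prompt.
        lines = ["Task guidance:"]
        lines += [f"- {g}" for g in self.semantic]
        lines.append("Related past examples:")
        for e in self.retrieve(query, k):
            lines.append(
                f"- Input: {e.text} | Label: {e.label} | Critique: {e.critique}"
            )
        lines.append(f"Classify: {query}")
        return "\n".join(lines)
```

In this sketch no model parameters change: adaptation happens only through what `build_prompt` places in context, which mirrors the inference-time setting the abstract describes.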

ACM CAIS 2026 Sponsors