
Scaling Textual Gradients via Sampling-Based Momentum

Zixin Ding (University of Chicago), Junyuan Hong (University of Texas at Austin), Zhan Shi (Santa Clara University), Tianhao Wang (Princeton University), Zinan Lin (Microsoft Research), Li Yin (SylphAI), Meng Liu (SylphAI), Zhangyang Wang (University of Texas at Austin), Yuxin Chen (University of Chicago)

System Optimization & Efficiency

A method for scaling prompt optimization with LLM-generated textual gradients that introduces sampling-based momentum to overcome context-length limits and instability at large training-set sizes. It shows that principled scaling of textual gradient descent, analogous to SGD with momentum, yields consistent gains that naive scaling cannot achieve.

Presentation

Talk

Paper Session 6: Learning & Control

Thursday, May 28 · 3:30 PM – 3:40 PM

Bayshore Ballroom

Poster

Thursday, May 28 · 4:30 PM – 6:00 PM

Carmel


Abstract

LLM-based prompt optimization, which uses LLM-provided "textual gradients" (feedback) to refine prompts, has emerged as an effective method for automatic prompt engineering. However, its scalability and stability remain unclear as the amount of training data grows. We systematically investigate the potential and challenges of scaling training data in textual gradient descent. We show that naively scaling the number of training examples is infeasible due to both explicit context-length limits and an implicit context wall, where long-context degradation yields diminishing returns. Inspired by prior wisdom in stochastic gradient descent, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), which reweights updates through momentum sampling, using bootstrapped minibatch validation accuracy as importance weights over historical prompts. To stabilize TSGD and enable effective scaling within a limited context window, TSGD-M carries information from prior prompts forward by dynamically exploring past top-performing prompts without expanding the input context length. TSGD-M integrates seamlessly into existing prompt optimization frameworks, including TextGrad, DSPy-COPRO, and AdalFlow, and achieves consistent gains across 6 benchmarks.
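
The momentum-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `score` callback, the data format, and the softmax temperature used to turn accuracies into sampling weights are all assumptions made for illustration. It shows one plausible reading of "bootstrapped minibatch validation accuracy as importance weights over historical prompts".

```python
import math
import random
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input, expected output); hypothetical data format
Scorer = Callable[[str, Example], float]  # hypothetical: 1.0 if correct, else 0.0


def bootstrap_accuracy(prompt: str, data: Sequence[Example], score: Scorer,
                       batch_size: int, rng: random.Random) -> float:
    """Estimate a prompt's accuracy on a minibatch drawn with replacement."""
    batch = [rng.choice(data) for _ in range(batch_size)]
    return sum(score(prompt, ex) for ex in batch) / batch_size


def sample_momentum_prompt(history: Sequence[str], data: Sequence[Example],
                           score: Scorer, batch_size: int = 8,
                           temperature: float = 0.1,
                           rng: Optional[random.Random] = None) -> str:
    """Sample one past prompt, favoring those with higher bootstrapped accuracy.

    The softmax weighting (and its temperature) is an assumption of this
    sketch; the abstract only says accuracies act as importance weights.
    """
    rng = rng or random.Random()
    accs = [bootstrap_accuracy(p, data, score, batch_size, rng) for p in history]
    top = max(accs)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp((a - top) / temperature) for a in accs]
    return rng.choices(history, weights=weights, k=1)[0]
```

In this sketch, the sampled prompt would seed the next textual-gradient step, so only a single prompt enters the optimizer's context no matter how long the history grows. Because each call re-estimates accuracy on a fresh bootstrapped minibatch, weaker prompts are still sampled occasionally, which gives the exploration over past prompts that the abstract describes.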
