optimize_anything: A Universal API for Optimizing any Text Parameter
Lakshya A Agrawal (University of California, Berkeley), Donghyun Lee (University of California, Berkeley), Wenjie Ma (University of California, Berkeley), Karim Elmaaroufi (University of California, Berkeley), Shangyin Tan (University of California, Berkeley), Sanjit A. Seshia (University of California, Berkeley), Koushik Sen (University of California, Berkeley), Dan Klein (University of California, Berkeley), Ion Stoica (University of California, Berkeley), Joseph Gonzalez (University of California, Berkeley), Omar Khattab (Massachusetts Institute of Technology), Alexandros G. Dimakis (University of California, Berkeley), Matei Zaharia (University of California, Berkeley)
System Optimization & Efficiency · Architectural Patterns & Composition
Summary
A declarative API that treats code, prompts, and agent architectures as optimizable text artifacts, with results including 47% faster Claude Code task resolution and 89.5% accuracy on ARC-AGI.
Description
We present optimize_anything, a declarative API that formulates a broad class of optimization problems as iterative refinement of text artifacts. The key observation is that code, prompts, agent architectures, and algorithmic policies are all serializable as strings whose quality can be measured through automated evaluations, enabling a unified LLM-driven search over these heterogeneous domains. Our API subsumes three optimization paradigms under a single interface: single-task search, multi-task search with cross-problem transfer, and generalization to unseen inputs.

Two mechanisms underpin the approach: (1) Actionable Side Information (ASI), a first-class evaluator contract that surfaces rich, multimodal diagnostics (compiler errors, profiler traces, rendered images) to the LLM proposer, replacing blind mutation with targeted, diagnostic-driven revision; and (2) Pareto-efficient search over per-task and per-metric scores, which preserves complementary candidate strengths that naive score averaging discards.

We evaluate across eight domains. On coding agent skill optimization, learned skills push Claude Code to near-perfect task completion while reducing resolution time by 47%. On cloud scheduling, discovered algorithms cut egress costs by 40.2%, topping the ADRS leaderboard. On ARC-AGI, full agent architecture evolution improves Gemini Flash accuracy from 32.5% to 89.5%. On AIME 2025, prompt-only optimization raises GPT-4.1-mini accuracy from 46.67% to 60.00%. On KernelBench, multi-task CUDA kernel generation produces kernels where 87% match or exceed baseline performance. On circle packing (n=26), we surpass AlphaEvolve's published result. On the 56-problem EvalSet benchmark, LLM-generated solver code matches Optuna, a mature numerical optimizer. These results suggest that unifying text optimization under a minimal, diagnostic-rich API can match or exceed domain-specific tools across a wide range of problem classes.
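To make the refinement loop and the ASI evaluator contract concrete, here is a minimal, hypothetical sketch: the function and type names (`optimize_anything`, `EvalResult`, `propose`) are illustrative placeholders, not the actual API, and a toy string-repair task with a rule-based proposer stands in for an LLM.

```python
# Hypothetical sketch of the refinement loop; names are illustrative, not
# the real optimize_anything API. The evaluator returns a score plus
# Actionable Side Information (ASI) -- diagnostics the proposer uses for
# targeted revision instead of blind mutation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    score: float  # higher is better
    asi: str      # diagnostics (e.g. compiler errors, profiler traces)

def optimize_anything(seed: str,
                      evaluate: Callable[[str], EvalResult],
                      propose: Callable[[str, str], str],
                      budget: int = 20) -> str:
    """Iteratively refine a text artifact guided by evaluator diagnostics."""
    best, best_eval = seed, evaluate(seed)
    for _ in range(budget):
        candidate = propose(best, best_eval.asi)  # diagnostic-driven revision
        cand_eval = evaluate(candidate)
        if cand_eval.score > best_eval.score:
            best, best_eval = candidate, cand_eval
    return best

# Toy demo: the "artifact" is a string that should equal TARGET; the ASI
# names the first mismatched position. A real deployment would call an LLM
# inside `propose`.
TARGET = "kernel"

def evaluate(text: str) -> EvalResult:
    score = sum(a == b for a, b in zip(text, TARGET)) / len(TARGET)
    bad = next((i for i, (a, b) in enumerate(zip(text, TARGET)) if a != b), -1)
    asi = f"mismatch at index {bad}" if bad >= 0 else "all characters match"
    return EvalResult(score, asi)

def propose(text: str, asi: str) -> str:
    if "mismatch at index" in asi:
        i = int(asi.rsplit(" ", 1)[1])
        return text[:i] + TARGET[i] + text[i + 1:]  # targeted, not random
    return text

print(optimize_anything("kxrnxl", evaluate, propose))  # -> kernel
```

The key contract is that `evaluate` returns diagnostics alongside the score, so each proposal can act on concrete failure information.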
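The Pareto-efficient selection mechanism can likewise be sketched in a few lines (again illustrative, not the paper's implementation): a candidate survives unless another candidate scores at least as well on every task and strictly better on one, so complementary specialists that mean-score averaging would discard are retained.

```python
# Illustrative sketch of Pareto-efficient selection over per-task scores;
# not the paper's actual implementation. Keeps every non-dominated
# candidate rather than only the best-on-average one.
def pareto_front(scores: dict[str, list[float]]) -> set[str]:
    def dominates(a: list[float], b: list[float]) -> bool:
        return (all(x >= y for x, y in zip(a, b)) and
                any(x > y for x, y in zip(a, b)))
    return {name for name, s in scores.items()
            if not any(dominates(other, s)
                       for o, other in scores.items() if o != name)}

# Three candidates over three tasks: A and B are complementary specialists;
# C is dominated by both and is dropped.
candidates = {
    "A": [0.9, 0.2, 0.5],  # strong on task 1
    "B": [0.3, 0.9, 0.5],  # strong on task 2
    "C": [0.2, 0.1, 0.4],  # dominated
}
print(sorted(pareto_front(candidates)))  # -> ['A', 'B']
# Naive averaging ranks B (mean 0.567) above A (mean 0.533) and would
# discard A's task-1 strength; the Pareto front preserves both.
```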