CAMI: Practical Cost-Aware Agent-Guided Multi-Indexing for Semantic Retrieval
Adnan Qidwai (IBM Software Innovation Lab), Anand Eswaran (IBM Software Innovation Lab), Sonam Mishra (IBM Software Innovation Lab), Jaydeep Sen (IBM Software Innovation Lab), Sachindra Joshi (IBM Software Innovation Lab)
System Optimization & Efficiency · Architectural Patterns & Composition
CAMI is a cost-aware retrieval system that uses an agent to select which semantic enrichments (hypothetical queries, summaries, paraphrases) to generate per document chunk at index time, optimizing retrieval quality within a practical cost budget. It avoids the combinatorial explosion of exhaustively generating all enrichment types across large corpora.
Presentation
Talk
Paper Session 3: Systems Efficiency
Wednesday, May 27 · 4:10 PM – 4:20 PM
Bayshore Ballroom
Poster
Wednesday, May 27 · 5:15 PM – 6:45 PM
Carmel / Monterey
Abstract
RAG ingestion pipelines frequently augment the search corpus index with semantic enrichment indices (e.g., synthetic queries or summaries generated from corpus chunks) that are subsequently queried alongside the base index to improve retrieval via better alignment between document representations and user intent. While these supplementary representations substantially improve retrieval quality, they introduce a computational bottleneck: the configuration space of enrichment types and generator models is combinatorial, and the cost of exhaustive index-time evaluation scales linearly with corpus size. We introduce CAMI (Cost-Aware Multi-Indexing), a framework that formalizes multi-index construction as a budgeted, multi-objective portfolio selection problem. CAMI targets the upstream decision of which enrichment views to generate and materialize before the retrieval backend is applied. CAMI incorporates three primary mechanisms: (i) an agentic discovery phase that proposes corpus-specific representation templates; (ii) an atomic-unit search procedure that evaluates individual enrichment-model pairs and recombines them via fidelity-local closure to identify synergistic portfolios; and (iii) a confidence-aware promotion schedule that prunes unpromising configurations early, decoupling optimization spend from total corpus size. We evaluate CAMI across diverse retrieval corpora. Our findings reveal that the framework systematically isolates high-recall portfolios under strict budget constraints, outperforming standard content-only baselines in challenging settings by up to 9.4% recall@10. Further, CAMI identifies these high-recall portfolios with up to 5x less budget than random search baselines, making the approach practical in production scenarios.
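The abstract does not spell out the promotion schedule, so the following is only a rough illustration of the general idea behind mechanism (iii): evaluate candidate portfolios on progressively larger samples, and drop any candidate whose optimistic score falls below the leader's pessimistic score. It substitutes a generic successive-halving-style loop with a Hoeffding confidence radius, which is an assumption and not necessarily what CAMI does. The function name `promote_with_confidence`, the portfolio names, the recall values, and the cost units are all hypothetical and chosen purely for the toy run.

```python
import math
from typing import Callable, Dict, List


def promote_with_confidence(
    configs: List[str],
    evaluate: Callable[[str, int], float],
    cost_per_sample: Dict[str, float],
    budget: float,
    rungs: List[int],
    delta: float = 0.1,
) -> Dict[str, object]:
    """Prune candidate portfolios rung by rung under a total budget.

    evaluate(config, n) is assumed to return mean recall in [0, 1]
    measured on n sampled items (hypothetical interface).
    """
    alive = list(configs)
    spent = 0.0
    scores: Dict[str, float] = {}
    for n in rungs:
        # Cost of scoring every surviving candidate on n samples.
        rung_cost = sum(cost_per_sample[c] * n for c in alive)
        if spent + rung_cost > budget:
            break  # stop before exceeding the budget
        spent += rung_cost
        scores = {c: evaluate(c, n) for c in alive}
        # Hoeffding radius for a mean of n values bounded in [0, 1].
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_lower = max(scores.values()) - radius
        # Keep a candidate only if its upper bound reaches the leader's lower bound.
        alive = [c for c in alive if scores[c] + radius >= best_lower]
    best = max(alive, key=lambda c: scores.get(c, 0.0))
    return {"best": best, "survivors": alive, "spent": spent}


# Toy run with made-up recall values and cost units (not results from the paper).
TOY_RECALL = {
    "content_only": 0.45,
    "content+summaries": 0.52,
    "content+queries": 0.62,
    "content+queries+summaries": 0.64,
}
TOY_COST = {
    "content_only": 0.0,
    "content+summaries": 1.0,
    "content+queries": 1.0,
    "content+queries+summaries": 2.0,
}

result = promote_with_confidence(
    configs=list(TOY_RECALL),
    evaluate=lambda c, n: TOY_RECALL[c],  # noiseless stand-in for a real evaluation
    cost_per_sample=TOY_COST,
    budget=5000.0,
    rungs=[50, 200, 800],
)
print(result)
```

In this toy run the weakest candidate is dropped at the second rung, before the most expensive evaluation, which is the sense in which early pruning keeps optimization spend from growing with the full corpus. A real evaluation would be noisy, and the choice of rung sizes and `delta` would trade off budget against the risk of discarding a good portfolio.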