
All Accepted Demos

SQLsaber: Agentic SQL Assistant for Efficient and High-Accuracy Natural Language Database Exploration

Sarthak Jariwala (Swift Solar Inc.)

Architectural Patterns & Composition · Evaluation & Benchmarking · Security & Privacy

Summary

An agentic SQL assistant equipped with four core tools that achieves 95.2% execution accuracy on the BIRD dev set with Claude Opus, a 47.8% relative improvement over the prior best, with a median latency of 16 seconds.

Description

Natural language interfaces to databases have long been a goal of human-computer interaction research, yet current Text-to-SQL approaches still struggle with execution accuracy (EX) on complex queries. We introduce SQLsaber, a safe and extensible open-source agentic SQL assistant designed for high-accuracy natural language database exploration. SQLsaber is available as both an interactive CLI and a Python SDK, and is equipped with only four core tools (list-tables, introspect-schema, search-knowledge, and execute-sql), which together enable it to progressively discover schema context, retrieve domain knowledge, generate queries, and self-correct using execution feedback. Here, we demonstrate the power of a tool-using, agentic approach for high-accuracy natural language database exploration. Furthermore, SQLsaber is designed with defense-in-depth security, including AST-based SQL guards, dialect-specific function denylists, and database-level read-only enforcement, as well as a plugin architecture for extensibility. We evaluate SQLsaber's performance on the challenging questions of the BIRD dev set using three frontier LLMs. All three models surpass 88% EX, with Claude Opus 4.6 achieving 95.2% EX, a relative improvement of 47.8% over the prior best. We further show that the SQLsaber agent is efficient, with Claude Opus 4.6 reaching an answer in a median of 16 seconds and 6 tool calls. Finally, through cross-model agreement analysis using SQLsaber, we identify 19 instances of incorrect gold SQL in the benchmark's challenging set, highlighting the value of agentic exploration for both natural language query answering and benchmark auditing.

ACM CAIS 2026 Sponsors