Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use
Francisco Javier Arceo (Red Hat AI), Varsha Prasad Narsing (Red Hat AI)
Security & Privacy · Architectural Patterns & Composition
An open-source, vendor-neutral architecture (Llama Stack) for enterprise RAG and agentic systems that enforces multitenant isolation, policy-aware access control, and regulatory compliance at the retrieval layer. It addresses a fundamental flaw in standard RAG: relevance-based retrieval can surface one tenant's confidential data to another tenant simply because that data scores highest.
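To make the relevance-versus-authorization gap concrete, the Python sketch below contrasts a relevance-only top-k retriever with one that applies an attribute-based policy before ranking. It is a minimal illustration under stated assumptions, not the API of the system described here: the `Chunk` and `Principal` types, the `tenant` and `sensitivity` attributes, and the `is_authorized` rule are all hypothetical, and the scores stand in for whatever ranker (semantic, keyword, or hybrid) produces them.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    score: float  # relevance from any ranker: semantic, keyword, or hybrid
    # Hypothetical attributes attached at ingestion time (policy-aware ingestion).
    attrs: dict = field(default_factory=dict)


@dataclass
class Principal:
    tenant: str
    clearance: int


def relevance_only(chunks: list[Chunk], k: int) -> list[Chunk]:
    """Standard top-k: ranks purely by score, ignoring who is asking."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]


def is_authorized(principal: Principal, chunk: Chunk) -> bool:
    """Hypothetical ABAC rule: same tenant and sufficient clearance.

    Chunks without a tenant attribute never match, so the default is deny.
    """
    return (chunk.attrs.get("tenant") == principal.tenant
            and principal.clearance >= chunk.attrs.get("sensitivity", 0))


def gated_retrieve(principal: Principal, chunks: list[Chunk], k: int) -> list[Chunk]:
    """Apply the policy first, then rank only what the principal may see."""
    permitted = [c for c in chunks if is_authorized(principal, c)]
    return sorted(permitted, key=lambda c: c.score, reverse=True)[:k]


corpus = [
    Chunk("Acme Q3 revenue forecast", 0.92, {"tenant": "acme", "sensitivity": 2}),
    Chunk("Globex Q3 revenue forecast", 0.95, {"tenant": "globex", "sensitivity": 2}),
    Chunk("Acme public press release", 0.60, {"tenant": "acme", "sensitivity": 0}),
]
analyst = Principal(tenant="acme", clearance=2)

# Relevance alone hands Globex's forecast to an Acme user.
print([c.text for c in relevance_only(corpus, 1)])  # → ['Globex Q3 revenue forecast']
# Gated retrieval returns the best result this principal is allowed to see.
print([c.text for c in gated_retrieve(analyst, corpus, 1)])  # → ['Acme Q3 revenue forecast']
```

Filtering before truncating to the top k is a deliberate choice in this sketch: applying the policy after truncation would still avoid leakage, but it could return fewer than k results, or none, whenever unauthorized chunks dominate the top of the ranking.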
Presentation
Talk
Paper Session 5: Security & Governance
Thursday, May 28 · 12:10 PM – 12:20 PM
Bayshore Ballroom
Poster
Thursday, May 28 · 4:30 PM – 6:00 PM
Carmel
Abstract
Retrieval-Augmented Generation (RAG) and agentic AI systems are increasingly prevalent in enterprise AI deployments. However, real enterprise environments introduce challenges largely absent from academic treatments and consumer-facing APIs: multiple tenants with heterogeneous data, strict access-control requirements, regulatory compliance, and cost pressures that demand shared infrastructure. A fundamental problem underlies existing RAG architectures in these settings: retrieval systems rank documents by relevance (whether through semantic similarity, keyword matching, or hybrid approaches), not by authorization, so a query from one tenant can surface another tenant's confidential data simply because that data scores highest. We formalize this gap and analyze additional shortcomings that arise when agentic systems conflate relevance with authorization, including tool-mediated disclosure, context accumulation across turns, and client-side orchestration bypass. To address these challenges, we introduce a layered isolation architecture that combines policy-aware ingestion, retrieval-time gating, and shared inference, enforced through server-side agentic orchestration. This approach centralizes security-critical operations (tool execution authorization, state isolation, and policy enforcement) on the server, creating natural enforcement points for multitenant isolation while allowing client-side frameworks to retain control over agent composition and latency-sensitive operations. We validate the proposed architecture through an open-source implementation in OGX, a vendor-neutral framework that implements an OpenAI-compatible, open-source Responses API with server-side multi-turn orchestration. Our empirical evaluation of this implementation shows that attribute-based access control (ABAC) gating eliminates cross-tenant leakage while introducing negligible overhead.
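The server-side enforcement points described in the abstract can be sketched as follows. This is a hypothetical Python illustration, not OGX's Responses API: the `Orchestrator` class, the per-tenant tool allowlist, and the example tools are assumptions. It shows two checks that only hold if they run on the server: a session (and the context it accumulates across turns) is bound to the tenant that opened it, and every tool call is authorized against that tenant's policy before execution, so a client cannot skip either check by orchestrating tool calls itself.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PolicyError(PermissionError):
    """Raised when a request violates tenant isolation or tool policy."""


@dataclass
class Session:
    tenant: str
    # Turns accumulated across the conversation; never shared between tenants.
    history: list = field(default_factory=list)


class Orchestrator:
    """Hypothetical server-side orchestrator (illustrative, not the OGX API)."""

    def __init__(self, tools: dict[str, Callable[..., Any]],
                 tool_policy: dict[str, set[str]]):
        self._tools = tools          # tool name -> implementation
        self._policy = tool_policy   # tenant -> allowed tool names
        self._sessions: dict[str, Session] = {}

    def open_session(self, session_id: str, caller_tenant: str) -> None:
        self._sessions[session_id] = Session(caller_tenant)

    def call_tool(self, session_id: str, caller_tenant: str, name: str,
                  **kwargs: Any) -> Any:
        # Assumption: caller_tenant is derived server-side from the authenticated
        # request, never taken from a client-supplied payload.
        session = self._sessions.get(session_id)
        if session is None or session.tenant != caller_tenant:
            raise PolicyError("session is not owned by the calling tenant")
        if name not in self._policy.get(caller_tenant, set()):
            raise PolicyError(f"tool {name!r} is not permitted for {caller_tenant!r}")
        # Inject the tenant so the tool operates only on that tenant's data.
        result = self._tools[name](tenant=caller_tenant, **kwargs)
        session.history.append((name, result))
        return result


def search_tickets(tenant: str, query: str) -> str:
    return f"{tenant} tickets matching {query!r}"


def export_all(tenant: str) -> str:
    return f"full data export for {tenant}"


orch = Orchestrator(
    tools={"search_tickets": search_tickets, "export_all": export_all},
    tool_policy={"acme": {"search_tickets"}, "globex": {"search_tickets", "export_all"}},
)
orch.open_session("s1", "acme")
print(orch.call_tool("s1", "acme", "search_tickets", query="outage"))
# → acme tickets matching 'outage'
```

In this sketch `acme` may search tickets, but calling `export_all` from session `s1` raises `PolicyError`, as does any attempt by `globex` to reuse that session and its accumulated history.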