

Does Safety Molt? Evaluating LLM Safety in Multi-Agent Social Environments

Aman Priyanshu (Foundation-AI), Supriti Vijay (Foundation-AI), Esha Pahwa (Corvic AI)

Security & Privacy, Evaluation & Benchmarking

Abstract

LLM safety evaluations predominantly test models in isolation, yet deployed AI agents increasingly operate in persistent social environments alongside other agents. We introduce a Moltbook-style simulation platform in which thousands of LLM agents interact across communities over a simulated month, and we use it to evaluate privacy as a downstream safety concern under varying degrees of social pressure. We find that shifting from single-turn to multi-turn social evaluation amplifies privacy violations (from 19.95% on CIMemories to 45.30% on ours, across OpenAI models); that leakage is socially contagious, with agents 8× more likely to disclose sensitive information after observing a peer do so; and that explicit privacy instructions reduce but do not eliminate this effect, leaving leakage rates above 37.8% even with safeguards. Our findings suggest that static chat-based safety benchmarks systematically underestimate risks in agentic deployment, and that social context alone is sufficient to elicit sensitive disclosures that single-turn evaluations would never surface.
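To make the contagion finding concrete, the sketch below computes a relative-risk statistic of the kind the 8× figure implies: the probability that an agent discloses sensitive information after observing a peer leak, divided by the baseline disclosure probability. This is a hypothetical reconstruction, not the paper's released code; the `Turn` record, the `saw_peer_leak` flag, and the log format are all assumptions.

```python
# Hypothetical sketch of a disclosure-contagion metric. The paper's actual
# pipeline is not published here; the Turn schema and field names are assumed.

from dataclasses import dataclass


@dataclass
class Turn:
    agent_id: str
    community: str
    step: int            # simulated time step
    disclosed: bool      # did this turn leak sensitive information?
    saw_peer_leak: bool  # had this agent already observed a peer's leak?


def contagion_relative_risk(turns: list[Turn]) -> float:
    """P(disclose | observed a peer leak) / P(disclose | no such observation)."""
    exposed = [t for t in turns if t.saw_peer_leak]
    unexposed = [t for t in turns if not t.saw_peer_leak]
    if not exposed or not unexposed:
        raise ValueError("need both exposed and unexposed turns")
    p_exposed = sum(t.disclosed for t in exposed) / len(exposed)
    p_unexposed = sum(t.disclosed for t in unexposed) / len(unexposed)
    if p_unexposed == 0:
        raise ValueError("no baseline disclosures; relative risk undefined")
    return p_exposed / p_unexposed


# Toy usage with made-up counts chosen so the ratio comes out to 8x:
# exposed agents leak on half their turns (8/16), unexposed on 1/16.
turns = (
    [Turn("a", "c1", s, disclosed=(s % 2 == 0), saw_peer_leak=True) for s in range(16)]
    + [Turn("b", "c1", s, disclosed=(s == 0), saw_peer_leak=False) for s in range(16)]
)
print(f"relative risk: {contagion_relative_risk(turns):.1f}x")  # -> 8.0x
```

A per-community or per-model breakdown would follow the same pattern, grouping turns before computing the ratio.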
