

Dissecting and Improving Communication Performance in Multi-Node LLM Inference

Prajwal Singhania (University of Maryland), Siddharth Singh (University of Maryland), Lannie Dalton Hough (University of Maryland), Akarsh Srivastava (University of Maryland), Harshitha Menon (Lawrence Livermore National Laboratory), Charles Fredrick Jekel (Lawrence Livermore National Laboratory), Abhinav Bhatele (University of Maryland)

System Optimization & Efficiency

Abstract

As large language models (LLMs) continue to grow in size, distributed inference has become increasingly important. Model-parallel strategies must now efficiently scale not only across multiple GPUs but also across multiple nodes. In this work, we present a detailed performance study of multi-node distributed inference using LLMs on GPU-based supercomputers. We conduct experiments with several state-of-the-art inference engines alongside YALIS, a research-oriented prototype engine designed for controlled experimentation. We analyze the strong-scaling behavior of different model-parallel schemes and identify key bottlenecks. Because all-reduce operations are a common performance bottleneck, we develop NVRAR, a hierarchical all-reduce algorithm based on recursive doubling with NVSHMEM. NVRAR achieves 1.9×–3.6× lower latency than NCCL for message sizes between 128 KB and 2 MB on HPE Slingshot and InfiniBand interconnects. Integrated into YALIS, NVRAR achieves up to a 1.72× reduction in end-to-end batch latency for the Llama 3.1 405B model in multi-node decode-heavy workloads using tensor parallelism.
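
For context, the recursive-doubling pattern that NVRAR builds on completes an all-reduce over p ranks in log2(p) rounds: in round k, each rank exchanges its full buffer with the rank whose ID differs in bit k, and both sides reduce. Because the message size stays constant per round while the round count grows only logarithmically, the pattern suits the latency-bound message sizes where the abstract reports NVRAR's gains. The sketch below simulates this communication pattern in plain Python over in-memory buffers; it is a textbook rendering of recursive doubling, not the paper's hierarchical NVSHMEM implementation, and all names in it are hypothetical.

```python
# Minimal simulation of the recursive-doubling all-reduce pattern.
# Each "rank" is an in-memory list; real implementations (e.g., NVRAR)
# exchange GPU buffers over the interconnect instead.

import math


def recursive_doubling_allreduce(buffers):
    """Return the state of all ranks after a recursive-doubling all-reduce.

    buffers[i] is rank i's local vector; on return, every rank holds the
    elementwise sum across all ranks after log2(p) exchange rounds.
    """
    p = len(buffers)
    assert p & (p - 1) == 0, "sketch assumes a power-of-two rank count"
    rounds = int(math.log2(p))
    for k in range(rounds):
        # In round k, rank i pairs with rank i XOR 2^k. Both ranks send
        # their full buffer and reduce, so after round k each rank holds
        # the partial sum of a 2^(k+1)-rank group.
        stride = 1 << k
        new_buffers = []
        for rank in range(p):
            partner = rank ^ stride
            new_buffers.append(
                [a + b for a, b in zip(buffers[rank], buffers[partner])]
            )
        buffers = new_buffers
    return buffers


if __name__ == "__main__":
    # Eight simulated ranks, each contributing a two-element vector.
    data = [[float(rank), float(rank * 10)] for rank in range(8)]
    result = recursive_doubling_allreduce(data)
    expected = [sum(r[0] for r in data), sum(r[1] for r in data)]
    assert all(buf == expected for buf in result)
    print("all 8 simulated ranks hold", result[0])
```

The design point the simulation makes visible is step count: recursive doubling finishes in log2(p) exchanges of the full message, whereas a ring all-reduce takes 2(p-1) smaller steps, so for small messages the fixed per-step latency dominates and the logarithmic schedule wins.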
