FedMECA: Scalable Federated Learning via Memory-Efficient and Concurrent Aggregation
Zhonghao Chen (University of Florida), Duo Zhang (University of California, Merced), Xiaoyi Lu (University of Florida)
System Optimization & Efficiency
Abstract
Federated learning (FL) enables collaborative model training across distributed clients while preserving data privacy, but it faces growing scalability challenges as the number of participating clients or the model size increases. Existing aggregation paradigms largely overlook the memory and computational costs that arise from tightly coupling model collection with aggregation: under such paradigms, aggregation must wait until all selected client updates have been collected, and the aggregation step itself is computationally demanding. To overcome these limitations, we propose \textbf{FedMECA}, a scalable, memory-efficient, and concurrency-aware aggregation framework for FL. FedMECA decouples model collection from aggregation, reducing memory pressure on the central server by $\boldsymbol{36.57\times}$ and achieving up to a $\boldsymbol{238.5\times}$ speedup in aggregation runtime compared to FedAvg-based systems, without compromising model accuracy or convergence speed. FedMECA introduces minimal system complexity and supports heterogeneous clients and non-IID data. Moreover, our approach extends naturally to aggregation strategies with different degrees of synchrony, offering flexibility and adaptability across diverse FL applications. These results highlight the importance of rethinking server-side aggregation in FL and demonstrate that FedMECA enables scalable and efficient training for modern large-scale FL workloads.
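To make the decoupling idea concrete, the sketch below is an illustrative Python example, not the FedMECA implementation: the names (StreamingAggregator, aggregation_worker) and interfaces are assumed for exposition only. It shows the general pattern the abstract describes, where client updates are folded into a running weighted sum on a concurrent worker as they arrive, so the server never has to buffer all selected client models before aggregating.

```python
# Minimal sketch (assumed names/interfaces, not the paper's code): decouple
# update collection from aggregation and keep only a running weighted sum,
# bounding server-side memory regardless of the number of clients.

import queue
import threading
import numpy as np


class StreamingAggregator:
    """Folds client updates into a FedAvg-style weighted mean, one at a time."""

    def __init__(self, model_size: int):
        self.accum = np.zeros(model_size, dtype=np.float64)  # running weighted sum
        self.total_weight = 0.0

    def add(self, update: np.ndarray, weight: float) -> None:
        # Incorporate one client update, then the update can be discarded.
        self.accum += weight * update
        self.total_weight += weight

    def result(self) -> np.ndarray:
        return self.accum / self.total_weight


def aggregation_worker(updates: "queue.Queue", agg: StreamingAggregator) -> None:
    """Consumes (update, weight) pairs concurrently with collection."""
    while True:
        item = updates.get()
        if item is None:  # sentinel: all selected clients have reported
            break
        update, weight = item
        agg.add(update, weight)


if __name__ == "__main__":
    model_size = 1_000_000
    num_clients = 8

    agg = StreamingAggregator(model_size)
    updates: "queue.Queue" = queue.Queue(maxsize=2)  # small buffer caps memory
    worker = threading.Thread(target=aggregation_worker, args=(updates, agg))
    worker.start()

    # Collection loop: in a real system updates arrive over the network;
    # here they are simulated with random vectors.
    rng = np.random.default_rng(0)
    for _ in range(num_clients):
        client_update = rng.standard_normal(model_size).astype(np.float32)
        updates.put((client_update, 1.0))  # equal client weights for simplicity

    updates.put(None)  # signal that collection is finished
    worker.join()
    global_model = agg.result()
    print("aggregated model norm:", np.linalg.norm(global_model))
```

In this sketch, peak server memory is roughly the accumulator plus the small bounded queue rather than all client updates at once, and aggregation work overlaps with collection; the actual FedMECA design, heterogeneity handling, and synchrony variants are described in the paper itself.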