ASYRA: Automating Graph Scheduling for Communication-Computation Overlap in Efficient Model Parallelism

Abstract

Scaling large models requires complex multi-dimensional (n-D) parallelism, yet this paradigm suffers from severe communication bubbles that diminish efficiency. Existing overlapping techniques, however, rely on complex manual kernel fusion and lack generality across diverse parallelism patterns, model architectures, and workloads. We therefore propose ASYRA, an automatic communication-computation overlapping approach for n-D model parallelism based on graph scheduling at compile time. By estimating runtime makespan and memory usage with a simulator under various configurations, ASYRA applies graph-based scheduling with tiling, reordering, and bucketing of graph operators to maximize overlap and thus achieve high efficiency. Extensive n-D parallel experiments demonstrate that ASYRA achieves 4%-30% speedups in training and inference, and saves nearly 30% of activation memory in inference.
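To make the overlap idea concrete, here is a minimal toy cost model showing why hiding communication behind computation shrinks the makespan. This is an illustrative sketch under a simple two-stream assumption, not ASYRA's actual scheduler; all function names and the cost model are assumptions.

```python
# Illustrative sketch only: a toy two-stream cost model for
# communication-computation overlap. Not ASYRA's implementation.

def serial_makespan(layers):
    """Makespan when each layer's compute and communication run back-to-back."""
    return sum(compute + comm for compute, comm in layers)

def overlapped_makespan(layers):
    """Makespan when layer i's communication overlaps layer i+1's compute.

    Two serial streams: compute ops run one after another, and each
    layer's communication starts once its own compute has finished and
    the communication stream is free.
    """
    compute_end = 0.0  # time the compute stream finishes the current layer
    comm_end = 0.0     # time the comm stream finishes its current transfer
    for compute, comm in layers:
        compute_end += compute
        comm_end = max(comm_end, compute_end) + comm
    return max(compute_end, comm_end)

# Three identical layers: 4 time units of compute, 2 of communication each.
layers = [(4.0, 2.0)] * 3
print(serial_makespan(layers))      # 18.0 without overlap
print(overlapped_makespan(layers))  # 14.0 with overlap: 4 units of comm hidden
```

In this toy model, only the final layer's communication and the first layer's compute are necessarily exposed; scheduling decisions such as tiling or bucketing would further break large transfers into pieces that fit into compute gaps.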

Publication
Preprint
Zhisheng YE
