木叶吟
LLM Training
ASYRA: Automating Graph Scheduling for Communication-Computation Overlap in Efficient Model Parallelism
Scaling large models requires complex multi-dimensional (n-D) parallelism, yet this paradigm suffers from severe communication bubbles …
Lei Zhang, Zhisheng YE
PDF · Cite
Characterization of Large Language Model Development in the Datacenter
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks. However, it is non-trivial to …
Qinghao Hu, Zhisheng YE, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
Preprint · Cite
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Large Language Models (LLMs) have demonstrated impressive performance across various downstream tasks. When training these models, …
Qiaoling Chen, Qinghao Hu, Zhisheng YE, Guoteng Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
Preprint · Cite
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of …
Qinghao Hu, Zhisheng YE, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
PDF · Cite · Code · Slides · Video