木叶吟
LLM Training
ASYRA: Automating Graph Scheduling for Communication-Computation Overlap in Efficient Model Parallelism
Scaling large models requires complex multi-dimensional (n-D) parallelism, yet this paradigm suffers from severe communication bubbles …
Lei Zhang, Zhisheng YE
PDF · Cite
Characterization of Large Language Model Development in the Datacenter
Large Language Models (LLMs) have demonstrated impressive performance across several transformative tasks. However, it is non-trivial to …
Qinghao Hu, Zhisheng YE, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
Preprint · Cite
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Large Language Models (LLMs) have demonstrated impressive performance across various downstream tasks. When training these models, …
Qiaoling Chen, Qinghao Hu, Zhisheng YE, Guoteng Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
Preprint · Cite
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
Hyperparameter tuning is an essential step in deep learning model development that provides better model performance at the cost of …
Qinghao Hu, Zhisheng YE, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
PDF · Cite · Code · Slides · Video