Publications

(2024). Characterization of Large Language Model Development in the Datacenter. In NSDI.

(2023). Deep Learning Workload Scheduling in GPU Datacenters: A Survey. In CSUR.

(2023). Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters. In OSDI.

(2022). Tear Up the Bubble Boom: Lessons Learned From a Deep Learning Research and Development Cluster. In ICCD.

(2021). ASTRAEA: A Fair Deep Learning Scheduler for Multi-tenant GPU Clusters. In TPDS.
