Zhisheng YE

Machine Learning Systems Researcher

ByteDance

Peking University

Biography

Hi there! This is Zhisheng Ye. I am a machine learning systems researcher on the Applied Machine Learning team at ByteDance, where I build efficient, practical systems for emerging recommendation models and LLM workloads.

I received my Ph.D. from the TELOS Systems Lab, NEEC, at Peking University in 2024, under the joint supervision of Prof. Yingwei Luo (director of NEEC) and Prof. Xiaolin Wang. Previously, I received a B.S. degree in Computer Science and Technology from EECS at Peking University, China, in 2019.

My research focuses on building efficient systems for ML and emerging LLM workloads across the entire stack, spanning training frameworks, resource scheduling, and GPU/HPC optimization. I am a former member of PKUSC. I have received mentorship from Prof. Tianwei Zhang of NTU and Peng Sun, and have collaborated closely with Prof. Zhang’s students, including Wei Gao, Qinghao Hu, Meng Zhang, and Qiaoling Chen.

Download my CV.

Interests
  • AI Infrastructure for LLMs
  • Machine Learning Systems
  • Resource Management
Education
  • Ph.D. in Computer Architecture, 2024

    Peking University

  • B.S. in Computer Science and Technology, 2019

    Peking University

Experience

ByteDance
Machine Learning Systems Researcher
Jul 2024 – Present · Beijing, China
Shanghai AI Laboratory
Research Intern
Jul 2022 – Jan 2024 · Beijing, China
  • Large-scale model (e.g., LLM, MoE) training infrastructure optimization.
  • Deeply involved in the development of InternLM.
SenseTime Research
Research Intern
Sep 2019 – Jun 2022 · Beijing, China
  • Supercomputing cluster scheduling and optimization for deep learning training workloads at SenseTime Research (now SenseCore).
  • Design and implementation of a fair scheduler for deep learning training (DLT) jobs, as first author.
Peng Cheng Laboratory
Research Intern
Jul 2018 – Sep 2021 · Shenzhen, China
  • Contributed to the development of OpenI-Octopus, an open-source Kubernetes-based scheduler for deep learning training workloads.
  • Safe GPU sharing and efficient migration mechanisms on Kubernetes.
  • Monitoring and logging systems.
Peking University Cluster Competition Team
Team member
Sep 2018 – Jun 2019 · Beijing, China
  • Analyzed, compiled, profiled, and optimized general HPC workloads, improving their parallelizability.
  • First Prize (Team), ASC19 Student Supercomputer Challenge.

Recent Publications

(2026). CONCUR: High-Throughput Agentic Batch Inference of LLM via Congestion-Based Concurrency Control. In ICML.

(2026). FlowGPU: Transparent and Efficient GPU Checkpointing and Restore. In Euro-Par.

(2026). Latency-SLO-Aware Memory Offloading for Large Language Model Inference. In ICS.

(2026). ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism. In HPDC.

(2025). LEMUR: Large Scale End-to-End Multimodal Recommendation. arXiv.
