木叶吟
CUDA
GPU Pause, Resume, and Migration: The Missing Primitive in Cluster Scheduling
A technical note on GPU checkpoint/restore for schedulers, using FlowGPU as the main reference and my cudaw prototype as the first version of the codebase.
Zhisheng YE
May 17, 2026
8 min read
The note follows FlowGPU as its main thread and describes how the cudaw prototype explores transparent pause, resume, and migration.