Fully GPU-Orchestrated Multi-GPU Work Stealing
- Type: Master Thesis
- Date: 16.01.2026
- Supervisors:
Prof. Dr. Frank Bellosa
Peter Maucher
- Graduand: Lennard Kittner
- Links: PDF
Abstract
Since the introduction of general-purpose computing on GPUs (GPGPU), GPUs have become an essential part of high-performance and scientific computing. However, efficiently utilizing their vast compute resources, especially for irregular workloads in multi-GPU environments, requires load balancing. Existing approaches typically rely on the CPU to manage work on behalf of the GPU.
In this thesis, we propose MGWS, a novel decentralized work stealing system that allows GPU workers to operate independently of the CPU. Removing the reliance on the CPU prevents CPU-managed threads from becoming a bottleneck and reduces synchronization and communication overhead between the CPU and GPU.
We present two inter-GPU communication schemes: the first is based on CPU host memory mapped into the address space of all GPUs; the second leverages peer-to-peer direct memory access (DMA) for direct GPU-to-GPU communication. Our experiments demonstrate that the choice of inter-GPU communication mechanism has a substantial impact on overall performance. Due to hardware limitations, the full multi-GPU evaluation is restricted to the host-memory-based scheme; however, preliminary tests suggest that host memory is likely slower than peer-to-peer DMA. Even when limited to host-memory communication, MGWS can still outperform a static task assignment by up to 41%.
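The decentralized work-stealing idea summarized above can be illustrated, in spirit, with a minimal CPU-thread analogy (the thesis itself targets GPU workers; this sketch, with all names hypothetical, only shows the core protocol): each worker owns a double-ended queue, pops tasks from its own end, and steals from a random victim when its queue runs dry, so an imbalanced static assignment is evened out at runtime.

```python
import random
import threading
from collections import deque

class Worker:
    """One worker with a private, lock-protected task deque."""
    def __init__(self, wid, tasks):
        self.wid = wid
        self.deque = deque(tasks)
        self.lock = threading.Lock()

    def pop_local(self):
        # Owner pops from the tail of its own deque.
        with self.lock:
            return self.deque.pop() if self.deque else None

    def steal(self):
        # Thieves steal from the head, reducing contention with the owner.
        with self.lock:
            return self.deque.popleft() if self.deque else None

def run(worker, workers, results):
    while True:
        task = worker.pop_local()
        if task is None:
            # Local queue empty: try to steal from a random victim.
            victim = random.choice([w for w in workers if w is not worker])
            task = victim.steal()
        if task is None:
            # No task found; terminate once every queue is drained.
            if all(not w.deque for w in workers):
                return
            continue
        results.append(task * task)  # "execute" the task

# Deliberately imbalanced static assignment: worker 0 gets everything.
workers = [Worker(0, range(100)), Worker(1, []), Worker(2, [])]
results = []
threads = [threading.Thread(target=run, args=(w, workers, results))
           for w in workers]
for t in threads: t.start()
for t in threads: t.join()
print(len(results))  # prints 100: all tasks finish despite the imbalance
```

In MGWS the analogous queues live in GPU-visible memory (host-mapped or peer-accessible, per the two schemes above) and the workers are GPU thread groups rather than CPU threads; the sketch only conveys the stealing discipline, not the actual memory layout or synchronization primitives used in the thesis.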
BibTeX: @mastersthesis{kittner25MultiGPUWorkStealing,
author = {Lennard Kittner},
title = {Fully GPU-Orchestrated Multi-GPU Work Stealing},
type = {Master Thesis},
year = 2026,
month = jan # "16",
school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}
}