Fully GPU-Orchestrated Multi-GPU Work Stealing

  • Type: Master Thesis
  • Date: 16.01.2026
  • Supervisor:

    Prof. Dr. Frank Bellosa

    Peter Maucher

  • Graduand: Lennard Kittner
  • Links: PDF
  • Abstract
    Since the introduction of general-purpose GPU compute (GPGPU), GPUs have become an essential part of high-performance and scientific computing. However, efficiently utilizing the vast compute resources, especially in multi-GPU environments with generic irregular workloads, necessitates load balancing. Existing approaches typically rely on the CPU to manage work on behalf of the GPU.
    In this thesis, we propose MGWS, a novel decentralized work stealing system that allows GPU workers to operate independently of the CPU. Removing the reliance on the CPU prevents CPU-managed threads from becoming a bottleneck and reduces synchronization and communication overhead between the CPU and GPU.
    We present two inter-GPU communication schemes: the first is based on CPU host memory mapped to all GPUs, and the second leverages peer-to-peer direct memory access (DMA) for direct GPU-to-GPU communication. Our experiments demonstrate that the choice of inter-GPU communication mechanism has a substantial impact on overall performance. Due to hardware limitations, the full multi-GPU evaluation is restricted to the host memory-based communication scheme; however, preliminary tests suggest that using host memory is likely slower than peer-to-peer DMA. Even with only host memory enabled, MGWS can still outperform a static task assignment by up to 41%.
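    To make the decentralized stealing idea concrete, here is a minimal CPU-side C++ analogue: a shared task queue with atomic head/tail indices, where an idle worker claims tasks from another worker's queue via compare-and-swap. This is an illustrative sketch only, not the thesis's implementation; MGWS runs these operations on the GPUs themselves, with queues placed in host-mapped or peer-accessible device memory, and all names below are hypothetical.

    ```cpp
    #include <atomic>
    #include <cstdio>
    #include <initializer_list>
    #include <vector>

    // Minimal steal queue: the owner pushes at the tail, and anyone
    // (owner or thief) claims tasks at the head with a compare-and-swap.
    // Indices only grow; each slot is handed out exactly once.
    struct StealQueue {
        std::vector<int> tasks;
        std::atomic<int> head{0};   // next slot a worker may claim
        std::atomic<int> tail{0};   // next free slot for the owner

        void push(int task) {
            tasks[tail.load(std::memory_order_relaxed)] = task;
            tail.fetch_add(1, std::memory_order_release);
        }

        // Returns false when the queue is empty, at which point a real
        // worker would move on and probe another worker's queue.
        bool steal(int *out) {
            int h = head.load(std::memory_order_relaxed);
            while (h < tail.load(std::memory_order_acquire)) {
                if (head.compare_exchange_weak(h, h + 1,
                                               std::memory_order_acq_rel)) {
                    *out = tasks[h];
                    return true;
                }
                // CAS failed: h was refreshed; retry against the new head.
            }
            return false;
        }
    };

    int main() {
        StealQueue q;
        q.tasks.resize(8);
        for (int t : {10, 20, 30, 40}) q.push(t);

        // A "thief" drains the victim's queue one CAS at a time.
        int v, stolen = 0;
        while (q.steal(&v)) stolen += v;
        std::printf("stolen sum = %d\n", stolen);
        return 0;
    }
    ```

    The same claim-by-CAS pattern carries over to GPUs, where the deciding factor, as the abstract notes, is where the queue memory lives: atomics on host-mapped memory cross the PCIe bus on every claim, while peer-to-peer DMA lets one GPU operate on another GPU's queue directly.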
    BibTeX:

    @mastersthesis{kittner26MultiGPUWorkStealing,
      author = {Lennard Kittner},
      title = {Fully GPU-Orchestrated Multi-GPU Work Stealing},
      type = {Master Thesis},
      year = 2026,
      month = jan # "16",
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}
    }