GPU Memory Management

This project has been completed.

Over the years, graphics processing units (GPUs) have been finding their way into cloud computing platforms, allowing users to benefit from GPU performance at low cost. However, a large portion of the cloud’s cost advantage traditionally stems from oversubscription: Cloud providers rent out more resources to their customers than are actually available, expecting that the customers will not use all of the promised resources. For GPU memory, this oversubscription is difficult due to the lack of support for demand paging in current GPUs. Therefore, recent approaches to enabling oversubscription of GPU memory resort to software scheduling of GPU kernels to ensure that data is present on the GPU when referenced. However, software scheduling of GPU kernels has been shown to induce significant runtime overhead in applications even if sufficient GPU memory is available.

We present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU’s ability to access system RAM directly to extend the GPU’s own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling.
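
The hardware capability GPUswap builds on is the GPU’s ability to read and write pinned system RAM directly over the PCIe bus. As a rough illustration of that capability only (not of GPUswap itself, which performs the relocation transparently inside the kernel driver), the following CUDA sketch maps a pinned host buffer into the GPU’s address space and lets a kernel operate on it in place; the kernel and buffer names are made up for the example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: it does not care whether `data` resides in GPU memory
// or in system RAM that is mapped into the GPU's address space over PCIe.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main() {
    const int n = 1 << 20;

    // Enable mapping of pinned host memory into the device address space
    // (a no-op on most recent GPUs, where this is the default).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate pinned system RAM that the GPU can access directly.
    float *host_buf = nullptr;
    cudaHostAlloc((void **)&host_buf, n * sizeof(float), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i)
        host_buf[i] = 1.0f;

    // Obtain the device-side address of the same memory. A kernel using this
    // pointer reads and writes system RAM over PCIe, with no prior copy into
    // GPU memory; this is the capability GPUswap relies on for relocated data.
    float *dev_ptr = nullptr;
    cudaHostGetDevicePointer((void **)&dev_ptr, host_buf, 0);

    scale<<<(n + 255) / 256, 256>>>(dev_ptr, n, 2.0f);
    cudaDeviceSynchronize();

    printf("host_buf[0] = %.1f\n", host_buf[0]);  // prints 2.0
    cudaFreeHost(host_buf);
    return 0;
}
```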

Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling suffers from permanent overhead even when sufficient GPU memory is available, our approach executes GPU applications at native performance.

Since accessing evicted data over the PCIe bus incurs noticeable overhead for the GPU, the swapping policy plays a central role in efficient GPU memory swapping. However, the hardware features commonly used to identify rarely accessed pages on the CPU, such as reference bits, are not available in current GPUs. We therefore analyzed the behavior of various GPU applications to determine their memory access patterns offline using the GPU’s performance monitoring counters. Based on our insights into these patterns, we derived a swapping policy that factors a developer-assigned priority for each GPU buffer into its eviction decisions. Experiments with our prototype implementation show that such a priority-based swapping policy can significantly reduce the swapping overhead.
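
GPUswap’s actual policy is implemented inside the kernel driver; the sketch below only illustrates the core idea of priority-guided eviction under assumed bookkeeping: each buffer carries a developer-assigned priority, and when GPU memory runs short, the lowest-priority buffers are relocated to system RAM first so that frequently accessed data stays on the GPU. The structure, field, and function names (GpuBuffer, select_victims) are hypothetical.

```cuda
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative per-buffer bookkeeping. GPUswap's real bookkeeping lives in the
// kernel driver; the fields and names here are assumptions for this sketch.
struct GpuBuffer {
    uint64_t id;
    size_t   size;      // bytes currently resident in GPU memory
    int      priority;  // developer-assigned: higher = accessed more intensively
    bool     evicted;   // already relocated to system RAM?
};

// Pick victim buffers until at least `needed` bytes of GPU memory are freed.
// Buffers with the lowest priority are evicted first, so frequently accessed
// data stays on the GPU and PCIe accesses to evicted data remain rare.
std::vector<uint64_t> select_victims(std::vector<GpuBuffer> &buffers,
                                     size_t needed) {
    std::sort(buffers.begin(), buffers.end(),
              [](const GpuBuffer &a, const GpuBuffer &b) {
                  return a.priority < b.priority;  // lowest priority first
              });

    std::vector<uint64_t> victims;
    size_t freed = 0;
    for (auto &buf : buffers) {
        if (freed >= needed)
            break;
        if (buf.evicted)
            continue;
        victims.push_back(buf.id);  // caller relocates this buffer to system RAM
        buf.evicted = true;
        freed += buf.size;
    }
    return victims;
}
```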

Contact: Prof. Dr.-Ing. Frank Bellosa

Publications

Dr.-Ing. Jens Kehne
Dissertation, Fakultät für Informatik, Institut für Technische Informatik (ITEC), Karlsruher Institut für Technologie (KIT)

Jens Kehne, Jonathan Metter, Martin Merkel, Marius Hillenbrand, Mathias Gottschlag, Frank Bellosa
SYSTOR 2017, 10th ACM International Systems & Storage Conference, Haifa, Israel, May 22-24, 2017

Jens Kehne, Stanislav Spassov, Marius Hillenbrand, Marc Rittinghaus, Frank Bellosa
SYSTOR 2017, 10th ACM International Systems & Storage Conference, Haifa, Israel, May 22-24, 2017

Jens Kehne, Jonathan Metter, Frank Bellosa
Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'15), Istanbul, Turkey, March 14-15, 2015