GPU Memory Management

This project has been completed.

Over the years, graphics processing units (GPUs) have found their way into cloud computing platforms, allowing users to benefit from GPU performance at low cost. However, a large portion of the cloud's cost advantage traditionally stems from oversubscription: cloud providers rent out more resources to their customers than are actually available, expecting that customers will not use all of the promised resources. For GPU memory, such oversubscription is difficult because current GPUs lack support for demand paging. Recent approaches to enabling oversubscription of GPU memory therefore resort to software scheduling of GPU kernels to ensure that data is present on the GPU when referenced. However, software scheduling of GPU kernels has been shown to induce significant runtime overhead in applications even when sufficient GPU memory is available.

We present GPUswap, a novel approach to enabling oversubscription of GPU memory that does not rely on software scheduling of GPU kernels. GPUswap uses the GPU’s ability to access system RAM directly to extend the GPU’s own memory. To that end, GPUswap transparently relocates data from the GPU to system RAM in response to memory pressure. GPUswap ensures that all data is permanently accessible to the GPU and thus allows applications to submit commands to the GPU directly at any time, without the need for software scheduling.
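The core idea — every buffer remains accessible to the GPU at all times, and whole buffers are merely relocated to system RAM under memory pressure — can be illustrated with a small model. The sketch below is purely illustrative Python, not GPUswap's actual interface; all names (`GpuMemoryManager`, `alloc`, `access`) are assumptions of this sketch.

```python
class GpuMemoryManager:
    """Toy model of GPUswap-style relocation: every buffer stays
    accessible to the GPU; under memory pressure, whole buffers move
    from (fast) GPU memory to (slower, PCIe-accessed) system RAM."""

    def __init__(self, gpu_capacity):
        self.gpu_capacity = gpu_capacity
        self.gpu_used = 0
        self.location = {}  # buffer id -> "gpu" or "sysram"
        self.size = {}      # buffer id -> size in bytes

    def alloc(self, buf, size):
        self.size[buf] = size
        self._evict_until(size)  # relocate older buffers if needed
        if self.gpu_used + size <= self.gpu_capacity:
            self.location[buf] = "gpu"
            self.gpu_used += size
        else:
            # Buffer does not fit even after eviction: place it in
            # system RAM directly; it is still reachable by the GPU.
            self.location[buf] = "sysram"

    def _evict_until(self, needed):
        # Move buffers to system RAM until `needed` bytes fit on the GPU.
        for buf, loc in list(self.location.items()):
            if self.gpu_used + needed <= self.gpu_capacity:
                break
            if loc == "gpu":
                self.location[buf] = "sysram"
                self.gpu_used -= self.size[buf]

    def access(self, buf):
        # No page fault, no kernel scheduler involved: data is always
        # reachable, merely slower when it resides in system RAM.
        return self.location[buf]
```

Note that, unlike demand paging, nothing here blocks or faults on access — relocation happens only at allocation time, which is why applications can submit commands to the GPU at any moment without software scheduling.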

Experiments with our prototype implementation show that GPU applications can still execute even with only 20 MB of GPU memory available. In addition, while software scheduling incurs constant overhead even when sufficient GPU memory is available, our approach executes GPU applications at native performance.

Since the GPU accesses evicted data over the PCIe bus, which induces noticeable overhead, the swapping policy plays a central role in efficient GPU memory swapping. However, the hardware features commonly used to identify rarely accessed pages on the CPU – such as reference bits – are not available in current GPUs. We therefore analyzed the behavior of various GPU applications to determine their memory access patterns offline using the GPU's performance monitoring counters. Based on our insights into these patterns, we derived a swapping policy that includes a developer-assigned priority for each GPU buffer in its swapping decisions. Experiments with our prototype implementation show that a swapping policy based on buffer priorities can significantly reduce the swapping overhead.
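A priority-based victim selection of the kind described above can be sketched as follows. This is a hypothetical illustration, not GPUswap's actual policy code; the function name, the tie-breaking rule, and the data layout are assumptions of this sketch.

```python
def choose_victims(buffers, bytes_needed):
    """Pick eviction victims by developer-assigned priority:
    lowest-priority buffers are evicted first. `buffers` maps
    buffer id -> (priority, size_in_bytes); higher priority means
    the buffer is more important to keep in GPU memory."""
    # Sort by ascending priority; among equal priorities, prefer
    # larger buffers so fewer buffers need to be relocated
    # (a tie-breaking assumption of this sketch).
    order = sorted(buffers.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    victims, freed = [], 0
    for buf, (priority, size) in order:
        if freed >= bytes_needed:
            break
        victims.append(buf)
        freed += size
    return victims
```

For example, with a high-priority weights buffer and two low-priority scratch buffers, the scratch buffers are relocated to system RAM first, keeping the frequently accessed data on the GPU.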

Contact: Prof. Dr.-Ing. Frank Bellosa

Publications

Dr.-Ing. Jens Kehne
Dissertation, Fakultät für Informatik, Institut für Technische Informatik (ITEC), Karlsruher Institut für Technologie (KIT)

Jens Kehne, Jonathan Metter, Martin Merkel, Marius Hillenbrand, Mathias Gottschlag, Frank Bellosa
SYSTOR 2017, 10th ACM International Systems & Storage Conference, Haifa, Israel, May 22-24, 2017

Jens Kehne, Jonathan Metter, and Frank Bellosa
Proceedings of the 11th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'15), Istanbul, Turkey, March 14-15, 2015

Jens Kehne, Stanislav Spassov, Marius Hillenbrand, Marc Rittinghaus, Frank Bellosa
SYSTOR 2017, 10th ACM International Systems & Storage Conference, Haifa, Israel, May 22-24, 2017