DRAM Mapping Aliases

When an OS allocates memory to a process, it implicitly performs long-term scheduling of DRAM resources such as channels and banks: each mapped page frame lets the process's memory operations send requests to the channels and DRAM banks that back that page frame. The OS should be able to choose dynamically between sharing and dedicating these resources, yet it cannot do so on conventional systems.

On our 4-core prototype platform, we observed slowdowns of up to 36% from DRAM interference for some combinations of workloads, caused by the uncontrolled sharing of DRAM channels under channel interleaving, the typical configuration. Previous work proposed channel partitioning to mitigate this interference, but partitioning reduces the maximum throughput of individual applications even when workloads do not interfere.

Our approach enables the OS to choose between channel interleaving and channel partitioning at run-time, at the granularity of address space (AS) segments. For that purpose, we map DRAM into the physical AS multiple times: once as a dedicated region per channel for partitioning, and once more as a region that interleaves all channels. We implement this approach on commodity hardware. We modify the OS’s memory management so that it can dedicate channels to processes, or share channels between processes with interleaving, by choosing page frames from the appropriate region. As a result, we can switch at application run-time (e.g., when workloads change) to the configuration that achieves the best execution speed and system throughput, whereas a conventional system has to choose between interleaving and partitioning at boot time.
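The following is a minimal sketch of how an allocator might hand out page frames from such aliases. It is not the authors' implementation: the address layout, the sizes, the 64-byte interleaving granularity, and the names `AliasAllocator`, `alloc_partitioned`, and `alloc_interleaved` are all hypothetical. The point it tries to capture is that the aliases overlap. The same DRAM is reachable through the interleaved region and through a per-channel region, so the allocator has to track ownership of the underlying DRAM rather than of physical addresses.

```python
"""Hypothetical sketch: allocating page frames from DRAM mapping aliases.

Assumed layout (illustrative only): the interleaved alias starts at physical
address 0, followed by one partitioned alias per channel. In the interleaved
alias, the channel switches every 64-byte cache line.
"""

NUM_CHANNELS = 2
PAGE_SIZE = 4096
CHANNEL_PAGES = 1024  # hypothetical: 4 MiB of DRAM per channel

INTERLEAVED_BASE = 0
PARTITIONED_BASE = NUM_CHANNELS * CHANNEL_PAGES * PAGE_SIZE


def partitioned_frame_addr(channel: int, block: int) -> int:
    """Physical address of channel-local 4 KiB block `block` in `channel`'s alias."""
    return PARTITIONED_BASE + (channel * CHANNEL_PAGES + block) * PAGE_SIZE


def interleaved_frame_addr(frame: int) -> int:
    """Physical address of page frame `frame` in the interleaved alias."""
    return INTERLEAVED_BASE + frame * PAGE_SIZE


def interleaved_frame_block(frame: int) -> int:
    """Channel-local block that an interleaved frame occupies on every channel.

    With per-cache-line interleaving, a 4 KiB interleaved frame places
    PAGE_SIZE / NUM_CHANNELS bytes on each channel, so NUM_CHANNELS
    consecutive interleaved frames share one channel-local block.
    """
    return frame // NUM_CHANNELS


class AliasAllocator:
    """Hands out frames from either alias without giving out the same DRAM twice."""

    def __init__(self) -> None:
        # Owner of each channel-local block: None (free), "interleaved", or "partitioned".
        self.owner = {(c, b): None
                      for c in range(NUM_CHANNELS) for b in range(CHANNEL_PAGES)}
        self.used_interleaved: set[int] = set()

    def alloc_partitioned(self, channel: int) -> int:
        """Dedicate a frame on one channel, e.g. for an interference-sensitive process."""
        for b in range(CHANNEL_PAGES):
            if self.owner[(channel, b)] is None:
                self.owner[(channel, b)] = "partitioned"
                return partitioned_frame_addr(channel, b)
        raise MemoryError(f"channel {channel} has no free block")

    def alloc_interleaved(self) -> int:
        """Allocate a frame spread over all channels, e.g. for a bandwidth-bound process."""
        for f in range(NUM_CHANNELS * CHANNEL_PAGES):
            if f in self.used_interleaved:
                continue
            b = interleaved_frame_block(f)
            # The block must be free or already used by interleaved frames on every channel.
            if all(self.owner[(c, b)] in (None, "interleaved") for c in range(NUM_CHANNELS)):
                for c in range(NUM_CHANNELS):
                    self.owner[(c, b)] = "interleaved"
                self.used_interleaved.add(f)
                return interleaved_frame_addr(f)
        raise MemoryError("no interleaved frame available")


if __name__ == "__main__":
    alloc = AliasAllocator()
    print(hex(alloc.alloc_partitioned(0)))  # block 0 of channel 0 -> 0x800000
    print(hex(alloc.alloc_interleaved()))   # skips frames 0 and 1, which overlap that block -> 0x2000
```

Tracking ownership per channel-local block is just one simple choice for this sketch. It makes the conflict between aliases explicit (the interleaved allocation above skips frames whose DRAM is already dedicated to channel 0) at the cost of a linear scan, which a practical allocator would likely replace with per-alias free lists.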

For example, on a 2-channel system, DRAM mapping aliases map DRAM (1) interleaving banks and channels, (2) interleaving banks but partitioning channels, and (3) partitioning banks and channels, so that the OS can choose between these mappings at run-time.
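To make the three mappings concrete, here is a small sketch of how an offset within each alias could be decoded into a channel, a bank, and an offset within that bank. The geometry (8 banks of 1 MiB per channel, channel interleaving per 64-byte cache line, bank interleaving per 8 KiB chunk) and all function names are hypothetical; real memory controllers use more complex, often XOR-based, address mappings.

```python
"""Hypothetical decoders for three DRAM mapping aliases on a 2-channel system."""

NUM_CHANNELS = 2
NUM_BANKS = 8
BANK_SIZE = 1 << 20                    # hypothetical: 1 MiB per bank
CHANNEL_SIZE = NUM_BANKS * BANK_SIZE   # 8 MiB per channel
LINE_SIZE = 64                         # channel interleaving granularity (cache line)
BANK_CHUNK = 8192                      # hypothetical bank interleaving granularity


def interleave_banks(x: int) -> tuple[int, int]:
    """Map a channel-local offset to (bank, offset in bank), rotating banks every BANK_CHUNK bytes."""
    chunk = x // BANK_CHUNK
    return chunk % NUM_BANKS, (chunk // NUM_BANKS) * BANK_CHUNK + x % BANK_CHUNK


def decode_alias1(off: int) -> tuple[int, int, int]:
    """(1) Interleave channels and banks."""
    line = off // LINE_SIZE
    channel = line % NUM_CHANNELS
    x = (line // NUM_CHANNELS) * LINE_SIZE + off % LINE_SIZE  # channel-local offset
    return (channel, *interleave_banks(x))


def decode_alias2(off: int) -> tuple[int, int, int]:
    """(2) Partition channels, interleave banks within each channel."""
    channel, x = divmod(off, CHANNEL_SIZE)
    return (channel, *interleave_banks(x))


def decode_alias3(off: int) -> tuple[int, int, int]:
    """(3) Partition channels and banks: each bank is one contiguous range."""
    channel, x = divmod(off, CHANNEL_SIZE)
    bank, in_bank = divmod(x, BANK_SIZE)
    return channel, bank, in_bank


def resources(decode, start: int, length: int) -> set[tuple[int, int]]:
    """(channel, bank) pairs that accesses to [start, start + length) can reach."""
    return {decode(a)[:2] for a in range(start, start + length, LINE_SIZE)}


if __name__ == "__main__":
    for name, dec in [("alias 1", decode_alias1), ("alias 2", decode_alias2),
                      ("alias 3", decode_alias3)]:
        print(name, sorted(resources(dec, 0, 64 * 1024)))
```

For the first 64 KiB of each alias, this sketch places accesses on banks 0 to 3 of both channels for alias 1, on all eight banks of channel 0 only for alias 2, and on a single bank of channel 0 for alias 3. Choosing which alias a page frame comes from therefore decides whether channels and banks are shared or dedicated.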

Dynamically assigning streamcluster to a partitioned mapping alias reduces its slowdown from DRAM interference from 36.3% to 8.3%, which is almost as good as with statically partitioned DRAM (6.5%). We observe similar results for x264. NPB sp.C, mg.C and SPEC libquantum are limited by the bandwidth of a single channel and therefore prefer interleaving. With dynamic reconfiguration from partitioning to interleaving, they achieve nearly the same performance as on a conventional system with interleaving (differences below 1.8%).

Contact: Prof. Dr.-Ing. Frank Bellosa, Marius Hillenbrand

Publications (author; venue)

Marius Hillenbrand, Mathias Gottschlag, Jens Kehne and Frank Bellosa. APSys '17, 8th ACM SIGOPS Asia-Pacific Workshop on Systems, Mumbai, India, September 2-3, 2017.

Marius Hillenbrand, Frank Bellosa. Best Student Poster at EuroSys 2017, Belgrade, Serbia, April 23-26, 2017.

Marius Hillenbrand. Technical Report, KITopen, September 5, 2017.

Marius Hillenbrand. Spring meeting (Frühjahrstreffen) of the Fachgruppe Betriebssysteme (operating systems special interest group) of the Gesellschaft für Informatik, Schloss Reisenburg, March 2-3, 2017.