
On Statistical Properties of Duplicate Memory Pages

Type: Diploma Thesis

Supervisors: Prof. Dr. Frank Bellosa, Marc Rittinghaus, Konrad Miller

Graduand: Thorsten Gröninger


In this work, we investigate how to make memory deduplication scanners more efficient. Modern memory scanners equipped with hinting mechanisms merge large amounts of duplicate memory pages originating from disk, but still fail to harvest other replicas equally fast. We analyze the properties of this remaining sharing potential and aim to decrease the number of scanned pages by focusing memory scanners directly on stable page content. Stability is necessary for sharing; otherwise, the sharing is instantly broken. A metric that excludes unstable pages therefore makes it possible to speed up merging.

We acquired memory modifications and semantic information with a full-system simulation to analyze sharing opportunities, memory access frequencies, and access patterns that lead to stable pages. We implemented a toolchain that gathers such information quickly and scalably. Our evaluation shows that up to 89% of all pages are stable and can be shared with other VMs executing the identical file benchmark. Furthermore, a heuristic for CPU- or I/O-bound workloads can only exist for a small subset of the examined workloads, e.g., kernel builds. General page state prediction seems impossible.

Our findings show that memory write frequencies correlate with page stability, even in otherwise unpredictable workloads. About 78% of all pages experience a low access frequency before they stabilize. A memory scanner should therefore prioritize pages that show a low write access frequency. A reasonable threshold appears to be about 4 accesses within a window of 1.5 seconds. Pages with high memory access frequencies, such as device-associated page frames, can be excluded permanently from scans if their overall busy time exceeds 15 seconds. We further conclude that a scanner should focus on pages leaving the write working set instead of linearly scanning all pages. These pages (on average about 1,800 pages per 480 ms) are guaranteed to have been recently modified but are not currently being written, and are thus candidates for further examination by a scanner.
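
The scanning policy described above can be illustrated with a minimal Python sketch. It is not the toolchain or scanner from the thesis. The class `WriteTracker`, its methods, and the `PageStats` fields are hypothetical names chosen for illustration. The constants come from the abstract, but their interpretation here is an assumption: a page counts as "busy" while it has more than 4 writes in the last 1.5 seconds, busy time accumulates per sampling tick, and the 480 ms figure is used as the sampling interval.

```python
from collections import deque
from dataclasses import dataclass, field

# Values quoted in the abstract; how they are applied below is an assumption.
WINDOW_S = 1.5       # observation window for write accesses
MAX_WRITES = 4       # "about 4 accesses" per window counts as low frequency
BUSY_LIMIT_S = 15.0  # accumulated busy time that triggers permanent exclusion
TICK_S = 0.48        # sampling interval (assumed from "per 480 ms")


@dataclass
class PageStats:
    writes: deque = field(default_factory=deque)  # timestamps of recent writes
    busy_time: float = 0.0   # seconds spent above the write threshold
    excluded: bool = False   # permanently skipped by the scanner
    in_wws: bool = False     # was in the write working set at the last tick


class WriteTracker:
    """Hypothetical bookkeeping for a write-frequency-guided scanner."""

    def __init__(self):
        self.pages = {}  # page frame number -> PageStats

    def record_write(self, pfn, t):
        """Note a write to page `pfn` at time `t` (seconds)."""
        self.pages.setdefault(pfn, PageStats()).writes.append(t)

    def tick(self, now, interval=TICK_S):
        """Age out old writes; return pages that just left the write working set."""
        candidates = []
        for pfn, s in self.pages.items():
            # Drop writes that fell out of the observation window.
            while s.writes and s.writes[0] <= now - WINDOW_S:
                s.writes.popleft()
            # Accumulate busy time while the page exceeds the threshold.
            if len(s.writes) > MAX_WRITES:
                s.busy_time += interval
                if s.busy_time > BUSY_LIMIT_S:
                    s.excluded = True
            written = bool(s.writes)
            # A page leaves the working set when its last write ages out.
            if s.in_wws and not written and not s.excluded:
                candidates.append(pfn)
            s.in_wws = written
        return candidates
```

A scanner built along these lines would call `record_write` from its write-tracking mechanism and hand the list returned by `tick` to the merge stage, rather than walking all pages linearly. Pages flagged as `excluded` are never returned, which mirrors the permanent exclusion of persistently busy page frames.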


 author = {Thorsten Gr\"oninger},
 title = {On Statistical Properties of Duplicate Memory Pages},
 type = {Diploma Thesis},
 school = {System Architecture Group, Karlsruhe Institute of Technology (KIT), Germany},
 month = oct # "31",
 year = 2013,
 note = {\url{http://os.ibds.kit.edu/}}