Efficient Main Memory Deduplication Through Cross Layer Integration
Dr. Konrad Miller
Dissertation, Fakultät für Informatik (INFORMATIK), Institut für Technische Informatik (ITEC)
- Date: 2014
An operating system with more random access memory available can use this additional storage capacity to improve overall system performance. This improvement mainly stems from additional caching and a higher degree of multiprogramming. A prime example is cloud computing, where virtual machines (VMs) permit the flexible allocation and migration of services as well as the consolidation of systems onto fewer physical machines, while preserving strong service isolation. The main memory size limits how many VMs can be co-located on a physical host and also greatly influences their I/O speed.
Memory sharing was already a hot topic in the 1960s, when time-sharing systems were first used. As a consequence, the copy-on-write mechanism and simple sharing policies were invented to reduce memory duplication from equal source objects (e.g., shared libraries). Today, memory sharing is an important research topic again: previous studies have shown that the memory footprint of VMs often contains a significant number of pages with equal content, while traditional memory sharing approaches are not applicable to such workloads. The main problem in deduplicating virtual machine memory is a semantic gap caused by the isolation of VMs, which makes the identification of duplicate memory pages difficult.
Memory deduplication scanners disregard the sources of and reasons for duplicated memory pages and base the detection of such sharing opportunities purely on page content. Memory scanners continuously create and update an index of memory contents at a certain rate. If a page is about to be inserted into the index and the index already contains a page with the same data, both memory pages are merged and shared using copy-on-write. One of the two memory pages can then be released and reused, fully transparently to the affected processes (VMs).
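The scanning and merging step described above can be illustrated with a minimal user-space sketch. It is a toy model under stated assumptions, not the mechanism of any particular scanner: the class `ToyScanner`, its methods, the 4 KiB page size, and the use of SHA-256 as the index key are all hypothetical choices made for illustration. Copy-on-write is modeled by giving every write a fresh frame, so a shared frame is never modified in place.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class ToyScanner:
    """Toy model of a content-based memory deduplication scanner."""

    def __init__(self):
        self.frames = {}    # frame id -> page content (bytes)
        self.refcount = {}  # frame id -> number of pages mapped to it
        self.mapping = {}   # (vm, page number) -> frame id
        self.index = {}     # content digest -> frame id
        self._next_id = 0

    def write(self, vm, page_no, data):
        """Model a write: the page always gets a private frame, so a
        frame shared with other pages keeps its content (copy-on-write)."""
        key = (vm, page_no)
        old = self.mapping.get(key)
        if old is not None:
            self._release(old)
        fid = self._next_id
        self._next_id += 1
        self.frames[fid] = data
        self.refcount[fid] = 1
        self.mapping[key] = fid

    def _release(self, fid):
        """Drop one reference; free the frame when nobody maps it."""
        self.refcount[fid] -= 1
        if self.refcount[fid] == 0:
            data = self.frames.pop(fid)
            del self.refcount[fid]
            digest = hashlib.sha256(data).digest()
            if self.index.get(digest) == fid:
                del self.index[digest]

    def scan(self):
        """One full pass: index every page by content and merge duplicates.
        Returns the number of pages merged in this pass."""
        merged = 0
        for key, fid in list(self.mapping.items()):
            data = self.frames[fid]
            digest = hashlib.sha256(data).digest()
            other = self.index.get(digest)
            if other is None:
                self.index[digest] = fid
            elif other != fid and self.frames[other] == data:
                # Same content found in the index: share the indexed
                # frame and release the duplicate.
                self.mapping[key] = other
                self.refcount[other] += 1
                self._release(fid)
                merged += 1
        return merged


scanner = ToyScanner()
scanner.write("vm1", 0, b"A" * PAGE_SIZE)
scanner.write("vm2", 7, b"A" * PAGE_SIZE)  # duplicate of vm1's page
scanner.write("vm2", 8, b"B" * PAGE_SIZE)
merged_pages = scanner.scan()
```

In this sketch the two pages holding identical content end up mapped to a single frame after the pass, and a later `write` to either of them leaves the other page's content untouched. The full byte comparison after the digest lookup guards against hash collisions; a real scanner would make different choices for the index structure and for how it protects shared pages.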
Memory scanners directly trade computational overhead and memory bandwidth for deduplication success and latency. The merge latency in particular, that is, the time between establishing certain content in a page and merging it with a duplicate, is high in such systems. Current scanners, for example VMware’s ESX, need a long time (in the range of 5–30 min) to detect new sharing opportunities and can therefore only find static sharing opportunities instead of exploiting the full sharing potential.
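The trade-off between scan rate and merge latency can be made concrete with a rough calculation. The sketch below assumes a simplified model in which the scanner walks all of memory linearly at a fixed rate, so a page that changes just after being visited waits up to one full pass before it is examined again. The function name and all numbers are hypothetical and chosen only for illustration; they are not measurements of any real system.

```python
def full_pass_seconds(memory_bytes, pages_per_second, page_size=4096):
    """Time for one linear pass over memory at a fixed scan rate.

    In the simplified model assumed here, this is also an upper bound
    on how long a freshly written page waits before it is scanned.
    """
    pages = memory_bytes // page_size
    return pages / pages_per_second


# Hypothetical host: 16 GiB of memory, scanned at 10,000 pages per second.
latency = full_pass_seconds(16 * 2**30, 10_000)
print(f"{latency / 60:.1f} min per pass")
```

Doubling the scan rate halves the pass time in this model, but it also doubles the CPU time and memory bandwidth spent on scanning, which is the trade-off described above.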