SimuBoost
Full system simulation allows simulating an entire physical machine on top of a host operating system (OS) and thus provides a powerful foundation to study the runtime behavior and interaction of computer architecture, operating systems and applications. Since the entire execution environment in such a system is virtual, every operation carried out can be inspected easily.
A well-known limitation of full system simulation is the low execution speed offered by current simulators. Compared to hardware-assisted virtualization, functional simulation is orders of magnitude slower. In practice, this slowdown creates severe obstacles for comprehensive use of functional full system simulation:
- Interactivity: Scenarios that should capture interaction with a human user or an external network device are not feasible. A single keystroke can take anywhere from several seconds to minutes to be fully processed, making human-user interactions cumbersome and unnatural. Network protocols such as TCP, in turn, react to the slowdown with throttling and timeouts.
- Accuracy of Results: Since the simulation considerably slows down the simulated applications and operating system, activities that depend on events external to the virtual machine, such as I/O operations, appear to complete faster, a phenomenon called time dilation. This distorts measurements and produces unrealistic execution behavior.
- Coverage: Evaluating a test scenario in full length can take considerable time, forcing researchers to reduce coverage.
Representative sampling can reduce the run-time overhead by limiting complex analyses to short time frames that are representative of the analyzed workload. However, an initial functional simulation is still needed to identify such intervals, and the accuracy achievable with this technique depends heavily on sufficient phase behavior in the workload, which is not always present. Moreover, in some scenarios (e.g., analysis of memory duplication) limiting the observation window is not an option. An acceleration technique that enables full-length analyses of long-running workloads is thus desirable.
SimuBoost strives to close the performance gap between virtualization and functional simulation through the use of scalable parallelization. The core idea is to run the workload in a virtual machine (VM), taking checkpoints at regular intervals. Because virtualization executes the workload much faster than simulation, checkpoints become available faster than a single simulator could consume them, so the spans between subsequent checkpoints can be simulated and analyzed simultaneously, with one job per interval. By transferring jobs to multiple nodes, a parallelized and distributed simulation of the target workload can be achieved, thereby reducing the overall simulation time.
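To make the scheduling idea concrete, the following is a minimal sketch of a timing model, not SimuBoost's actual scheduler. It assumes that a job becomes available once its interval has finished in the virtualization stage, that every interval is simulated at a constant slowdown factor, and that checkpointing and transfer costs are zero. The function name `parallel_simulation_time` and all parameters are hypothetical.

```python
import heapq
import math


def parallel_simulation_time(workload_s, interval_s, slowdown, nodes):
    """Wall-clock time until the last interval has been simulated.

    workload_s: run time of the workload under virtualization (seconds)
    interval_s: distance between two checkpoints (seconds)
    slowdown:   assumed constant simulation slowdown factor
    nodes:      number of simulation nodes working in parallel
    """
    num_intervals = math.ceil(workload_s / interval_s)
    free_at = [0.0] * nodes  # min-heap of times at which each node is idle
    heapq.heapify(free_at)
    finish = 0.0
    for i in range(num_intervals):
        begin = i * interval_s
        end = min((i + 1) * interval_s, workload_s)
        # Assumption: the job is ready when the VM has completed the interval.
        ready = end
        start = max(ready, heapq.heappop(free_at))
        done = start + slowdown * (end - begin)
        heapq.heappush(free_at, done)
        finish = max(finish, done)
    return finish


# A 10 s workload, 1 s intervals, simulation 4x slower than virtualization.
one_node = parallel_simulation_time(10, 1, 4, 1)    # == 41
two_nodes = parallel_simulation_time(10, 1, 4, 2)   # == 22
four_nodes = parallel_simulation_time(10, 1, 4, 4)  # == 14
```

Under these assumptions, once the number of nodes reaches the slowdown factor, the total time approaches the virtualization run time plus the simulation time of the final interval. Adding further nodes does not help in this model.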
Key challenges in SimuBoost are:
- Checkpointing: SimuBoost has to create checkpoints at short intervals (< 1 s to 2 s) to bootstrap parallel simulations. To achieve a high speedup, the downtime, that is, the time the VM has to be paused for each checkpoint, must be as short as possible. We use incremental copy-on-write checkpointing with asynchronous scanning of page tables to minimize downtime. To reduce the amount of data that needs to be stored and transferred to remote nodes, SimuBoost employs multicast data distribution with various data reduction and compression techniques that allow SimuBoost to operate in regular Gigabit Ethernet networks.
- Functional Continuity: Full system simulators usually implement a deterministic execution model. Using hardware-assisted virtualization, however, introduces non-deterministic behavior as devices work asynchronously to the CPU. As a consequence, non-deterministic events (e.g., interrupts) appear at different points in the virtualization and simulation stages. This leads to state deviation, where the continuity at interval boundaries in the simulation breaks. SimuBoost logs non-deterministic events in the virtualization and precisely replays them in the simulation, keeping both stages synchronized. This also allows interactive workloads to be faithfully replayed in the simulation.
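The record-and-replay idea behind functional continuity can be illustrated with a toy model. The sketch below is not SimuBoost code. `ToyMachine`, `record`, and `replay` are hypothetical names, the "machine" is a single integer, the checkpoint is a full copy rather than an incremental one, and events are tagged with an instruction count as the assumed replay landmark.

```python
import random


class ToyMachine:
    """Deterministic state machine; interrupts are its only external input."""

    def __init__(self, state=0, icount=0):
        self.state = state
        self.icount = icount  # number of instructions executed so far

    def step(self):
        # One deterministic "instruction".
        self.state = (self.state * 1103515245 + 12345) % 2**31
        self.icount += 1

    def interrupt(self, payload):
        # An asynchronous event that changes the machine state.
        self.state ^= payload


def record(machine, steps, rng):
    """Virtualization stage: events arrive at random and are logged."""
    log = []
    for _ in range(steps):
        if rng.random() < 0.1:
            payload = rng.randrange(2**16)
            log.append((machine.icount, payload))
            machine.interrupt(payload)
        machine.step()
    return log


def replay(machine, steps, log):
    """Simulation stage: inject each event at its logged instruction count."""
    pending = iter(log)
    event = next(pending, None)
    for _ in range(steps):
        while event is not None and event[0] == machine.icount:
            machine.interrupt(event[1])
            event = next(pending, None)
        machine.step()


vm = ToyMachine()
rng = random.Random(42)
record(vm, 100, rng)                # interval 0
checkpoint = (vm.state, vm.icount)  # taken at the interval boundary
log = record(vm, 100, rng)          # interval 1, with its event log

sim = ToyMachine(*checkpoint)       # an independent job starts from the checkpoint
replay(sim, 100, log)
assert (sim.state, sim.icount) == (vm.state, vm.icount)
```

Because the replayed job receives every event at exactly the instruction count at which it was recorded, it ends in the same state in which the virtualized run ended the interval. That is also the state captured by the next checkpoint, so consecutive intervals line up.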
We have a working prototype of SimuBoost. Our evaluation confirms previous research that has shown significant speedup potential for the partitioning and parallelization of simulation time. SimuBoost demonstrates for the first time that the concept can also be very effectively used to accelerate continuous functional full system simulation. For most workloads, we measure a remaining slowdown of simulation over hardware-assisted virtualization of less than 30%, irrespective of the degree of instrumentation (tracing memory reads and writes). Only benchmarks with considerable run-time overhead during the checkpointing and recording phase show higher remaining slowdowns (apache: 120%, postmark: 63% – 81%).
Contact: Dr.-Ing. Marc Rittinghaus
Author | Title | Source |
---|---|---|
Dr.-Ing. Marc Rittinghaus | SimuBoost: Scalable Parallelization of Functional System Simulation | Dissertation, Fakultät für Informatik, Institut für Technische Informatik (ITEC), Karlsruher Institut für Technologie (KIT) |
Marc Rittinghaus | SimuBoost: Scalable Parallelization of Functional System Simulation Poster | Poster session of 8th Eurosys Doctoral Workshop (EuroDW 2014), Amsterdam, Netherlands, April 13, 2014 |
Marc Rittinghaus, Konrad Miller, Marius Hillenbrand, and Frank Bellosa | SimuBoost: Scalable Parallelization of Functional System Simulation | 11th International Workshop on Dynamic Analysis (WODA 2013), Houston, Texas, March 16, 2013 |
Speaker | Title | Conference |
---|---|---|
Marc Rittinghaus | SimuBoost Talk GI-BS Fachgruppentreffen 2016 | GI-BS Fachgruppentreffen, Fujitsu Augsburg, October 2016 |
Marc Rittinghaus | SimuBoost Talk GI-BS Fachgruppentreffen 2013 | GI-BS Fachgruppentreffen, TU Braunschweig, April 2013 |
Author | Title | Type | Date | Advisor |
---|---|---|---|---|
Benedikt Morbach | Accurate Record & Replay of x86 MMU Behavior for SimuBoost | Master Thesis | 13.09.2018 | Prof. Dr. Frank Bellosa |
Andreas Pusch | Checkpoint Distribution for SimuBoost | Master Thesis | 26.10.2017 | Prof. Dr. Frank Bellosa |
Janis Schötterl-Glausch | Exploring Pre-scan, Parallel Copy, and Large Pages for Continuous Checkpointing | Master Thesis | 30.11.2018 | Prof. Dr. Frank Bellosa |
Jan Ruh | Optimizing Continuous Checkpoints for Deterministic Replay | Master Thesis | 15.07.2018 | Prof. Dr. Frank Bellosa |
Michael Zangl | Towards Heterogeneous Deterministic Replay for Symmetric Multiprocessors | Master Thesis | 12.11.2017 | Prof. Dr. Frank Bellosa |
Simon Veith | Towards Heterogeneous Record and Replay on the ARM Architecture | Master Thesis | 31.01.2017 | Prof. Dr. Frank Bellosa |
Bastian Eicher | Virtual Machine Checkpoint Storage and Distribution for SimuBoost | Master Thesis | 04.09.2015 | Prof. Dr. Frank Bellosa, Marc Rittinghaus |
Jan Ruh | Analyzing Duplication in Incremental High Frequency Checkpoints | Bachelor Thesis | 06.09.2015 | Prof. Dr. Frank Bellosa, Marc Rittinghaus |
Johannes Werner | Assessment of Virtual Machine Working-Sets in SimuBoost | Bachelor Thesis | 12.03.2018 | Prof. Dr. Frank Bellosa |
Nikolai Baudis | Deduplicating Virtual Machine Checkpoints for Distributed System Simulation | Bachelor Thesis | 02.11.2013 | Prof. Dr. Frank Bellosa, Marc Rittinghaus |
Nico Böhr | Evaluating Copy-On-Write for High Frequency Checkpoints | Bachelor Thesis | 30.09.2015 | Prof. Dr. Frank Bellosa, Marc Rittinghaus |
Janis Schoetterl-Glausch | Intel Page Modification Logging for Lightweight Continuous Checkpointing | Bachelor Thesis | 31.10.2016 | Prof. Dr. Frank Bellosa |
Marco Schlumpp | Towards Three-Stage Parallelization of System Simulation | Bachelor Thesis | 16.10.2019 | Prof. Dr. Frank Bellosa |