SimuBoost: Scalable Parallelization of Functional System Simulation
- Author: Dr.-Ing. Marc Rittinghaus
- Source: Dissertation, Fakultät für Informatik, Institut für Technische Informatik (ITEC), Karlsruher Institut für Technologie (KIT)
- Date: 19.07.2019

Gathering detailed run-time information such as memory access traces in operating system and security research often involves functional full system simulation (FFSS). The simulator runs the workload of interest in a virtual machine (VM), gradually interpreting or translating instructions so that they operate on the state of the VM and allow for comprehensive instrumentation.
 
 While functional full system simulation is a powerful tool, a severe limitation is its immense slowdown. For QEMU, we have measured average slowdowns of 30x and 60x for plain simulation and tracing of memory accesses, respectively. Simulators offering more advanced instrumentation capabilities can even be an order of magnitude slower. This quickly renders functional simulation impractical for long-running, networked, or interactive workloads. Furthermore, the slowdown creates unrealistic timing behavior whenever activities external to the virtual machine (e.g., I/O) are involved.
 
In this thesis, we present SimuBoost, a method for drastically accelerating functional full system simulation. SimuBoost runs the workload in a fast and interactive hardware-assisted virtual machine while periodically taking checkpoints. These checkpoints then serve as starting points for simulations, so that all intervals can be simulated and analyzed in parallel, with one job per interval. Heterogeneous deterministic replay guarantees that the simulations repeat the exact same execution as in the hardware-assisted run, including interactions and recorded timing.
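
The scheduling idea behind this interval-wise parallelization can be illustrated with a small timing sketch. It is not the performance model developed in the thesis; it is a simplified illustration under stated assumptions: checkpoints are free, every interval is simulated with the same constant slowdown, and an interval can only be simulated once the hardware-assisted run has finished recording it. The function name and all parameters are hypothetical.

```python
import heapq


def simulated_completion_time(total_runtime, interval, slowdown, workers):
    """Estimate when the last interval simulation finishes (in seconds).

    Assumptions (illustrative only): checkpointing has no cost, each
    interval is simulated `slowdown` times slower than native execution,
    and a simulation job may start only after its interval has been fully
    recorded by the hardware-assisted run.
    """
    # Min-heap holding the time at which each simulation worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)

    finish = 0.0
    start = 0.0
    while start < total_runtime:
        end = min(start + interval, total_runtime)
        ready = end                        # interval fully recorded at this time
        worker_free = heapq.heappop(free_at)
        begin = max(ready, worker_free)    # wait for both the log and a worker
        done = begin + slowdown * (end - start)
        heapq.heappush(free_at, done)
        finish = max(finish, done)
        start = end
    return finish


# A 100 s workload, 10 s intervals, 30x slowdown (the plain QEMU figure above).
parallel = simulated_completion_time(100.0, 10.0, 30.0, workers=10)
serial = simulated_completion_time(100.0, 10.0, 30.0, workers=1)
print(parallel, serial)
```

With one worker per interval, the total time in this sketch is bounded by the native run time plus the simulation time of the final interval. With a single worker, the simulation time of all intervals adds up, as in conventional serial simulation.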
 
 Our prototype is able to significantly reduce the run time of functional full system simulation while providing full interactivity. Simulating an entire kernel build completes in just 16% more time than needed to run the same workload in a regular hardware-assisted virtual machine. SimuBoost is able to maintain this performance even with full instrumentation for memory tracing.
 
This thesis represents the first project to apply the concept of partitioning and parallelization of execution time to interactive full system virtualization in a manner that allows for immediate parallel functional simulation. We complement the practical implementation with a performance model to formally describe the properties of the acceleration method and predict speedups. In contrast to previous work, SimuBoost places a strong focus on scalability beyond the limits of a single physical machine. It therefore makes heavy use of virtual machine checkpointing technology. In this context, we present two novel methods for efficiently and effectively reducing the size of periodic checkpoints.
