Optimizing Continuous Checkpoints for Deterministic Replay

  • Type: Master Thesis
  • Date: 15.07.2018
  • Supervisor:

    Prof. Dr. Frank Bellosa
    Marc Rittinghaus

  • Graduand: Jan Ruh
  • Links: PDF
  • Abstract:
    Functional full system simulation makes it possible to monitor the internal state of a system, including its guest operating system, for detailed analysis. Known functional full system simulators, such as QEMU, all run considerably slower than the same workload on real hardware. As a result, functional full system simulation cannot be used to analyze interactive and long-running workloads.

    Rittinghaus et al. propose SimuBoost to speed up full system simulation. SimuBoost leverages the almost bare-metal execution speed of a hardware-assisted virtual machine (VM), VM checkpointing, and deterministic record and replay to allow for concurrent, distributed simulation of execution intervals.

    The checkpoint mechanism, which creates continuous, incremental checkpoints, is an important component of SimuBoost. We provide an extended analysis of SimuBoost and derive formal requirements for the checkpoint mechanism. We evaluate the existing checkpoint implementation with respect to the performance of checkpoint creation and checkpoint loading. We further discuss experiments on the working set size of simulation intervals. We find that, for a fixed-length execution interval of SimuBoost, it is sufficient to load only the working set of that interval. As a result, we propose sparse checkpointing to optimize continuous checkpoints for deterministic replay. Sparse checkpointing leverages access information acquired during checkpoint creation to determine the working set of each checkpoint interval. During checkpoint loading, we use this access information to restore only the page frames that are in the working set of the respective simulation interval.
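
    The idea behind sparse checkpointing can be illustrated with a minimal sketch. It is not the thesis implementation: the `Checkpoint` class, the `load_full` and `load_sparse` functions, and the representation of memory as a dictionary from page frame number to page contents are hypothetical names chosen for this example. The sketch assumes that the first checkpoint contains all page frames, that each later checkpoint stores only frames modified since its predecessor, and that the set of frames accessed in an interval is recorded alongside the checkpoint that starts it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Checkpoint:
    """Hypothetical incremental checkpoint with recorded access information."""
    # Page frames modified since the previous checkpoint (frame number -> contents).
    dirty_pages: Dict[int, bytes]
    # Page frames accessed during the interval that starts at this checkpoint.
    working_set: Set[int] = field(default_factory=set)


def load_full(checkpoints: List[Checkpoint], index: int) -> Dict[int, bytes]:
    """Baseline: restore every page frame as of checkpoint `index`."""
    memory: Dict[int, bytes] = {}
    for cp in checkpoints[: index + 1]:
        memory.update(cp.dirty_pages)  # newer versions overwrite older ones
    return memory


def load_sparse(checkpoints: List[Checkpoint], index: int) -> Dict[int, bytes]:
    """Restore only the frames in the working set of the interval at `index`."""
    missing = set(checkpoints[index].working_set)
    memory: Dict[int, bytes] = {}
    # Walk from the newest checkpoint backwards so that the first hit for a
    # frame is its most recent version.
    for cp in reversed(checkpoints[: index + 1]):
        for frame in list(missing):
            if frame in cp.dirty_pages:
                memory[frame] = cp.dirty_pages[frame]
                missing.discard(frame)
        if not missing:
            break
    return memory


if __name__ == "__main__":
    checkpoints = [
        Checkpoint({0: b"a0", 1: b"b0", 2: b"c0", 3: b"d0"}, {0, 1}),
        Checkpoint({1: b"b1"}, {1, 3}),
    ]
    print(len(load_full(checkpoints, 1)), "frames restored by a full load")
    print(len(load_sparse(checkpoints, 1)), "frames restored by a sparse load")
```

    In this toy example the sparse load restores two of the four page frames for the second interval, and each restored frame matches the version a full load would produce.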

    Our evaluation shows that sparse checkpointing reduces the average memory footprint of simulations by up to 79% for a Linux kernel build and decreases the checkpoint loading time by up to 89%. As a result, sparse checkpointing allows for a higher number of concurrent simulations on a single workstation.
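
    To give a rough sense of what a smaller footprint means for concurrency, the following back-of-the-envelope sketch applies the best-case 79% reduction to made-up numbers. The 64 GiB memory budget and the 4 GiB full footprint per simulation are hypothetical, the function name is chosen for this example, and the calculation assumes that memory is the only limiting resource, which need not hold in practice.

```python
def max_concurrent_simulations(memory_budget_mib: int, footprint_mib: int) -> int:
    """Number of simulations that fit into the budget if memory is the only limit."""
    return memory_budget_mib // footprint_mib


# Hypothetical workstation and per-simulation footprint (not from the thesis).
MEMORY_BUDGET_MIB = 64 * 1024
FULL_FOOTPRINT_MIB = 4 * 1024

# Best-case reduction reported above: 79%, so 21% of the footprint remains.
SPARSE_FOOTPRINT_MIB = FULL_FOOTPRINT_MIB * 21 // 100

if __name__ == "__main__":
    print("full load:  ", max_concurrent_simulations(MEMORY_BUDGET_MIB, FULL_FOOTPRINT_MIB))
    print("sparse load:", max_concurrent_simulations(MEMORY_BUDGET_MIB, SPARSE_FOOTPRINT_MIB))
```

    Since 79% is an upper bound on the average reduction for one workload, the gain for other workloads may be smaller.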


      author = {Jan Ruh},
      title = {Optimizing Continuous Checkpoints for Deterministic Replay},
      type = {Master Thesis},
      year = 2018,
      month = jul # "15",
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}