Checkpoint Distribution for SimuBoost

  • Type: Master's thesis
  • Date: October 26, 2017
  • Advisors:

    Prof. Dr. Frank Bellosa
    Marc Rittinghaus

  • Author: Andreas Pusch
  • Links: PDF
  • Abstract:

    Full system simulation makes it possible to analyze systems by reproducing physical hardware events and by providing access to analytical data and analysis tools. Such simulations support malware analysis, memory studies, and high-availability testing as well as operating system development and debugging. One major downside, though, is that simulation is slower than hardware-assisted virtualization by a factor of 31 to 810. This slowdown limits the applicability of full system simulation to short-running workloads. SimuBoost aims to solve this issue by running the workload in a hardware-assisted virtual machine and splitting it into multiple simulation intervals. At each interval, SimuBoost creates an incremental checkpoint, which is used to bootstrap the simulation of that interval. SimuBoost distributes these checkpoints across a simulation cluster, which runs the simulation intervals in parallel.

    The current distribution mechanism causes high network load, which can turn the network into a bottleneck and limits the scalability of SimuBoost. As a result, checkpoint loading times grow, the interval simulations slow down, and the overall achievable speedup drops. We evaluate distributed storage and multicast as possible solutions. Our goals are shorter checkpoint loading times, better scalability, and lower network load, so that SimuBoost becomes a viable option on Gigabit Ethernet networks with commodity hardware and thereby more widely applicable.

    We have implemented a multicast solution that satisfies all of these goals. With the build-linux-kernel and SPECjbb benchmarks, it does not saturate a Gigabit Ethernet network and yields stable checkpoint loading times. It scales well and achieves a speedup when the number of parallel simulations is doubled from 12 to 24.
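
To illustrate why multicast can lower the network load described above, here is a minimal back-of-the-envelope sketch. It is not taken from the thesis. It assumes that every simulation node needs every checkpoint, that a unicast scheme sends one copy per node, and that a multicast scheme sends a single copy. The function names and the 100 MB checkpoint size are hypothetical and chosen only for illustration.

```python
def sender_traffic_bytes(checkpoint_bytes: int, receivers: int, multicast: bool) -> int:
    """Bytes the checkpoint source transmits for one checkpoint.

    Assumption (not from the thesis): unicast sends one copy per receiver,
    multicast sends a single copy that the network delivers to all receivers.
    """
    if checkpoint_bytes < 0 or receivers < 0:
        raise ValueError("sizes and counts must be non-negative")
    if receivers == 0:
        return 0
    copies = 1 if multicast else receivers
    return checkpoint_bytes * copies


def transfer_seconds(traffic_bytes: int, link_bits_per_second: int = 1_000_000_000) -> float:
    """Idealized time on the sender's link, ignoring protocol overhead."""
    return traffic_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    checkpoint = 100_000_000  # hypothetical 100 MB checkpoint
    for nodes in (12, 24):
        unicast = sender_traffic_bytes(checkpoint, nodes, multicast=False)
        mcast = sender_traffic_bytes(checkpoint, nodes, multicast=True)
        print(nodes, transfer_seconds(unicast), transfer_seconds(mcast))
```

Under these assumptions the sender's unicast traffic grows linearly with the number of nodes, while the multicast traffic stays constant. Actual behavior depends on the checkpoint sizes, the protocol, and the network, none of which this sketch models.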

    BibTeX:

    @mastersthesis{pusch17checkpointdistribution,
      author = {Andreas Pusch},
      title = {Checkpoint Distribution for SimuBoost},
      type = {Master Thesis},
      year = 2017,
      month = oct # "26",
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}
    }