Exploring Pre-scan, Parallel Copy, and Large Pages for Continuous Checkpointing

  • Type: Master Thesis
  • Date: 30.11.2018
  • Supervisor:

    Prof. Dr. Frank Bellosa
    Marc Rittinghaus

  • Graduand: Janis Schötterl-Glausch
  • Links: PDF
  • SimuBoost is a concept to speed up full system simulation, which is hampered by its low execution speed. To achieve this, SimuBoost relies on lightweight, continuous virtual machine checkpointing. SimuBoost’s checkpointing implementation is incremental and makes use of copy-on-write and concurrent-copy to avoid the high downtime that would be the result of performing the copy while the virtual machine is stopped. Incremental checkpointing only copies those pages written to since the last checkpoint. It therefore requires a dirty logging mechanism. SimuBoost’s preferred dirty logging mechanism scans the page tables mapping the guest physical address space to the host physical address space. This occurs during the downtime, increasing it.
    We explore whether pre-scan can decrease the downtime by moving part of the scan outside the downtime. We find that, while pre-scan succeeds in reducing the downtime, it can negatively impact the performance of the virtual machine, mostly for interval lengths under 500 ms. At 500 ms, pre-scan improves performance by less than 1%.
    The page faults caused by copy-on-write degrade the performance of the virtual machine. We investigate whether the number of such page faults can be decreased by parallelizing concurrent-copy. The rationale is to speed up concurrent-copy so that it can save more pages before they incur a copy-on-write page fault. While our implementation roughly halves the number of page faults, it does not improve performance. Instead, performance is reduced for intervals shorter than 500 ms. At 500 ms, parallel copy improves performance by less than 1%.
    Generally, the use of large pages improves the performance of virtual machines. SimuBoost does not make use of large pages during checkpointing. Doing so would increase the amount of memory to capture, and the additional overhead reduces the benefit of large pages. A naive, experimental implementation of checkpointing with large pages shows a benefit only for interval lengths of 2 s and higher, where it is approximately 5%. For smaller intervals, performance is decreased.
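
    The relationship between dirty-page granularity and the amount of memory an incremental checkpoint must capture can be illustrated with a small toy model. This is only a sketch and not the thesis implementation: the page sizes (4 KiB and 2 MiB, as commonly found on x86-64), the function names, and the write pattern are all assumptions made for illustration.

```python
# Toy model: how much memory an incremental checkpoint has to copy
# when dirty tracking works at a given page granularity.
# All sizes, names, and the write pattern are illustrative assumptions.

PAGE_SMALL = 4 * 1024          # assumed base page size (4 KiB)
PAGE_LARGE = 2 * 1024 * 1024   # assumed large page size (2 MiB)


def dirty_pages(written_addresses, page_size):
    """Return the set of page numbers touched by the given byte addresses."""
    return {addr // page_size for addr in written_addresses}


def checkpoint_bytes(written_addresses, page_size):
    """Bytes an incremental checkpoint copies: one full page per dirty page."""
    return len(dirty_pages(written_addresses, page_size)) * page_size


# Hypothetical checkpoint interval: 100 writes, each landing in a
# different 2 MiB-aligned region of guest memory.
writes = [i * PAGE_LARGE + 64 for i in range(100)]

small = checkpoint_bytes(writes, PAGE_SMALL)
large = checkpoint_bytes(writes, PAGE_LARGE)
print(small, large, large // small)
```

    In this deliberately sparse write pattern, every write dirties a separate page at either granularity, so the copy volume grows by the ratio of the two page sizes. Denser write patterns would narrow that gap, which is why the net effect depends on the workload and the interval length.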


      author = {Janis Schötterl-Glausch},
      title = {Exploring Pre-scan, Parallel Copy, and Large Pages for Continuous Checkpointing},
      type = {Master Thesis},
      year = 2018,
      month = nov,
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}