Core Specialization for AVX-512 Using Fault-and-Migrate

  • Type: Master Thesis
  • Date: 08.07.2019
  • Supervisors: Prof. Dr. Frank Bellosa, Mathias Gottschlag
  • Graduand: Peter Brantsch

The Advanced Vector Extensions 512 (AVX-512) are modern Single Instruction Multiple Data (SIMD) extensions to the x86 instruction set. They use 512-bit wide registers, which enables substantial acceleration of numeric workloads. For example, a single instruction can process eight sets of 64-bit operands in parallel. The corresponding functional units draw a lot of power, so a CPU core executing AVX-512 instructions has to temporarily reduce its clock frequency to stay within thermal and electrical limits. This frequency reduction persists substantially beyond the last AVX-512 instruction, and can therefore slow down the scalar part of mixed workloads.
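
The eight-way parallelism can be illustrated with a small example. The sketch below adds two arrays of `double` using AVX-512F intrinsics, and falls back to a scalar loop when the CPU lacks AVX-512. It assumes GCC or Clang on x86, for the `target` attribute and `__builtin_cpu_supports`. The function names `add_arrays`, `add_avx512` and `add_scalar` are hypothetical.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Compiled for AVX-512F via the target attribute, so no -mavx512f flag is
 * needed; it is only called after the runtime CPU check below. */
__attribute__((target("avx512f")))
static void add_avx512(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    /* One 512-bit zmm register holds 512 / 64 = 8 doubles, so each
     * vector add processes eight operand pairs at once. */
    for (; i + 8 <= n; i += 8) {
        __m512d va = _mm512_loadu_pd(a + i);
        __m512d vb = _mm512_loadu_pd(b + i);
        _mm512_storeu_pd(out + i, _mm512_add_pd(va, vb));
    }
    for (; i < n; i++)          /* scalar tail: fewer than 8 elements left */
        out[i] = a[i] + b[i];
}
#endif

/* Plain scalar loop: one addition per iteration. */
static void add_scalar(const double *a, const double *b, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Element-wise out = a + b, using AVX-512 only if the CPU supports it. */
void add_arrays(const double *a, const double *b, double *out, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx512f")) {
        add_avx512(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

The runtime check keeps the binary usable on CPUs without AVX-512. It also picks the vector path whenever the CPU offers it. When such a kernel is interleaved with scalar code in the same thread, the result is exactly the kind of mixed workload described above.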

Core specialization can mitigate this performance impediment. It means preferring certain cores for specific kinds of computation. If AVX-512 code and scalar code run on disjoint sets of CPU cores, the cores executing scalar code are never throttled. The Operating Systems Group at the Karlsruhe Institute of Technology has already implemented core specialization in Linux and shown that it effectively counters this performance reduction. That implementation introduces a new system call that marks the beginning and end of a task's AVX-512 phases, so that the scheduler can migrate the task to a specialized core. However, the approach is neither transparent nor automatic: the application has to be modified to issue the system call.
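
The abstract does not name the system call. As a rough user-space approximation, the sketch below pins a thread to a designated core for the duration of an AVX-512 phase, using the Linux `sched_getaffinity`/`sched_setaffinity` calls. The names `avx_phase_begin`, `avx_phase_end` and `pick_avx_core` are hypothetical. Choosing the highest-numbered allowed CPU is an arbitrary policy for illustration only. In the actual implementation, phases are marked by a system call and the scheduler performs the migration inside the kernel.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Affinity mask saved at the start of a phase, one per thread. */
static _Thread_local cpu_set_t saved_mask;

/* Hypothetical policy: treat the highest-numbered CPU this thread may run
 * on as the AVX-512 core, leaving the others for scalar code. */
int pick_avx_core(void)
{
    cpu_set_t mask;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    for (int cpu = CPU_SETSIZE - 1; cpu >= 0; cpu--)
        if (CPU_ISSET(cpu, &mask))
            return cpu;
    return -1;
}

/* Start of an AVX-512 phase: remember the current affinity, then restrict
 * the calling thread to avx_core; the kernel migrates it if necessary. */
int avx_phase_begin(int avx_core)
{
    cpu_set_t avx_mask;
    if (sched_getaffinity(0, sizeof(saved_mask), &saved_mask) != 0)
        return -1;
    CPU_ZERO(&avx_mask);
    CPU_SET(avx_core, &avx_mask);
    return sched_setaffinity(0, sizeof(avx_mask), &avx_mask);
}

/* End of an AVX-512 phase: allow the thread back onto its original cores. */
int avx_phase_end(void)
{
    return sched_setaffinity(0, sizeof(saved_mask), &saved_mask);
}
```

With this approach, every AVX-512 phase in the application must be bracketed by `avx_phase_begin(pick_avx_core())` and `avx_phase_end()`. That manual bracketing is the lack of transparency the thesis sets out to remove.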

This thesis extends the existing core specialization implementation and makes it transparent and automatic. It does so by efficiently virtualizing AVX-512, so that AVX-512 instructions are intercepted and trigger a migration of the task. Our extension determines the necessary number of AVX-512 cores at runtime, based on the CPU time consumed. There is no trivial way of detecting the end of an AVX-512 phase, so we compare different heuristics for detecting it.
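
The abstract names neither the rule for sizing the set of AVX-512 cores nor the end-of-phase heuristics that are compared. The following sketch is therefore hypothetical throughout. `avx_cores_needed` sizes the AVX-512 core set in proportion to AVX-512 CPU time. `should_migrate_back` implements a simple timeout heuristic, which is one plausible candidate and not necessarily among those evaluated in the thesis. The sketch assumes that AVX-512 is disabled on scalar cores, so that an AVX-512 instruction there faults. It models only the policy decisions, not the kernel mechanism.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizing rule: give AVX-512 work the same share of the ncpus
 * cores as its share of CPU time in the last sampling window (rounded up),
 * but always leave at least one core for scalar code. */
unsigned avx_cores_needed(uint64_t avx_ns, uint64_t total_ns, unsigned ncpus)
{
    if (avx_ns == 0 || total_ns == 0 || ncpus < 2)
        return 0;                      /* nothing to specialize */
    uint64_t need = (avx_ns * ncpus + total_ns - 1) / total_ns;  /* ceil */
    if (need > ncpus - 1)
        need = ncpus - 1;
    return (unsigned)need;
}

enum core_kind { SCALAR_CORE, AVX_CORE };

struct task {
    enum core_kind where;      /* kind of core the task currently runs on */
    uint64_t migrated_at_ns;   /* when it was last moved to an AVX-512 core */
};

/* Fault path (hypothetical handler): the first AVX-512 instruction on a
 * scalar core faults, and the task is moved to an AVX-512 core. */
void on_avx512_fault(struct task *t, uint64_t now_ns)
{
    t->where = AVX_CORE;
    t->migrated_at_ns = now_ns;
}

/* Timeout heuristic for the unobservable phase end: after timeout_ns on an
 * AVX-512 core, guess that the phase is over. A wrong guess costs one more
 * fault, because the next AVX-512 instruction sends the task back again. */
bool should_migrate_back(const struct task *t, uint64_t now_ns,
                         uint64_t timeout_ns)
{
    return t->where == AVX_CORE && now_ns - t->migrated_at_ns >= timeout_ns;
}
```

Consider an example under this assumed rule: on 8 cores, 250 ms of AVX-512 time within a 1 s window yields 8 × 250 / 1000 = 2 AVX-512 cores. The clamp keeps at least one core free of AVX-512 throttling.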

We evaluate our prototype in a web server scenario with nginx and OpenSSL, using ChaCha20-Poly1305 encryption and Brotli compression. AVX-512 accelerates the combined cipher and message authentication code. The benchmarks show that the performance degradation caused by AVX-512-induced frequency reductions can be almost completely mitigated, without modifying the application.


      author = {Peter Brantsch},
      title = {Core Specialization for AVX-512 Using Fault-and-Migrate},
      type = {Master Thesis},
      year = 2019,
      month = jul # "08",
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}