Constructing a Library for Mitigating AVX-Induced Performance Degradation

  • Type: Master Thesis
  • Date: 24.03.2019
  • Supervisor: Prof. Dr. Frank Bellosa, Mathias Gottschlag
  • Graduand: Ioannis Papamanoglou
  • AVX-512 is a recent x86 instruction set extension that aims to accelerate vectorizable workloads by further increasing the vector size. The on-CPU SIMD units that make efficient operation on very wide vectors possible take up a lot of chip area and have high power requirements; as a result, they are part of the dark silicon of a CPU, that is, circuitry that has to be turned off during normal operation because of its high power requirements. To maintain a reasonable TDP, current CPUs reduce the core frequency when those units are active. Many workloads benefit from vectorization despite the frequency reduction. However, the high power consumption of wide vector instructions causes additional side effects. Most workloads do not consist solely of vectorizable code. To achieve good performance for non-vectorized (scalar) code that runs on the same core, the frequency needs to be increased again as soon as possible. Increasing the frequency incurs delays, so non-vectorized code runs at unnecessarily low frequencies during the transition period. Slowing down the non-vectorized code on a core can result in worse overall performance, which makes the viability of AVX-512 and similar extensions very unpredictable.
    In this thesis, we create a framework that supports application developers in mitigating the overall system performance degradation induced by AVX-512. We build upon an existing approach that uses core specialization to isolate the instructions that lead to performance degradation on a small set of cores, letting the scalar parts of the system workload run unrestrained. Application developers can use our library to mark code that potentially executes AVX-512 instructions. We designed a dynamic policy that decides at runtime whether to offload marked code regions onto a set of dedicated cores to prevent them from slowing down subsequent scalar code. While our framework is focused on AVX-512, our design and theories apply generally to high-power instructions that require a frequency reduction.
    We show that our framework can reduce the performance degradation in a realistic web server benchmark from 17% to 9%. When a scalar version of the benchmark is run, our framework does not introduce significant performance penalties, making the framework a useful tool to reduce the risk of unexpected performance degradation caused by AVX-512.
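
The kind of decision such a dynamic policy has to make can be illustrated with a minimal sketch. This is not the implementation from the thesis: the `OffloadPolicy` class, the `avx_region` marker, and all numeric defaults are hypothetical placeholders. The cost model is also an assumption made only for illustration: a marked region is offloaded when the expected slowdown of the following scalar code exceeds the cost of migrating to a dedicated core.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class OffloadPolicy:
    """Hypothetical cost model; all defaults are illustrative placeholders."""
    migration_cost_us: float = 20.0   # assumed cost of moving the thread to a dedicated core and back
    recovery_delay_us: float = 500.0  # assumed time the core stays at reduced frequency after the region
    scalar_slowdown: float = 0.15     # assumed fractional slowdown of scalar code at reduced frequency

    def should_offload(self, scalar_us_after_region: float) -> bool:
        # Only the scalar code that runs before the frequency recovers is slowed down.
        affected_us = min(scalar_us_after_region, self.recovery_delay_us)
        penalty_us = affected_us * self.scalar_slowdown
        return penalty_us > self.migration_cost_us


@contextmanager
def avx_region(policy, scalar_us_after_region, decisions):
    """Mark a code region that may execute AVX-512 instructions (hypothetical marker)."""
    offload = policy.should_offload(scalar_us_after_region)
    decisions.append("offload" if offload else "inline")
    # A real library would migrate the thread to a dedicated core here when
    # offload is True (on Linux, for example, by changing its CPU affinity)
    # and migrate it back afterwards; this sketch only records the decision.
    yield offload


policy = OffloadPolicy()
decisions = []

# Long scalar phase follows: the slowdown outweighs the migration cost.
with avx_region(policy, scalar_us_after_region=1000.0, decisions=decisions):
    pass  # vectorized work would go here

# Short scalar phase follows: migrating would cost more than it saves.
with avx_region(policy, scalar_us_after_region=100.0, decisions=decisions):
    pass

print(decisions)  # ['offload', 'inline']
```

With these placeholder values, the first region is offloaded because up to 500 µs of scalar code would run 15% slower (a 75 µs penalty against a 20 µs migration cost), while the second region runs in place because the penalty would be only about 15 µs.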


      author = {Ioannis Papamanoglou},
      title = {Constructing a Library for Mitigating AVX-Induced Performance Degradation},
      type = {Master Thesis},
      year = 2019,
      month = mar # "24",
      school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}