AVX Frequency Management

The performance of modern CPUs is limited by the capacity of the cooling system and the power supply, so raising the CPU frequency further is usually not an option. Instead, modern CPUs increasingly rely on specialized accelerators that execute specific operations more efficiently. The additional power consumption of these accelerators, however, can reduce the maximum frequency the processor is able to sustain.

For example, current Intel CPUs reduce their frequency when AVX2 and AVX-512 instructions (vector instructions that apply the same operation to multiple data elements in parallel) are used. This frequency reduction also affects code that does not use these instructions if that code is executed on the same CPU core in close temporal proximity to AVX2 or AVX-512 instructions. This effect slows down programs that mix AVX and non-AVX code, and it skews scheduler fairness towards applications that use such instructions.

We developed a mechanism to detect and measure the preventable influence of frequency reductions on code that does not use AVX2 or AVX-512 instructions. Our approach periodically forces the processor to switch to its maximum frequency and then observes whether the software currently running on the system immediately causes the frequency to drop again during the following microseconds. Based on the observed frequency changes, our profiler detects situations in which the previous CPU frequency was not optimal for the currently running code.
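The classification step of such a profiler can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual profiler: the `Probe` record, the function names, and the 50 MHz tolerance are hypothetical, and the frequency samples are assumed to have been collected already. A probe counts as evidence of a preventable reduction if the core ran below its maximum frequency before the forced change and then stayed at the maximum afterwards.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Probe:
    """One forced switch to the maximum frequency and what followed (hypothetical record)."""
    freq_before_mhz: int          # frequency just before the forced change
    samples_after_mhz: List[int]  # frequencies sampled during the following microseconds


def reduction_was_preventable(probe: Probe, max_mhz: int, tolerance_mhz: int = 50) -> bool:
    """True if the core was running reduced, yet stayed at maximum once forced there."""
    threshold = max_mhz - tolerance_mhz
    was_reduced = probe.freq_before_mhz < threshold
    stayed_high = all(sample >= threshold for sample in probe.samples_after_mhz)
    return was_reduced and stayed_high


def preventable_share(probes: List[Probe], max_mhz: int, tolerance_mhz: int = 50) -> float:
    """Fraction of the probes taken at reduced frequency that were preventable."""
    threshold = max_mhz - tolerance_mhz
    reduced = [p for p in probes if p.freq_before_mhz < threshold]
    if not reduced:
        return 0.0
    hits = sum(reduction_was_preventable(p, max_mhz, tolerance_mhz) for p in reduced)
    return hits / len(reduced)


# Illustrative, made-up samples for a core with an assumed 3500 MHz maximum.
example_probes = [
    Probe(2800, [3500, 3500, 3500]),  # stays high: the reduction was not needed
    Probe(2800, [3500, 2800, 2800]),  # drops again: the running code needs the lower frequency
    Probe(3500, [3500, 3500]),        # was already at maximum, not counted
]
example_share = preventable_share(example_probes, max_mhz=3500)
```

In this made-up example, one of the two probes taken at reduced frequency stays at the maximum, so half of the observed reductions would be classified as preventable.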

To reduce the negative impact of such situations on performance, we developed several approaches. For example, we conducted simulations that demonstrate the potential for further optimization of the policy the CPU uses to select its frequency. One reason for the slowdown of code without AVX2 and AVX-512 instructions is that the processor delays restoring the original frequency to reduce the number of frequency changes, analogous to techniques in the area of dynamic power management. We were able to demonstrate that application knowledge about the future behavior of the system enables immediate frequency changes, which reduces the amount of affected non-AVX2/AVX-512 code.
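The effect of the restore delay can be illustrated with a toy model. This is a simplified sketch, not the simulation used in our work: the trace format, the function name, and the delay values are assumptions chosen for illustration, and the model ignores the cost of the frequency switches themselves. It counts how much non-AVX execution time falls into the window in which the core is still running at the reduced frequency.

```python
from typing import List, Tuple

# (kind, duration in microseconds); kind is "avx" or "scalar". Hypothetical trace format.
Segment = Tuple[str, float]


def scalar_time_at_reduced_freq(trace: List[Segment], restore_delay_us: float) -> float:
    """Microseconds of non-AVX code executed before the frequency is restored."""
    affected = 0.0
    remaining_delay = 0.0  # time left until the core returns to full frequency
    for kind, duration in trace:
        if kind == "avx":
            # AVX code keeps the core at reduced frequency and re-arms the delay.
            remaining_delay = restore_delay_us
        else:
            slow = min(duration, remaining_delay)
            affected += slow
            remaining_delay -= slow
    return affected


# Made-up trace: two short AVX bursts, each followed by non-AVX code.
example_trace = [("avx", 100.0), ("scalar", 300.0), ("avx", 50.0), ("scalar", 1000.0)]

delayed = scalar_time_at_reduced_freq(example_trace, restore_delay_us=500.0)
immediate = scalar_time_at_reduced_freq(example_trace, restore_delay_us=0.0)
```

With the assumed 500 µs delay, 800 µs of non-AVX code run at reduced frequency in this trace. With an immediate restore, which is only safe when the application knows that no further AVX code follows, none does.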

In addition, we showed that it is beneficial for the performance of a system if AVX2 and AVX-512 code is concentrated on a subset of the CPU cores. In this case, the slowdown caused by AVX-512 can, for example, be reduced by more than 70% in various applications, as only CPU cores that execute AVX2 or AVX-512 instructions are affected by the slowdown. We showed that trap-and-migrate is an effective technique to detect these instructions and to migrate the corresponding code sections to the selected CPU cores. This technique reconfigures the remaining CPU cores to temporarily remove support for the instructions, so that an attempt to execute them traps and interrupts the application.
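The control flow of trap-and-migrate can be sketched as a small user-space simulation. This is only an illustration of the idea: the real mechanism lives in the operating system and relies on hardware traps, and all class and function names below are hypothetical. A task starts on a core where the instructions are disabled, the first AVX instruction raises a trap, and the handler moves the task to a designated AVX core and retries the instruction there.

```python
from typing import List


class InstructionTrap(Exception):
    """Raised when a task executes an instruction class that is disabled on its core."""


class Core:
    def __init__(self, core_id: int, avx_enabled: bool) -> None:
        self.core_id = core_id
        self.avx_enabled = avx_enabled
        self.executed: List[str] = []  # instruction kinds that ran on this core


class Task:
    def __init__(self, name: str, core: Core) -> None:
        self.name = name
        self.core = core


def execute(task: Task, kind: str) -> None:
    """Run one instruction on the task's current core, trapping on disabled AVX."""
    if kind == "avx" and not task.core.avx_enabled:
        raise InstructionTrap(f"{task.name}: AVX disabled on core {task.core.core_id}")
    task.core.executed.append(kind)


def run_with_trap_and_migrate(task: Task, instructions: List[str], avx_cores: List[Core]) -> int:
    """Execute the instructions, migrating to an AVX core on a trap. Returns the migration count."""
    if not avx_cores:
        raise ValueError("at least one AVX-enabled core is required")
    migrations = 0
    for kind in instructions:
        try:
            execute(task, kind)
        except InstructionTrap:
            task.core = avx_cores[0]  # simplest possible placement choice
            migrations += 1
            execute(task, kind)       # retry the trapped instruction on the new core
    return migrations


# Core 0 has AVX disabled; core 1 is the designated AVX core.
cores = [Core(0, avx_enabled=False), Core(1, avx_enabled=True)]
task = Task("worker", cores[0])
migration_count = run_with_trap_and_migrate(task, ["scalar", "avx", "scalar"], [cores[1]])
```

The sketch leaves out how a task returns to the other cores once its AVX section ends, so the trailing non-AVX instruction here still runs on the AVX core.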

Contact: Mathias Gottschlag

Author Title Source

Mathias Gottschlag, Philipp Machauer, Yussuf Khalil, Frank Bellosa

2021 USENIX Annual Technical Conference. July 14–16, 2021

Mathias Gottschlag, Peter Brantsch, Frank Bellosa

SYSTOR 2020, 13th ACM International Systems and Storage Conference, Haifa, Israel, October 13–15, 2020

Mathias Gottschlag, Tim Schmidt, Frank Bellosa

APSys'20, 11th ACM SIGOPS Asia-Pacific Workshop on Systems, August 24–25, 2020, Tsukuba, Japan

Mathias Gottschlag, Yussuf Khalil, Frank Bellosa

Technical Report, arXiv, May 4, 2020

Mathias Gottschlag, Frank Bellosa

The 9th Workshop on Systems for Multi-core and Heterogeneous Architectures, co-located with EuroSys 2019, Dresden, Germany, March 25–28, 2019

Mathias Gottschlag, Frank Bellosa

Technical Report, arXiv, December 20, 2018