Towards Fully Automatic Staged Computation
-
Authors:
Mathias Gottschlag, Christian Schwarz, Marc Rittinghaus and Frank Bellosa
-
Source:
The 8th Workshop on Systems for Multi-core and Heterogeneous Architectures, 23 April 2018, Porto, Portugal
-
Abstract:
Server applications often experience many stall cycles because their working set for individual requests exceeds the size of fast private CPU caches. Existing solutions for this problem usually involve refactoring the application to split it into multiple parts with smaller working sets. Scheduling these parts on multiple cores reduces the cache miss rate and increases performance. However, such refactoring of existing applications is often too labor-intensive.
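The claim that splitting a request into parts with smaller working sets and running them on separate cores lowers the miss rate can be illustrated with a toy simulation. This is a minimal sketch under assumptions of our own, not the authors' system. Each core's private cache is modeled as a fully associative LRU cache. The hypothetical request has two stages that touch disjoint sets of cache lines. The names `LRUCache`, `request_trace`, and `simulate`, and all sizes, are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts misses (toy model of a private per-core cache)."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # cache line -> None, ordered from least to most recently used
        self.misses = 0

    def access(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # hit: mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the least recently used line


def request_trace(stages, lines_per_stage):
    """Hypothetical request: each stage touches its own disjoint set of cache lines."""
    trace = []
    for stage in range(stages):
        base = stage * lines_per_stage
        trace.extend((stage, base + i) for i in range(lines_per_stage))
    return trace


def simulate(num_requests, trace, core_of_stage, num_cores, capacity):
    """Replay the trace once per request, routing each stage to its assigned core's cache."""
    caches = [LRUCache(capacity) for _ in range(num_cores)]
    for _ in range(num_requests):
        for stage, line in trace:
            caches[core_of_stage[stage]].access(line)
    return sum(c.misses for c in caches)


if __name__ == "__main__":
    # Working set per request: 2 stages x 6 lines = 12 lines; each private cache holds 8 lines.
    trace = request_trace(stages=2, lines_per_stage=6)
    print(simulate(100, trace, [0, 0], num_cores=1, capacity=8))  # 1200: every access misses
    print(simulate(100, trace, [0, 1], num_cores=2, capacity=8))  # 12: only cold misses
```

On a single core, the 12-line cyclic working set thrashes the 8-line LRU cache, so every access misses. After the split, each stage's 6 lines fit in its own core's cache. The model deliberately ignores migration overhead, data shared between stages, and cache associativity, so its numbers are not comparable to the measured results reported below.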
In this paper, we describe an automatic solution that partitions existing server applications and executes the parts on individual cores to improve cache locality. Our system records the memory accesses of the application while it processes representative input data. It then uses the resulting memory access trace to try out different partitioning schemes in a cache simulator. The best-performing scheme is used to generate code that automatically migrates the application between cores. Our solution already improves the performance of the MySQL database by 8.6% and reduces L2 cache misses by more than 50%, while requiring only minimal developer interaction.
-
BibTeX:
@inproceedings{gottschlag18tfasc,
author = {Gottschlag, Mathias and Schwarz, Christian and Rittinghaus, Marc and Bellosa, Frank},
title = {Towards Fully Automatic Staged Computation},
booktitle = {The 8th Workshop on Systems for Multi-core and Heterogeneous Architectures},
address = {Porto, Portugal},
year = 2018,
month = apr # "~23",
}