
Stage-Aware Scheduling in a Library OS

Type: Bachelor Thesis

Prof. Dr. Frank Bellosa
Mathias Gottschlag

Graduand: Christian Schwarz


Scalable high-performance network servers are a requirement in today’s distributed infrastructure. Event-driven concurrency models often provide better scalability properties than multi-threaded servers, but many legacy applications still follow the multi-threaded model in which each request is handled by a dedicated operating system thread. Recent profiling at Google suggests that the instruction working set of many server applications does not fit into the private i-caches of contemporary processors, causing underutilization of their super-scalar out-of-order pipelines. In a multi-threaded server with an oversized instruction working set, context switches between two request-handler threads are thus likely to cause i-cache misses and subsequent pipeline stalls.

We start by analyzing existing approaches to optimizing the cache behavior of network servers. One technique applicable to multi-core systems is executing different parts of an application’s code on different cores. By migrating threads to the cores whose caches contain the threads’ current instruction working set, the application’s code is effectively spread over the system’s private i-caches, and code misses are greatly reduced. Proof-of-concept work at the KIT OS group shows the potential of this technique, but the implementation does not scale to multiple clients and cores.

In this thesis, we therefore propose that the spreading technique described above must be tightly integrated with the OS thread scheduler. We present an unintrusive user-space API that allows partitioning a multi-threaded server’s request-handler code path into stages. Our scheduler then dynamically assigns cores to stages and dispatches threads on their current stages’ cores. We evaluate our design and its implementation in the OSv library operating system by adapting the MySQL database management system to our solution. We achieve up to 22% higher throughput, driven by a 65% reduction of L2 i-cache misses, without having to sacrifice request latency for this improvement.


  author = {Christian Schwarz},
  title = {Stage-Aware Scheduling in a Library OS},
  type = {Bachelor Thesis},
  year = 2018,
  month = mar # "27",
  school = {Operating Systems Group, Karlsruhe Institute of Technology (KIT), Germany}
}