Towards Virtual InfiniBand Clusters with Network and Performance Isolation

  • Type: Diploma Thesis
  • Date: 16.06.2011
  • Supervisor:

    Prof. Dr. Frank Bellosa, Dr. Jan Stoess, Viktor Mauch

  • Graduand: Marius Hillenbrand
  • Links: PDF
  • Abstract:

    Today's high-performance computing (HPC) clusters are typically operated and used by a single organization. Demand fluctuates, resulting in periods of underutilization or overload. In addition, the static OS installation on cluster nodes leaves hardly any room for customization. Transferring the concepts of cloud computing to HPC clusters, that is, an Infrastructure-as-a-Service (IaaS) model for HPC, promises increased flexibility and cost savings. Elastic virtual clusters provide precisely the capacity that suits actual demand and workload.

    Elasticity and flexibility come at a price, however: Virtualization overhead, jitter, and additional OS background activity can severely reduce parallel application performance. In addition, HPC workloads typically require distinct cluster interconnects, such as InfiniBand, because of the features they provide, mainly low latency. General-purpose clouds with virtualized Ethernet fail to fulfill these requirements.

    In this work, we present a novel architecture for HPC clouds. Our architecture comprises three facets: node virtualization, network virtualization, and cloud management. We examine whether a commodity hypervisor (the Kernel-based Virtual Machine, KVM, on Linux) can be transformed to provide virtual cluster nodes, that is, virtual machines (VMs) intended for HPC workloads. We provide a concept for cluster network virtualization, using the example of InfiniBand, that gives each user the impression of using a dedicated network. A user can apply a custom routing scheme and employ recursive isolation in his share of the network. However, he remains constrained to his virtual cluster and cannot impair other users; we verify this claim with experiments on an actual InfiniBand network. We discuss the new challenges that cluster networks pose for cloud management and describe how we introduce network topology into cloud management. A prototype for automatic network isolation provides a proof of concept.

    BibTeX:

    @diplomathesis{hillenbrand11virtibclusters,
    author = {Marius Hillenbrand},
    title = {Towards Virtual InfiniBand Clusters with Network and Performance Isolation},
    type = {Diploma Thesis},
    address = {System Architecture Group, Karlsruhe Institute of Technology (KIT), Germany},
    month = jun,
    year = 2011,
    url = {http://os.ibds.kit.edu/} }