Sigma V100 GPU access for "High-level parallel programming frameworks for heterogeneous clusters"
Title: Sigma V100 GPU access for "High-level parallel programming frameworks for heterogeneous clusters"
DNr: LiU-gpu-2023-5
Project Type: LiU Compute
Principal Investigator: Christoph Kessler <christoph.kessler@liu.se>
Affiliation: Linköpings universitet
Duration: 2024-01-01 – 2026-01-01
Classification: 10201
Homepage: https://skepu.github.io
Keywords:

Abstract

This small project complements my small project "High-level parallel programming frameworks for heterogeneous clusters" (SNIC 2020/13-113, which uses Tetralith GPU nodes) in order to get access to the Sigma GPU nodes with their different GPU type (V100). This will allow us to evaluate our prototypes (see below) on an additional GPU type, thereby broadening our experimental basis. The project description is otherwise identical to that of SNIC 2020/13-113 and its later extensions and is included here for completeness, only with "Tetralith" replaced by "Sigma GPU nodes":

In previous work (including EU FP7 PEPPHER, EXCESS, SeRC OpCoReS and EU H2020 EXA2PRO) we have developed several frameworks for portable, high-level parallel programming of multi-core CPU and GPU-based heterogeneous parallel systems, such as the PEPPHER composition framework [Dastgeer et al. 2014] for multi-variant parallel software components and the SkePU skeleton programming library for GPU-based systems [Enmyren and Kessler 2010, Dastgeer 2014, Ernstsson 2018, Ernstsson 2022], with back-end support mainly for OpenMP, OpenCL and CUDA, and more recently also for clusters. Optimization techniques in these frameworks include automatically tuned, adaptive (context-dependent) selection of implementation variants of computations [Dastgeer et al. 2011, 2013], hybrid parallel execution involving different types of cores and accelerators together [Dastgeer et al. 2012], and data abstractions and automated memory management techniques for aggregate data structures ("smart data-containers") for the run-time optimization of data transfers between main memory and accelerator memory [Dastgeer and Kessler 2016].

Since 2020, the third generation of SkePU has been available as open-source software under a permissive license (see https://skepu.github.io). It is used both in cooperative research projects such as EU H2020 EXA2PRO and ELLIIT GPAI and in teaching high-level parallel programming at Linköping University, as well as in student thesis projects at LiU. Due to SkePU's strict design for portability, SkePU 3 programs can even run in parallel across multiple nodes of an MPI-based cluster without any modification of the program source code (see the SNIC 2016/5-6 project activity report of December 2020); a minimal example program is sketched below. However, up to now, not much work has been done on memory and communication optimizations when executing skeleton programs across multiple, possibly heterogeneous, nodes in an HPC cluster. Likewise, hybrid multi-node execution with automatic load balancing is a challenge for heterogeneous clusters whose compute nodes differ in kind and capability, e.g., nodes with GPUs and nodes without accelerators.

In this project, we will evaluate the most recent versions of our prototype framework with benchmark applications on Sigma GPU node hardware and study extended auto-tuned back-end selection, automated load balancing for hybrid computing using different forms of runtime system support, and generalizations of the smart data-container concept (illustrated below) to the cluster level for SkePU skeleton programs. The allocation will also be used for final thesis student projects on porting HPC benchmarks and example applications to SkePU. For the experiments we will use smaller homogeneous as well as heterogeneous partitions of Sigma that include both CPU-only nodes and nodes equipped with GPUs.

References: See the publication list on the SkePU web page (https://skepu.github.io).
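
To illustrate the programming model referred to above, the following is a minimal sketch of a SkePU 3 program in the style of the examples on https://skepu.github.io. The exact API details (header name, arity deduction, container constructors) may differ between SkePU versions, so this should be read as indicative rather than definitive; the point is that the same source code can be precompiled for the sequential, OpenMP, CUDA, OpenCL and cluster back-ends.

    #include <iostream>
    #include <skepu>

    // User function; the SkePU precompiler generates implementation
    // variants of it for the different back-ends.
    float add(float a, float b)
    {
        return a + b;
    }

    int main()
    {
        // Instantiate a Map skeleton (arity 2) from the user function.
        auto vec_sum = skepu::Map<2>(add);

        // Smart data-containers: element storage whose transfers between
        // main memory, GPU memory and (with the cluster back-end) other
        // nodes are managed automatically by the runtime.
        skepu::Vector<float> v1(1000, 1.0f), v2(1000, 2.0f), res(1000);

        vec_sum(res, v1, v2);

        std::cout << "res[0] = " << res[0] << std::endl;
        return 0;
    }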
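
The smart data-container idea mentioned above can be summarized by the following self-contained C++ sketch of lazy, coherence-tracked data transfers. This is an illustration of the general principle only, with hypothetical transfer hooks, and not SkePU's actual container implementation; generalizing such bookkeeping from a single host-device pair to the distributed memories of a heterogeneous cluster is one of the research questions of this project.

    #include <cstddef>
    #include <vector>

    // Where the currently valid copy of the container's data resides.
    enum class Residency { HostOnly, DeviceOnly, Both };

    template <typename T>
    class SmartVector
    {
        std::vector<T> host_data;
        // A real GPU back-end would also hold a device buffer here;
        // omitted to keep the sketch self-contained.
        Residency state = Residency::HostOnly;

    public:
        explicit SmartVector(std::size_t n) : host_data(n) {}

        // Called by a CPU back-end before reading: download only if the
        // valid copy currently lives on the device.
        void ensure_on_host()
        {
            if (state == Residency::DeviceOnly) {
                // download_from_device(host_data);  // hypothetical transfer
                state = Residency::Both;
            }
        }

        // Called by a GPU back-end before a kernel launch: upload only
        // if the host holds the only valid copy.
        void ensure_on_device()
        {
            if (state == Residency::HostOnly) {
                // upload_to_device(host_data);      // hypothetical transfer
                state = Residency::Both;
            }
        }

        // A kernel that writes the container invalidates the host copy,
        // so back-to-back GPU skeleton calls skip redundant transfers.
        void mark_device_dirty() { state = Residency::DeviceOnly; }
        void mark_host_dirty()   { state = Residency::HostOnly; }

        // Host-side element access triggers a download only when needed.
        T& at_host(std::size_t i)
        {
            ensure_on_host();
            mark_host_dirty();
            return host_data[i];
        }
    };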