High-level parallel programming frameworks for heterogeneous clusters
|NAISS Small Compute
|Christoph Kessler <email@example.com>
|2024-01-03 – 2025-02-01
In previous work (including the EU FP7 projects PEPPHER and EXCESS, the SeRC project OpCoReS, and the EU H2020 project EXA2PRO) we have developed several frameworks for portable, high-level parallel programming of heterogeneous parallel systems based on multi-core CPUs and GPUs. These include the PEPPHER composition framework [Dastgeer et al. 2014] for multi-variant parallel software components and the SkePU skeleton programming library for GPU-based systems [Enmyren and Kessler 2010, Dastgeer 2014, Ernstsson 2018, Ernstsson 2022], with back-end support mainly for OpenMP, OpenCL and CUDA.
Optimization techniques in these frameworks include automatically tuned, adaptive (context-dependent) selection of implementation variants of computations [Dastgeer et al. 2011, 2013], hybrid parallel execution that uses different types of cores and accelerators together [Dastgeer et al. 2012], and data abstractions with automated memory management for aggregate data structures ("smart data-containers") that minimize, at run time, the PCIe bus communication between main memory and accelerator device memory [Dastgeer and Kessler 2016]. Since 2020, the third generation of SkePU has been available as open-source software under a permissive license (see https://skepu.github.io). It is used in cooperative research projects such as EU H2020 EXA2PRO and ELLIIT GPAI, in teaching high-level parallel programming at Linköping University (LiU), and in student thesis projects at LiU.
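To make the idea of tuned, context-dependent variant selection concrete, the following is a minimal sketch in Python. It is not the SkePU or PEPPHER tuning mechanism: the function names `tune` and `select`, the nearest-training-size lookup, and the pluggable `measure` callback are all assumptions made for illustration. The sketch measures each implementation variant at a few training problem sizes, records the cheapest variant per size, and later dispatches a call to the variant recorded for the closest training size.

```python
import time


def wall_clock(fn, n, repeats=3):
    """Default cost measure (hypothetical helper): mean wall-clock time of fn on n items."""
    data = list(range(n))
    start = time.perf_counter()
    for _ in range(repeats):
        fn(data)
    return (time.perf_counter() - start) / repeats


def tune(variants, sizes, measure=wall_clock):
    """Return {training_size: name of cheapest variant at that size}.

    variants: dict mapping a variant name to a callable.
    measure:  callable (fn, n) -> cost; lower is better.
    """
    table = {}
    for n in sizes:
        costs = {name: measure(fn, n) for name, fn in variants.items()}
        table[n] = min(costs, key=costs.get)
    return table


def select(table, n):
    """Pick the variant recorded for the training size closest to n."""
    nearest = min(table, key=lambda size: abs(size - n))
    return table[nearest]
```

With a real workload, `variants` would map names such as a sequential and a parallel implementation to callables, and `tune` would be run once offline; `select` is then a cheap lookup at call time. A production tuner would typically also consider operand locality and other call-context properties rather than problem size alone.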
Due to SkePU's strict design for portability, SkePU 3 programs can even run in parallel across multiple nodes of an MPI-based cluster without any modification of the program source code (see the SNIC 2016/5-6 project activity report of December 2020). However, little work has been done so far on memory and communication optimizations for executing skeleton programs across multiple, possibly heterogeneous, nodes of an HPC cluster. Likewise, hybrid multi-node execution with automatic load balancing remains a challenge on heterogeneous clusters whose compute nodes differ in kind and capability, e.g., nodes with GPUs and nodes without accelerators.
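One simple static approach to load balancing on nodes of unequal capability is to split the index range of a data-parallel computation in proportion to each node's relative throughput. The sketch below illustrates that idea only; it is not how SkePU distributes work, and the function name `partition` and the use of integer weights (for example, measured items per second, rounded) are assumptions for this example. Leftover items from rounding go to the nodes with the largest remainders, so the chunks always cover the full range.

```python
def partition(n_items, weights):
    """Split range(n_items) into contiguous chunks proportional to integer weights.

    Returns a list of (start, end) half-open index ranges, one per node.
    """
    total = sum(weights)
    # Integer floor of each node's ideal share, plus the remainder that was cut off.
    sizes = [n_items * w // total for w in weights]
    remainders = [n_items * w % total for w in weights]
    leftover = n_items - sum(sizes)
    # Give one extra item to the nodes with the largest remainders.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    # Turn chunk sizes into contiguous index ranges.
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, weights `[1, 1, 8]` could model two CPU-only nodes and one GPU node that is assumed to be eight times faster on the given kernel. A static split like this ignores communication cost and run-time variation, which is why dynamic, runtime-system-based balancing is also of interest.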
In this project, we will evaluate the most recent versions of our prototype framework with benchmark applications on Tetralith hardware. For SkePU skeleton programs, we will study extended auto-tuned back-end selection, automated load balancing for hybrid computing using different forms of runtime system support, and generalizations of the smart data-container concept to the cluster level. The project will also be used for final thesis student projects on porting HPC benchmarks and example applications to SkePU. For experiments we will use small homogeneous as well as heterogeneous partitions of Tetralith that include both CPU-only nodes and nodes equipped with GPUs.
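The smart data-container concept referred to above can be illustrated with a toy model: a container that remembers which memory spaces currently hold a valid copy of its data and copies only when a read targets a space whose copy is stale. This is a minimal sketch under simplifying assumptions, not SkePU's container implementation; the class name `SmartContainer`, the space labels, and the whole-container (rather than per-region) validity tracking are hypothetical.

```python
class SmartContainer:
    """Toy container that tracks which memory spaces hold a valid copy."""

    def __init__(self, data):
        self.copies = {"host": list(data)}  # one buffer per memory space
        self.valid = {"host"}               # spaces whose buffer is up to date
        self.transfers = 0                  # number of copies actually performed

    def read(self, space):
        """Return the data as seen from `space`, copying only if its copy is stale."""
        if space not in self.valid:
            source = next(iter(self.valid))  # any valid copy can serve as source
            self.copies[space] = list(self.copies[source])
            self.valid.add(space)
            self.transfers += 1
        return self.copies[space]

    def write(self, space, values):
        """Overwrite the data in `space`; all other copies become stale."""
        self.copies[space] = list(values)
        self.valid = {space}
```

Repeated reads from the same space cost one transfer at most, and a write invalidates the other copies without moving any data until it is needed. At cluster level, the memory spaces would additionally include the memories of remote nodes.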
References: See the publication list on the SkePU web page (link given below).