High-level parallel programming frameworks for heterogeneous clusters
Title: High-level parallel programming frameworks for heterogeneous clusters
DNr: NAISS 2025/22-60
Project Type: NAISS Small Compute
Principal Investigator: Christoph Kessler <christoph.kessler@liu.se>
Affiliation: Linköpings universitet
Duration: 2025-02-01 – 2026-02-01
Classification: 10205
Homepage: https://skepu.github.io
Abstract
In previous work (including the EU FP7 projects PEPPHER and EXCESS, the SeRC project OpCoReS and the EU H2020 project EXA2PRO), we have developed several frameworks for portable, high-level parallel programming of heterogeneous parallel systems based on multi-core CPUs and GPUs. One of them is the SkePU skeleton programming framework for GPU-based systems [Enmyren 2010, Dastgeer 2014, Ernstsson 2018, Ernstsson 2022], with multi-back-end support mainly for OpenMP, OpenCL and CUDA. Optimization techniques in these frameworks include adaptive backend selection for computations, hybrid parallel execution that uses different types of cores and accelerators together, and data abstractions with automated memory management for aggregate data structures ("smart data-containers"), which optimize at run time the communication between main memory and accelerator device memory [Dastgeer and Kessler 2016]. Since 2020, the third generation of SkePU has been available as open-source software under a permissive license (see https://skepu.github.io). It is used in cooperative research projects such as EU H2020 EXA2PRO, ELLIIT GPAI and SSF ASTECC, in teaching high-level parallel programming at Linköping University, and in student thesis projects at LiU.
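The smart data-container idea can be illustrated with a small, language-neutral model. The following is a hypothetical Python sketch, not SkePU's C++ API: the names `SmartVector`, `map_skeleton` and `Loc` are invented for illustration, and "device memory" is simulated by a second Python list. The container records which copies are up to date and copies data only when a stale copy is read, while a Map skeleton applies an elementwise function at a chosen location.

```python
from enum import Enum


class Loc(Enum):
    HOST = "host"
    DEVICE = "device"  # simulated accelerator memory (hypothetical)


class SmartVector:
    """Toy 'smart data-container' (illustrative only): tracks which copies
    (host/device) are up to date and performs a simulated transfer only
    when a stale copy is read."""

    def __init__(self, data):
        self.copies = {Loc.HOST: list(data), Loc.DEVICE: None}
        self.valid = {Loc.HOST}  # locations currently holding up-to-date data
        self.transfers = 0       # number of simulated host<->device copies

    def read(self, loc):
        if loc not in self.valid:
            src = next(iter(self.valid))               # any up-to-date copy
            self.copies[loc] = list(self.copies[src])  # simulated memcpy
            self.transfers += 1
            self.valid.add(loc)
        return self.copies[loc]

    def write(self, loc):
        # Write-only access: no transfer is needed, but every other copy
        # becomes stale. Assumes the caller overwrites all elements.
        if loc not in self.valid:
            size = len(self.copies[next(iter(self.valid))])
            self.copies[loc] = [None] * size
        self.valid = {loc}
        return self.copies[loc]


def map_skeleton(f, out, *args, loc=Loc.DEVICE):
    """Elementwise Map: out[i] = f(args[0][i], args[1][i], ...), run at `loc`."""
    inputs = [a.read(loc) for a in args]
    dst = out.write(loc)
    for i in range(len(dst)):
        dst[i] = f(*(x[i] for x in inputs))


a = SmartVector([1, 2, 3])
b = SmartVector([10, 20, 30])
c = SmartVector([0, 0, 0])
map_skeleton(lambda x, y: x + y, c, a, b)  # a and b are copied to the device once
map_skeleton(lambda x: 2 * x, c, c)        # c is already valid on the device
print(c.read(Loc.HOST))                    # [22, 44, 66]: one device-to-host copy
print(a.transfers, b.transfers, c.transfers)  # 1 1 1
```

In this model the two skeleton calls need three transfers in total, whereas a naive scheme that copies all operands in and all results out on every call would need five (three for the first call, two for the second).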
Due to SkePU's strict design for portability, SkePU 3 programs can even run in parallel across multiple nodes of an MPI-based cluster without any modification of the program source code (see the SNIC 2016/5-6 project activity report of December 2020). However, so far little work has been done on memory and communication optimizations for executing skeleton programs across multiple, possibly heterogeneous, nodes of an HPC cluster. Likewise, hybrid multi-node execution with automatic load balancing is a challenge in clusters that combine compute nodes of different kinds and capabilities, e.g., nodes with GPUs and nodes without accelerators. Other challenges in the design and implementation of high-level programming frameworks include support for mixed-domain (AI+X) computations, and how to configure computations and map them efficiently onto the diverse heterogeneous execution resources of modern HPC hardware, such as CPU cores, CUDA cores and tensor cores.
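To make the load-balancing problem concrete, one simple baseline is a static split of a Map's index range across nodes in proportion to their relative speeds. The sketch below is a hypothetical illustration, not the project's method: the function `partition` and the integer speed weights are assumptions (in practice such weights would have to be measured or predicted), and leftover elements after integer division are handed out by the largest-remainder rule so that the chunk sizes always sum to `n`.

```python
def partition(n, weights):
    """Split the index range [0, n) into contiguous chunks whose sizes are
    proportional to the integer `weights` (hypothetical relative node speeds)."""
    total = sum(weights)
    sizes = [n * w // total for w in weights]       # floor of the exact share
    remainders = [n * w % total for w in weights]   # fractional part * total
    rest = n - sum(sizes)                           # elements still unassigned
    # Give one extra element to the nodes with the largest remainders;
    # sorted() is stable, so ties go to the lower node index.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:rest]:
        sizes[i] += 1
    bounds, start = [], 0
    for size in sizes:
        bounds.append((start, start + size))
        start += size
    return bounds


# Hypothetical cluster: two CPU-only nodes and one GPU node assumed 8x faster.
print(partition(1000, [1, 1, 8]))  # [(0, 100), (100, 200), (200, 1000)]
print(partition(10, [1, 1, 1]))    # [(0, 4), (4, 7), (7, 10)]
```

A static split of this kind ignores communication cost and runtime variation in node performance, which is why automatic load balancing across heterogeneous nodes remains an open challenge.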
In this project, we evaluate new experimental versions of our prototype framework with benchmark applications on Tetralith hardware, and study framework extensions and optimizations at node and cluster level for SkePU skeleton programs. The allocation is also used for student final-thesis projects on porting HPC benchmarks and example applications to SkePU. For the experiments we use small homogeneous partitions of Tetralith as well as heterogeneous partitions that include both CPU-only nodes and GPU-equipped nodes.
References: See the publication list on the SkePU web page, https://skepu.github.io.