FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries
||FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries|
||NAISS Small Compute|
||Yifei He <email@example.com>|
||Kungliga Tekniska högskolan|
||2023-11-01 – 2024-11-01|
Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR) and LLVM, for expressing Fourier Transform algorithms. We design a specific abstraction for FFT, working on implementing the CPU/GPU high-performance code generation pipelines using MLIR/LLVM.
Our aspiration with FFTc is to increase the productivity of algorithm developers without any loss in performance while at the same time being able to target multiple different backends (CPUs/GPUs/etc.) with the same input source code. In short, FFTc aims to increase productivity, portability, and (hopefully) performance.
The FFTc compilation pipeline has five core parts: (a) is the translation from the DSL to the Abstract Syntax Tree (AST), (b) is generating the MLIR out of AST, (c) stands for progressive lowering from FFT dialect to LLVM dialect, going through different levels of abstraction represented by dialects, (d) emits LLVM IR out of the MLIR’s LLVM dialect, (e) is the LLVM middle-end compilation and code generation.
The goal of FFTc is to create an input language that resembles (as close as possible) that of mathematics, which we believe will help end-users in being more productive without losing familiarity with the code they are writing.
The compilation starts at the frontend, where the lexical analysis, parsing, and building of an Abstract Syntax Tree (AST) based on our custom DSL language take place. The FFT dialect is the first state of MLIR generated from the AST. Then a series of lowering passes are applied on the FFT dialect in order to expand many of the custom operators (e.g., the Kronecker product) into a lowered state. For example, a matrix multiplication, written in our language using "·", will be expanded to a three-level nested loop implementing said matrix multiplication. Furthermore, we can apply several existing MLIR optimization passes (such as Affine) in order to further optimize the transformed kernels. Finally, near the end of the pipeline, we lower our representation to the LLVM Intermediate Representation (IR), after which we inject the code into the LLVM backend for compilation towards machine code.