Generative deep learning for data-centric medical imaging
Machine learning, especially deep learning, has made substantial progress over the last decade. However, the data-hungry nature of deep learning means that the full potential of a model is often inhibited by a lack of data. This problem is especially pronounced within medical imaging, where data is expensive to capture, relies on medical expertise for annotation, and is of a sensitive and protected nature.
Synthetically generated images can be used to improve image-based deep learning applications, both by increasing the amount of training data and by ensuring that different types of image content are included. Traditionally, computer graphics has been used for this purpose, but this approach requires explicit modeling of the image content. While this can be accomplished in many cases for natural images, it is difficult to model the complex biological content depicted in medical images. An alternative solution is to use deep learning for automatic generation of new image content, by means of generative adversarial networks (GANs) or generative diffusion models (GDMs). Over the last few years, research on generative deep learning has progressed to the point that photo-realistic images can be generated in scenarios with narrow data distributions (cars, faces, etc.). For medical imaging, this is promising since the data domains are generally narrow. At the same time, GANs and GDMs have mostly been used to generate 2D images of limited resolution. In medical imaging, data modalities can be more challenging, such as 3D volumes in radiology or gigapixel whole slide images (WSIs) in digital pathology. Furthermore, it is difficult to precisely control the content produced by generative models.
This project aims to combine computer graphics and generative deep learning in order to produce high-quality synthetic image datasets with detailed control over the image content. The overarching goal is a data-centric perspective on medical deep learning, where generated content can improve performance and robustness in limited-data scenarios and aid in analyzing model performance under different types of variation.