Social Complexity and Fairness in Synthetic Medical Data
Title: Social Complexity and Fairness in Synthetic Medical Data
DNr: Berzelius-2024-19
Project Type: LiU Berzelius
Principal Investigator: Saghi Hajisharif
Affiliation: Linköpings universitet
Duration: 2024-01-17 – 2024-08-01
Classification: 20603


Machine learning, particularly deep learning, has advanced significantly over the past decade. However, its dependence on large datasets limits its potential in fields such as medical imaging, where data is scarce, expensive, and sensitive. Synthetic data generation with Generative Adversarial Networks (GANs), diffusion models, and vision transformers has emerged as one way to address this. These methods can increase the volume and diversity of training data in medical imaging, a field where data specificity and quality are critical. Recent developments enable the creation of realistic images that fit the narrow data domains characteristic of medical imaging. Diffusion models generate diverse, high-quality images, which is valuable for building varied medical datasets. Vision transformers use attention mechanisms to relate distant image regions, which helps in modeling complex medical images.
A crucial aspect of synthetic data generation is ensuring fairness. Fair data representation is essential to prevent biases in machine learning models, which can lead to inaccurate diagnoses and healthcare disparities. This project focuses on integrating fairness-aware algorithms and representation learning techniques into generative models such as GANs, diffusion models, and vision transformers. The aim is to produce balanced synthetic datasets that reflect diverse demographic groups and pathological conditions, thereby reducing biases in training data. The project’s primary goal is to establish a fair approach to medical deep learning. Synthetically generated content should not only improve model performance and robustness when data is limited but also support fair and accurate analysis of model performance across varied conditions.
This work is especially relevant to generating high-resolution images in complex modalities such as 3D radiology volumes and gigapixel whole slide images (WSIs) in digital pathology. Ultimately, by combining advanced generative models with a focus on fairness and ethical data representation, the project seeks to address both the technical challenges of high-quality image generation and the ethical considerations of fair and unbiased data use in healthcare.