Social Complexity and Fairness in Synthetic Medical Data
DNr: Berzelius-2024-371
Project Type: LiU Berzelius
Principal Investigator: Saghi Hajisharif <saghi.hajisharif@liu.se>
Affiliation: Linköpings universitet
Duration: 2024-10-28 – 2025-05-01
Classification: 20603
Homepage: https://liu.se/en/employee/erijo72
Keywords:

Abstract

Machine learning, and especially deep learning, has made significant progress over the last decade. However, its reliance on large amounts of training data often limits its potential, particularly in the image domain and in fields such as medical imaging, where data can be scarce, costly, and sensitive. To address this issue, generating synthetic data with advanced techniques such as Generative Adversarial Networks (GANs) and diffusion models has become an important solution. These methods can increase the volume and diversity of training data, which is especially useful in medical imaging. Recent advances in these technologies have enabled the production of highly realistic images suitable for a wide range of domains. Diffusion models in particular have proven very effective at generating diverse, high-quality images, an invaluable property for creating varied medical datasets. However, the fairness of the generated images and the risk of bias they carry remain under-explored. In addition, hallucinations produced by these models can be problematic, and the scope of the issues they can cause has not been fully explored in the image domain. In the second part of the project, we investigate bias and fairness in vision transformer models, such as DINOv2, and in diffusion models, such as Stable Diffusion, applied to general image datasets, and we assess both individual and group fairness in medical imaging using demographic data. We aim to improve the fairness and reduce the bias of these models by training them with fairness constraints.
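
As a concrete illustration of the group-fairness notions and fairness constraints mentioned above, the following minimal Python sketch (our illustration, not the project's code; function names such as demographic_parity_gap are hypothetical) computes demographic-parity and equal-opportunity gaps over binary predictions and a binary demographic attribute, and shows how such a gap could act as a soft penalty added to a training loss. In practice a differentiable surrogate over model outputs would be used instead of hard predictions.

```python
# Minimal sketch of group-fairness metrics over classifier outputs and a
# binary demographic attribute. All names and the penalty formulation are
# illustrative assumptions, not the project's actual implementation.
import numpy as np


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    p = np.asarray(y_pred, dtype=float)
    g = np.asarray(group, dtype=bool)
    return abs(p[g].mean() - p[~g].mean())


def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute difference in true-positive rates between two groups."""
    p = np.asarray(y_pred, dtype=float)
    g = np.asarray(group, dtype=bool)
    pos = np.asarray(y_true, dtype=bool)
    return abs(p[g & pos].mean() - p[~g & pos].mean())


def fairness_penalized_loss(base_loss, y_pred, group, lam=1.0):
    """Task loss plus a weighted demographic-parity penalty, i.e. a soft
    fairness constraint of the kind one might add during training."""
    return base_loss + lam * demographic_parity_gap(y_pred, group)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_pred = rng.integers(0, 2, size=1000)   # hard predictions (0/1)
    y_true = rng.integers(0, 2, size=1000)   # ground-truth labels
    group = rng.integers(0, 2, size=1000)    # binary demographic attribute
    print("Demographic-parity gap:", demographic_parity_gap(y_pred, group))
    print("Equal-opportunity gap :", equal_opportunity_gap(y_pred, y_true, group))
    print("Penalized loss example:", fairness_penalized_loss(0.35, y_pred, group, lam=0.5))
```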