Reducing Bias and Stereotypes in the Output of Large Language Models
DNr: Berzelius-2023-144
Project Type: LiU Berzelius
Principal Investigator: Shirin Tahmasebinotarki <shirint@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2023-05-24 – 2023-12-01
Classification: 10201
Homepage: https://shirintahmasebi.github.io/
Keywords:

Abstract

Large Language Models (LLMs) have recently attracted considerable attention across many domains. A critical concern with these models, however, is the potential for bias and stereotypes in their generated output. Bias can take different forms, such as racial, gender, and age-related bias, and it perpetuates inequality and reinforces societal prejudices. Because LLMs are used in many domains and at large scale, such biases can cause serious ethical problems.

This project aims to investigate the presence of bias and stereotypes in the output of LLMs and to propose techniques that mitigate their influence, with the goal of making that output fair and unbiased. Acknowledging and understanding the potential harms caused by biased language models is a significant step towards building more inclusive and equitable models.

The primary objectives of this research proposal are as follows:

1. Conduct an extensive analysis of LLMs to identify and quantify biases related to race, gender, age, and other potential sources of prejudice.
2. Explore state-of-the-art techniques for bias detection and mitigation. These techniques can target different phases, including debiasing training datasets, pre-training strategies, and fine-tuning techniques.
3. Design and implement novel methods to reduce biases and stereotypes in LLMs, aiming for increased fairness and equity.
4. Evaluate the effectiveness of the proposed solutions using various benchmark datasets and evaluation metrics.