Speech foundation model bias and finetuning
Title: Speech foundation model bias and finetuning
DNr: Berzelius-2025-377
Project Type: LiU Berzelius
Principal Investigator: Éva Székely <szekely@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-11-01 – 2026-05-01
Classification: 10208
Keywords:

Abstract

Speech/Audio Foundation Models (SFMs/AFMs) are claimed to overcome the limitations of the conventional ASR-LLM-TTS pipeline for interacting with LLMs through voice. They can supposedly understand, respond to, and make use of prosodic cues when generating a response to an input prompt, that is, paralinguistic information (emotions, feelings, attitudes, etc.) and extralinguistic information (speaker identity, demographics). In the conventional pipeline, this information is lost or muddled in the transitions between stages. In the next part of this work, we will build new SFMs, or explore similar approaches, and investigate how to mitigate the biases present in their responses. The project is expected to show the relevance of continuing to examine fair and inclusive methodologies for training audio foundation models and for conversational AI.