Speech foundation model bias and finetuning
| Title: | Speech foundation model bias and finetuning |
| DNr: | Berzelius-2025-377 |
| Project Type: | LiU Berzelius |
| Principal Investigator: | Éva Székely <szekely@kth.se> |
| Affiliation: | Kungliga Tekniska högskolan |
| Duration: | 2025-11-01 – 2026-05-01 |
| Classification: | 10208 |
| Keywords: | |
Abstract
Speech/Audio Foundation Models (SFMs/AFMs) claim to overcome the limitations of the conventional ASR-LLM-TTS pipeline for interacting with LLMs through voice. They can supposedly understand, respond to, and use prosodic cues that carry paralinguistic information (emotion, feelings, attitudes, etc.) and extralinguistic information (speaker identity, demographics) when generating a response to an input prompt. In the conventional pipeline, this information is lost or muddled at the transitions between stages.
In the next part of this work, we will try to build newer SFMs, or similar systems, and investigate how to mitigate the biases present in their responses. The project is expected to show the relevance of continuing to examine fair and inclusive methodologies for training audio foundation models and for conversational AI.