Title: Prosodic similarity representation
DNr: Berzelius-2025-105
Project Type: LiU Berzelius
Principal Investigator: Livia Qian <liviaq@kth.se>
Affiliation: Kungliga Tekniska högskolan
Duration: 2025-03-19 – 2025-10-01
Classification: 10208
Homepage: https://www.cse.chalmers.se/~richajo/projects/wasp2022.html
Keywords:

Abstract

Vocal feedback (e.g., "mhm", "yeah", "okay") is an important component of spoken dialogue and is crucial to ensuring common ground in conversational systems. The exact meaning of such feedback is conveyed through both lexical and prosodic form. In this work, we investigate the perceived prosodic similarity of vocal feedback with the same lexical form, and to what extent existing speech representations reflect such similarities. A triadic comparison task with recruited participants is used to measure the perceived similarity of feedback responses taken from two different datasets. We find that spectral and self-supervised speech representations encode prosody better than extracted pitch features, especially in the case of feedback from the same speaker. We also find that it is possible to further condense and align the representations to human perception through contrastive learning.
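The contrastive alignment mentioned above can be illustrated with a minimal sketch. Assuming the triadic judgments are converted into (anchor, perceptually closer, perceptually farther) triplets of embedding vectors, a standard triplet margin loss pulls the anchor toward the similar token and pushes it away from the dissimilar one. The function name, margin value, and NumPy formulation here are illustrative assumptions, not the project's actual training objective.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss over embedding triplets (hypothetical sketch).

    Encourages the anchor to lie closer (in Euclidean distance) to the
    perceptually similar item than to the dissimilar one, by at least
    `margin`. Inputs are arrays of shape (batch, dim).
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # distance to similar item
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # distance to dissimilar item
    return np.maximum(0.0, d_pos - d_neg + margin).mean()

# Toy usage: positives are small perturbations of the anchors,
# negatives are unrelated random vectors.
rng = np.random.default_rng(0)
a = rng.normal(size=(4, 16))
p = a + 0.01 * rng.normal(size=(4, 16))
n = rng.normal(size=(4, 16))
print(triplet_margin_loss(a, p, n))  # near zero: triplets already satisfied
print(triplet_margin_loss(a, n, p))  # large: roles swapped, loss penalizes it
```

In practice the embeddings would come from the speech representations under study (or a small projection head trained on top of them), and the triplets from the participants' triadic similarity judgments.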