Publications

Audio and ASR-based filled pause detection

Abstract

Filled pauses (or fillers) are the most common form of speech disfluency. They are hesitation markers (“um”, “uh” and “er”) that speakers produce, usually to gain extra time while thinking of their next words. Filled pauses are very frequent in spontaneous speech. Their detection is therefore important for two reasons: (a) their presence affects the performance of individual components of human-machine interaction, such as Automatic Speech Recognition (ASR) systems, and (b) their frequency can characterize the overall speech quality of a particular speaker, as it can be strongly associated with the speaker's confidence. Despite that, only limited work has been published on the detection of filled pauses in speech, especially from audio. In this work, we propose a framework for filled pause detection using both audio and textual information. For the audio modality, we transfer knowledge …

Date
October 18, 2022
Authors
Aggelina Chatziagapi, Dimitris Sgouropoulos, Constantinos Karouzos, Thomas Melistas, Theodoros Giannakopoulos, Athanasios Katsamanis, Shrikanth Narayanan
Conference
2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII)
Pages
1-7
Publisher
IEEE