Publications
Audio and ASR-based filled pause detection
Abstract
Filled pauses (or fillers) are the most common form of speech disfluency. They are hesitation markers (“um”, “uh” and “er”) that speakers produce, usually to gain extra time while thinking of their next words. Filled pauses are very frequent in spontaneous speech. Their detection is therefore important for two basic reasons: (a) their presence influences the performance of individual components, such as Automatic Speech Recognition (ASR) systems, in human-machine interaction, and (b) their frequency can characterize the overall speech quality of a particular speaker, as it can be strongly associated with the speaker's confidence. Despite that, only limited work has been published on the detection of filled pauses in speech, especially from audio. In this work, we propose a framework for filled pause detection using both audio and textual information. For the audio modality, we transfer knowledge …
- Date: October 18, 2022
- Authors: Aggelina Chatziagapi, Dimitris Sgouropoulos, Constantinos Karouzos, Thomas Melistas, Theodoros Giannakopoulos, Athanasios Katsamanis, Shrikanth Narayanan
- Conference: 2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII)
- Pages: 1-7
- Publisher: IEEE