Addressing Dysarthric Speech Variability with Severity-Aware Fine-Tuning of Transformer-Based Wav2vec2 ASR Models: P. Sapkota, HK Kathania, S. Narayanan, SR Kadiri

Abstract

Building a robust ASR system for dysarthric speech is challenging due to its irregular articulation, phonation, and prosody, which differ markedly from normal speech. The advent of transformer-based end-to-end systems has notably improved ASR performance for normal speech recognition tasks. In particular, adapting pre-trained transformer-based ASR models trained on large amounts of similar data yields superior performance compared to models trained from scratch. However, the efficacy of such pre-trained models has been established only for normal speech, whereas their performance in dysarthric speech recognition remains uncertain. This paper explores the extension of these advances to dysarthric speech recognition. We investigate fine-tuning the pre-trained Wav2vec2 model on dysarthric speech data, with a focus on two factors: severity level (Low, Medium, High, or Very High) and speaker independence. The initial …

Date: 2026
Authors: Paban Sapkota, Hemant Kumar Kathania, Shrikanth Narayanan, Sudarsana Reddy Kadiri
Journal: Circuits, Systems, and Signal Processing
Pages: 1-41
Publisher: Springer US

Information Sciences Institute
