Publications

Automatic classification of vocal intensity categories from amplitude-normalized speech signals by comparing acoustic features and classifier models

Abstract

Regulation of vocal intensity is a fundamental phenomenon in speech communication. Speakers use different intensity categories (e.g., soft, normal, and loud voice) to convey different vocal emotions or to communicate in noisy conditions or over varying distances. Vocal intensity categories have been studied in fundamental speech research, but much less is known about their automatic classification. This study investigates the classification of vocal intensity categories from speech signals in a scenario where the original level information of the speech is absent and the signal is presented on a normalized amplitude scale. Different acoustic features were studied together with machine learning (ML) and deep learning (DL) classifiers using two different labeling approaches. Speech signals recorded from 50 speakers reciting sentences in four intensity categories (soft, normal, loud, and very loud) were analyzed. Altogether, 15 feature sets, including different cepstral, spectral, and handcrafted (eGeMAPS) features, were compared. Three ML classifiers (support vector machine, random forest, and AdaBoost) and four DL classifiers (deep neural network, convolutional neural network, recurrent neural network, and bidirectional long short-term memory network) were compared. The best classification accuracy, 86.0%, was obtained by combining the best-performing cepstral and spectral features and using the bidirectional long short-term memory classifier.
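To illustrate the core scenario the abstract describes, the sketch below shows why intensity classification from amplitude-normalized signals must rely on spectral shape rather than level: after peak normalization, absolute loudness is gone, but differences in spectral tilt (louder phonation has relatively more high-frequency energy) remain. This is a minimal toy example using NumPy only, with synthetic tilt-shaped noise standing in for speech and a nearest-centroid rule standing in for the ML/DL classifiers of the paper; the feature, data, and classifier here are all simplifications, not the authors' actual pipeline or features.

```python
import numpy as np

def normalize_amplitude(signal):
    """Scale the waveform to unit peak amplitude, discarding absolute
    level information -- the scenario studied in the paper."""
    return signal / np.max(np.abs(signal))

def spectral_features(signal, n_fft=512):
    """Toy spectral feature: frame-averaged log-magnitude spectrum.
    (The paper compares far richer cepstral/spectral/eGeMAPS features.)"""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::n_fft // 2]
    mag = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return np.log(mag + 1e-8).mean(axis=0)

# Hypothetical data: two "intensity categories" simulated as noise signals
# with different spectral tilt (flatter spectrum ~ louder phonation).
rng = np.random.default_rng(0)

def make_signal(tilt, n=4000):
    noise = rng.standard_normal(n)
    spec = np.fft.rfft(noise) * np.linspace(1.0, tilt, n // 2 + 1)
    return np.fft.irfft(spec)

# Nearest-centroid "classifier" trained on normalized signals only.
train = {"soft": [spectral_features(normalize_amplitude(make_signal(0.2))) for _ in range(5)],
         "loud": [spectral_features(normalize_amplitude(make_signal(1.0))) for _ in range(5)]}
centroids = {k: np.mean(v, axis=0) for k, v in train.items()}

def classify(signal):
    f = spectral_features(normalize_amplitude(signal))
    return min(centroids, key=lambda k: np.linalg.norm(f - centroids[k]))
```

Because every signal is peak-normalized before feature extraction, the classifier never sees the original level; it separates the categories purely from the tilt of the log spectrum, which mirrors the paper's premise that intensity category information survives amplitude normalization in the spectral domain.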

Date
November 30, 2025
Authors
Manila Kodali, Luna Ansari, Sudarsana Kadiri, Shrikanth Narayanan, Paavo Alku
Journal
Speech Communication
Publisher
Elsevier