Classification of Voice Modality Using Electroglottogram Waveforms

Michal Borsky, Daryush D. Mehta, Julius P. Gudjohnsen, Jon Gudnason


It has been shown that improper function of the vocal folds can result in perceptually distorted speech that is typically associated with various speech pathologies and even some neurological diseases. As a consequence, researchers have focused on finding quantitative voice characteristics to objectively assess and automatically detect non-modal voice types. Most of this research has classified voice modality using features extracted from the speech signal. This paper proposes a different approach that analyzes the signal characteristics of the electroglottogram (EGG) waveform. The core idea is that modal and different kinds of non-modal voice types produce EGG signals with distinct spectral/cepstral characteristics, so they can be distinguished from one another using standard cepstral-based features and a simple multivariate Gaussian mixture model. The practical usability of this approach has been verified on the task of classifying modal, breathy, rough, pressed and soft voice types. A speaker-dependent system achieved 83% frame-level accuracy and 91% utterance-level accuracy.
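The abstract describes a pipeline of frame-level cepstral features computed from the EGG waveform, one Gaussian mixture model per voice type, and decisions at both the frame and the utterance level. The sketch below illustrates that kind of pipeline under assumptions that the abstract does not state: 25 ms Hamming frames with a 10 ms hop, 13 plain real-cepstrum coefficients (the paper's "standard cepstral-based features" may well be something else, such as MFCCs), diagonal-covariance GMMs with 4 components trained by a hand-written EM loop, and an utterance decision made by summing frame log-likelihoods. The names `cepstral_features`, `DiagGMM` and `classify` are hypothetical, and the demo uses synthetic toy signals, not EGG recordings.

```python
import numpy as np


def cepstral_features(x, fs, frame_ms=25.0, hop_ms=10.0, n_ceps=13):
    """Frame a 1-D signal with a Hamming window and return the first n_ceps
    real-cepstrum coefficients per frame, shape (n_frames, n_ceps)."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-10)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    return np.fft.irfft(log_mag, axis=1)[:, :n_ceps]


def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture trained by plain EM."""

    def __init__(self, n_components=4, n_iter=50, var_floor=1e-6, seed=0):
        self.k, self.n_iter, self.var_floor, self.seed = n_components, n_iter, var_floor, seed

    def _log_joint(self, X):
        # log w_k + log N(x | mu_k, diag(var_k)) for every frame/component pair
        diff = X[:, None, :] - self.means[None, :, :]
        quad = np.sum(diff ** 2 / self.vars, axis=2)
        log_det = np.sum(np.log(2.0 * np.pi * self.vars), axis=1)
        return -0.5 * (quad + log_det) + np.log(self.weights)

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        n = X.shape[0]
        self.means = X[rng.choice(n, self.k, replace=False)].copy()
        self.vars = np.tile(X.var(axis=0) + self.var_floor, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame
            log_r = self._log_joint(X)
            r = np.exp(log_r - _logsumexp(log_r, axis=1)[:, None])
            # M-step: re-estimate weights, means and floored diagonal variances
            nk = r.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (r.T @ X) / nk[:, None]
            second_moment = (r.T @ X ** 2) / nk[:, None]
            self.vars = np.maximum(second_moment - self.means ** 2, self.var_floor)
        return self

    def score_samples(self, X):
        """Per-frame log-likelihood log p(x) under the mixture."""
        return _logsumexp(self._log_joint(X), axis=1)


def classify(models, feats):
    """Return (per-frame labels, utterance label) for one utterance."""
    labels = list(models)
    scores = np.stack([models[c].score_samples(feats) for c in labels], axis=1)
    frame_labels = [labels[i] for i in scores.argmax(axis=1)]
    # Utterance decision: summed frame log-likelihoods (one possible choice).
    utt_label = labels[int(scores.sum(axis=0).argmax())]
    return frame_labels, utt_label


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs  # 1 s of samples
    rng = np.random.default_rng(1)
    tone = np.sin(2 * np.pi * 100 * t)
    # Toy stand-ins only (NOT EGG data): a pure tone vs. the same tone plus noise.
    toy = {"periodic": lambda: tone, "noisy": lambda: tone + 0.5 * rng.standard_normal(fs)}
    models = {c: DiagGMM().fit(np.vstack([cepstral_features(make(), fs) for _ in range(3)]))
              for c, make in toy.items()}
    frames, utt = classify(models, cepstral_features(toy["noisy"](), fs))
    print(utt, np.mean([f == "noisy" for f in frames]))
```

In a speaker-dependent setup like the one reported above, one such model per voice type (modal, breathy, rough, pressed, soft) would be trained on that speaker's own frames. Summing frame log-likelihoods treats the frames as independent given the class; a majority vote over the frame labels is a common alternative for the utterance-level decision.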


DOI: 10.21437/Interspeech.2016-1194

Cite as

Borsky, M., Mehta, D.D., Gudjohnsen, J.P., Gudnason, J. (2016) Classification of Voice Modality Using Electroglottogram Waveforms. Proc. Interspeech 2016, 3166-3170.

Bibtex
@inproceedings{Borsky+2016,
author={Michal Borsky and Daryush D. Mehta and Julius P. Gudjohnsen and Jon Gudnason},
title={Classification of Voice Modality Using Electroglottogram Waveforms},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1194},
url={http://dx.doi.org/10.21437/Interspeech.2016-1194},
pages={3166--3170}
}