A sonorant detection scheme using Mel-frequency cepstral coefficients (MFCCs) and support vector machines (SVMs) is presented and tested in a variety of noise conditions. The classifier threshold is adapted using an estimate of the noise level, which biases the classifier to compensate for mismatched training and testing conditions. The adaptive-threshold classifier achieves low frame error rates using only clean training data, without requiring specially designed features or learning algorithms. The frame-by-frame SVM output is then analyzed over longer time periods to uncover temporal modulations related to syllable structure, which may aid landmark-based speech recognition and speech detection. Appropriate filtering of this signal yields a representation that is stable over a wide range of noise conditions. Using the smoothed output for landmark detection results in a high precision rate, enabling confident pruning of the search space used by landmark-based speech recognizers.
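The pipeline described above (a noise-dependent threshold on per-frame SVM scores, followed by smoothing and landmark picking) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear mapping from an SNR estimate to a threshold, its sign and default values, the moving-average filter, and the function names `adaptive_threshold`, `classify_frames`, `smooth`, and `rising_landmarks`. The input `scores` stands in for per-frame SVM decision values computed elsewhere.

```python
import numpy as np


def adaptive_threshold(snr_db, slope=-0.02, offset=0.0, clean_snr_db=40.0):
    """Hypothetical mapping from an SNR estimate (dB) to a decision threshold.

    The threshold equals `offset` at or above `clean_snr_db` and shifts
    linearly by `slope` per dB below it. The sign and size of `slope`
    are assumptions, not values from the paper.
    """
    deficit = max(0.0, clean_snr_db - snr_db)
    return offset + slope * deficit


def classify_frames(scores, threshold):
    """Label each frame as sonorant (True) when its score exceeds the threshold."""
    return np.asarray(scores, dtype=float) > threshold


def smooth(scores, width=5):
    """Moving-average filter over frames (a stand-in for 'appropriate filtering')."""
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(scores, dtype=float), kernel, mode="same")


def rising_landmarks(smoothed, threshold):
    """Return frame indices where the smoothed signal first rises above the threshold."""
    above = np.asarray(smoothed) > threshold
    return np.flatnonzero(~above[:-1] & above[1:]) + 1


if __name__ == "__main__":
    # Synthetic decision scores: a short sonorant region surrounded by non-sonorant frames.
    scores = np.array([-1, -1, -1, 1, 1, 1, 1, -1, -1, -1], dtype=float)
    thr = adaptive_threshold(snr_db=40.0)
    print(classify_frames(scores, thr))
    print(rising_landmarks(smooth(scores, width=3), thr))
```

In this sketch the threshold is the only quantity that changes with the noise estimate, which mirrors the idea of compensating for train/test mismatch without retraining; the actual mapping and filter would need to be tuned on held-out data.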
Cite as: Schutte, K., Glass, J. (2005) Robust detection of sonorant landmarks. Proc. Interspeech 2005, 1005-1008, doi: 10.21437/Interspeech.2005-240
@inproceedings{schutte05_interspeech,
  author={Ken Schutte and James Glass},
  title={{Robust detection of sonorant landmarks}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1005--1008},
  doi={10.21437/Interspeech.2005-240}
}