Speaker adaptation for support vector machine-based word prominence detection

Andrea Schnall, Martin Heckmann


In this paper we propose a new speaker adaptation method to improve the detection of prominent words in speech. Prosodic cues are difficult to extract because different speakers use different features to express, for example, prominence. In speech recognition, speaker adaptation techniques such as fMLLR have proven very useful for overcoming the mismatch between the pool of speakers used during training and the speakers encountered during deployment. For prominence detection, our results have shown that a discriminative classifier such as an SVM works better than a GMM. Existing adaptation methods such as fMLLR were developed for GMM-HMM based classifiers under the assumption that the data has a Gaussian distribution. This assumption does not hold for our data, and using fMLLR with the SVM does not lead to an improvement in our problem area. We therefore propose a new adaptation method that adapts the data to the RBF kernel of the SVM and subsequently regularizes it with fMLLR. We investigate how this method can be used to adapt a new speaker to a speaker-independent model for word prominence detection. We show that speaker adaptation reduces the error rate from 16.4% to 14.4%.
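
To make the idea of feature-space adaptation in front of an RBF-kernel SVM more concrete, the following is a minimal sketch and not the method of the paper. It assumes an affine transform of the form x' = A x + b (the form fMLLR uses) applied to a new speaker's features before they reach a fixed, speaker-independent SVM. The support vectors, coefficients, gamma, and the transform itself are hypothetical toy values set by hand; how the paper estimates the transform against the RBF kernel, or how it regularizes it with fMLLR, is not described in the abstract and is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def affine_transform(x, A, b):
    """Feature-space transform x' = A x + b (same form as an fMLLR transform)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision value f(x) = sum_i c_i * k(s_i, x) + bias."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical speaker-independent model: one support vector per class.
# A positive decision value stands for "prominent", a negative one for
# "not prominent".
SUPPORT_VECTORS = [[1.0, 1.0], [-1.0, -1.0]]
DUAL_COEFS = [1.0, -1.0]
BIAS = 0.0
GAMMA = 0.5

# Hypothetical new speaker whose features are shifted relative to training.
x_raw = [-0.5, -0.5]

# Hand-picked transform (assumption): identity scaling plus a constant shift.
# In practice A and b would be estimated from the new speaker's data.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.5, 1.5]
x_adapted = affine_transform(x_raw, A, b)

score_raw = svm_decision(x_raw, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA)
score_adapted = svm_decision(x_adapted, SUPPORT_VECTORS, DUAL_COEFS, BIAS, GAMMA)

print("decision before adaptation:", score_raw)      # negative in this toy setup
print("decision after adaptation: ", score_adapted)  # positive in this toy setup
```

In this toy setup the SVM itself is never retrained. Only the input features of the new speaker are moved, which is what makes a feature-space transform attractive when the speaker-independent model has to stay fixed.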


DOI: 10.21437/SpeechProsody.2016-56

Cite as

Schnall, A., Heckmann, M. (2016) Speaker adaptation for support vector machine-based word prominence detection. Proc. Speech Prosody 2016, 272-276.

BibTeX
@inproceedings{Schnall+2016,
author={Andrea Schnall and Martin Heckmann},
title={Speaker adaptation for support vector machine-based word prominence detection},
year=2016,
booktitle={Speech Prosody 2016},
doi={10.21437/SpeechProsody.2016-56},
url={http://dx.doi.org/10.21437/SpeechProsody.2016-56},
pages={272--276}
}