This paper presents a feature normalization technique based on minimum mean square error (MMSE) estimation, histogram normalization, and multi-environment models. Stereo training data provide accurate estimates of the bias between clean and distorted speech cepstral vectors. Using these data, a non-linear transformation of the distorted cepstral vectors is performed based on MMSE estimation and histogram equalization. Results on the SpeechDat Car database show an improvement in word error rate over linear transformation techniques such as SPLICE and MEMLIN. Improvements in word error rate of 67.28% on the digits task and 40.79% on the spelling task are obtained.
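The histogram equalization step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes equalization is applied independently to a single cepstral coefficient, it omits the MMSE and multi-environment components entirely, and the function name `histogram_equalize` is hypothetical. Each distorted value is mapped to the clean value that has the same empirical CDF rank, which gives a non-linear, monotonic transformation.

```python
import numpy as np


def histogram_equalize(distorted, clean_reference):
    """Map distorted values onto the clean distribution by matching CDF ranks.

    Hypothetical helper for illustration: both inputs are 1-D arrays holding
    one cepstral coefficient over many frames (an assumption of this sketch).
    """
    distorted = np.asarray(distorted, dtype=float)
    clean_sorted = np.sort(np.asarray(clean_reference, dtype=float))

    # Empirical CDF value of every distorted sample within its own data.
    ranks = np.argsort(np.argsort(distorted))
    cdf = (ranks + 0.5) / distorted.size

    # Invert the empirical clean CDF by interpolating over its quantiles.
    clean_cdf = (np.arange(clean_sorted.size) + 0.5) / clean_sorted.size
    return np.interp(cdf, clean_cdf, clean_sorted)
```

With stereo data, where clean and distorted frames are aligned, a monotonic distortion of the clean coefficient is undone by this mapping; real distortions are noisier, which is presumably why the paper combines equalization with MMSE estimation and environment-specific models.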
Cite as: Buera, L., Lleida, E., Miguel, A., Ortega, A. (2004) Multi-environment models based linear normalization for robust speech recognition. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 174-180
@inproceedings{buera04_specom, author={Luis Buera and Eduardo Lleida and Antonio Miguel and Alfonso Ortega}, title={{Multi-environment models based linear normalization for robust speech recognition}}, year=2004, booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)}, pages={174--180} }