This paper presents Phoneme-Dependent Multi-Environment Models based LInear feature Normalization (PD-MEMLIN). The goal of the algorithm is to learn, for each basic defined environment, the difference between the clean and noisy feature vectors associated with a pair of Gaussians of the same phoneme (one from a clean model and the other from a noisy model). These differences are estimated in a prior training process with stereo data. To compensate for some of the problems caused by the independence assumption among feature vector components and by the mismatch error between the perfect and the proposed transformations, two further approaches are proposed: a multi-environment rotation transformation algorithm, and the use of transformed space acoustic models. Experiments with the SpeechDat Car database were carried out to study the behavior of the proposed techniques in a real acoustic environment. The experimental results show an average improvement of more than 77% using PD-MEMLIN, and of more than 85% using transformed space acoustic models and multi-environment rotation transformation, relative to the baseline.
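As a rough illustration of the idea described above, the following Python sketch estimates one bias vector per pair of clean and noisy Gaussians from stereo data and then uses those biases to normalize noisy frames. It is only a sketch under stated assumptions, not the authors' implementation: it handles a single phoneme and a single environment, assumes diagonal-covariance Gaussians with hand-picked toy parameters, and approximates the link between clean and noisy Gaussians by a time-independent probability `cross_prob`. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_posteriors(frames, means, variances, priors):
    """Posterior p(s | frame) for a mixture of diagonal-covariance Gaussians."""
    diff = frames[:, None, :] - means[None, :, :]            # (T, S, D)
    log_lik = -0.5 * np.sum(diff**2 / variances
                            + np.log(2 * np.pi * variances), axis=2)
    log_post = log_lik + np.log(priors)
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def estimate_pair_biases(clean, noisy, post_clean, post_noisy):
    """Bias r[sx, sy]: weighted mean of (noisy - clean) for each Gaussian pair."""
    weights = post_clean[:, :, None] * post_noisy[:, None, :]  # (T, Sx, Sy)
    num = np.einsum("tij,td->ijd", weights, noisy - clean)
    den = weights.sum(axis=0)[:, :, None]
    return num / np.maximum(den, 1e-12)

def estimate_cross_prob(post_clean, post_noisy):
    """Assumed time-independent p(sx | sy), estimated from stereo posteriors."""
    joint = np.einsum("ti,tj->ij", post_clean, post_noisy)     # (Sx, Sy)
    return joint / np.maximum(post_noisy.sum(axis=0), 1e-12)

def normalize(noisy, post_noisy, biases, cross_prob):
    """x_hat = y - sum over (sx, sy) of r[sx, sy] * p(sy | y) * p(sx | sy)."""
    pair_weight = post_noisy[:, None, :] * cross_prob[None, :, :]  # (T, Sx, Sy)
    return noisy - np.einsum("tij,ijd->td", pair_weight, biases)

# Toy stereo data: the "noise" is a constant offset (an assumption for the demo).
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 3))
offset = np.array([1.0, -0.5, 2.0])
noisy = clean + offset

# Hand-picked toy models: two Gaussians each for the clean and noisy spaces.
clean_means = np.array([[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])
noisy_means = clean_means + offset
variances = np.ones((2, 3))
priors = np.array([0.5, 0.5])

post_clean = gaussian_posteriors(clean, clean_means, variances, priors)
post_noisy = gaussian_posteriors(noisy, noisy_means, variances, priors)

biases = estimate_pair_biases(clean, noisy, post_clean, post_noisy)
cross_prob = estimate_cross_prob(post_clean, post_noisy)
x_hat = normalize(noisy, post_noisy, biases, cross_prob)
```

In this toy case the distortion is the same constant shift for every frame, so every pair bias equals that shift and the normalized frames match the clean ones. With real in-car noise the biases would differ from pair to pair, and separate sets would be trained for each phoneme and each basic environment.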
Cite as: Buera, L., Lleida, E., Miguel, A., Ortega, A. (2005) Robust speech recognition in cars using phoneme dependent multi-environment linear normalization. Proc. Interspeech 2005, 381-384, doi: 10.21437/Interspeech.2005-213
@inproceedings{buera05_interspeech,
  author={Luis Buera and Eduardo Lleida and Antonio Miguel and Alfonso Ortega},
  title={{Robust speech recognition in cars using phoneme dependent multi-environment linear normalization}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={381--384},
  doi={10.21437/Interspeech.2005-213}
}