8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

On the Jointly Unsupervised Feature Vector Normalization and Acoustic Model Compensation for Robust Speech Recognition

Luis Buera, Antonio Miguel, Eduardo Lleida, Óscar Saz, Alfonso Ortega

University of Zaragoza, Spain

To compensate for the mismatch between training and testing conditions, an unsupervised hybrid compensation technique is proposed. It combines Multi-Environment Model based LInear Normalization (MEMLIN) with a novel acoustic model adaptation method based on rotation transformations. A set of rotation transformations between clean and MEMLIN-normalized data is estimated by linear regression in a training process. Each MEMLIN-normalized frame is then decoded using the expanded acoustic models, which are obtained from the reference models and the set of rotation transformations. During the search, one of the rotation transformations is selected on-line for each frame according to the maximum likelihood (ML) criterion in a modified Viterbi algorithm. Experiments were carried out on the Spanish SpeechDat Car database. MEMLIN over standard ETSI front-end parameters reaches a mean WER improvement of 75.53%, while the proposed hybrid solution reaches 90.54%.
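
The two steps named in the abstract, regression-based estimation of a transformation and per-frame ML selection among expanded models, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the regression is a general affine least-squares fit that is not constrained to be a rotation, the transformation is assumed to map the clean space to the normalized space, each acoustic model is reduced to a single Gaussian, and the modified Viterbi search itself is not shown.

```python
import numpy as np


def estimate_affine(src, dst):
    """Least-squares fit of dst ~= src @ A.T + b (hypothetical helper).

    src, dst: arrays of shape (n_frames, dim) holding paired frames.
    """
    # Append a column of ones so the bias b is estimated jointly with A.
    X = np.hstack([src, np.ones((src.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, dst, rcond=None)
    A = W[:-1].T  # (dim, dim) linear part
    b = W[-1]     # (dim,) bias
    return A, b


def gaussian_loglik(x, mean, cov):
    """Log-density of a full-covariance Gaussian at a single frame x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)


def expand_gaussian(mean, cov, transforms):
    """Apply each (A, b) to a reference Gaussian: one expanded model each."""
    return [(A @ mean + b, A @ cov @ A.T) for A, b in transforms]


def select_transform(frame, expanded):
    """Index and score of the expanded model with the highest likelihood."""
    scores = [gaussian_loglik(frame, m, c) for m, c in expanded]
    best = int(np.argmax(scores))
    return best, scores[best]
```

In this sketch, `estimate_affine` would be run once per transformation on paired training data, `expand_gaussian` would build the expanded models from the reference ones, and `select_transform` would be called for every normalized frame. In the method described above, that selection takes place inside the decoder rather than as a standalone step.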


Bibliographic reference. Buera, Luis / Miguel, Antonio / Lleida, Eduardo / Saz, Óscar / Ortega, Alfonso (2007): "On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition", in INTERSPEECH-2007, 1046-1049.