For robust distant-talking speech recognition, a novel HMM training approach using data pairs is proposed. The data pairs of clean and reverberant feature vectors, also called stereo data, are used for deriving the HMM parameters of a matched-condition reverberant HMM from a well-trained clean-speech HMM in two steps. In the first step, the alignment of the frames to the states is determined from the clean data and the clean-speech HMM. This state-frame alignment (SFA) is then used in the second step to estimate the Gaussian mixture densities for each state of the reverberant HMM by applying the Expectation Maximization (EM) algorithm to the reverberant data. Thus, a more accurate temporal alignment is achieved than with standard matched-condition training, and the discrimination capability of the HMMs is increased. Connected digit recognition experiments show that the proposed approach decreases the word error rate (WER) by up to 44% while substantially reducing the training complexity.
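The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hard Viterbi alignment, a single diagonal-covariance Gaussian per state in the clean-speech HMM, and diagonal-covariance mixtures for the reverberant states. All function names (`viterbi_align`, `fit_gmm_em`, `train_reverberant_states`, `demo`) and the synthetic stereo data are hypothetical and serve only as illustration.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of diagonal Gaussians. x: (T, D); mean, var: (K, D) -> (T, K)."""
    diff = x[:, None, :] - mean[None, :, :]
    log_norm = np.log(2 * np.pi * var)[None].sum(-1)
    return -0.5 * (log_norm + (diff ** 2 / var[None]).sum(-1))

def viterbi_align(obs, means, variances, log_trans, log_init):
    """Step 1: state-frame alignment (SFA) from clean data and the clean HMM."""
    n_frames, n_states = obs.shape[0], means.shape[0]
    log_b = log_gauss_diag(obs, means, variances)
    delta = log_init + log_b[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans      # rows: from-state, cols: to-state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def fit_gmm_em(x, n_mix, n_iter=20, var_floor=1e-3, seed=0):
    """Step 2 (per state): EM for a diagonal-covariance Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, size=n_mix, replace=False)]
    variances = np.tile(x.var(axis=0) + var_floor, (n_mix, 1))
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        # E-step: responsibilities of each mixture component for each frame
        log_p = np.log(weights) + log_gauss_diag(x, means, variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and floored variances
        occ = resp.sum(axis=0) + 1e-10
        weights = occ / occ.sum()
        means = (resp.T @ x) / occ[:, None]
        variances = np.maximum((resp.T @ x ** 2) / occ[:, None] - means ** 2, var_floor)
    return weights, means, variances

def train_reverberant_states(clean, reverb, means, variances, log_trans, log_init, n_mix=2):
    """Align on the clean frames, then fit each state's mixture on the paired reverberant frames."""
    sfa = viterbi_align(clean, means, variances, log_trans, log_init)
    gmms = {}
    for s in range(means.shape[0]):
        frames = reverb[sfa == s]
        if len(frames) >= n_mix:                 # skip states with too few frames
            gmms[s] = fit_gmm_em(frames, n_mix)
    return sfa, gmms

def demo():
    """Synthetic stereo data: 3-state left-to-right HMM, 2-D features (assumed values)."""
    rng = np.random.default_rng(0)
    clean_means = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])
    clean_vars = np.ones((3, 2))
    clean = np.repeat(clean_means, 20, axis=0) + 0.3 * rng.standard_normal((60, 2))
    # Crude stand-in for reverberation: smear each frame with its predecessor
    reverb = clean + 0.5 * np.vstack([clean[:1], clean[:-1]]) + 0.3 * rng.standard_normal((60, 2))
    trans = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
    with np.errstate(divide="ignore"):           # log(0) = -inf marks forbidden transitions
        log_trans = np.log(trans)
        log_init = np.log(np.array([1.0, 0.0, 0.0]))
    return train_reverberant_states(clean, reverb, clean_means, clean_vars, log_trans, log_init)
```

Calling `demo()` returns the state-frame alignment and a dictionary that maps each state to its estimated mixture weights, means, and variances. Because the alignment is computed once from the clean data and then held fixed, EM in this sketch only runs within each state, which is one plausible reading of why the training complexity drops.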
Bibliographic reference. Sehr, Armin / Hofmann, Christian / Maas, Roland / Kellermann, Walter (2010): "A novel approach for matched reverberant training of HMMs using data pairs", In INTERSPEECH-2010, 566-569.