The performance of speech recognition systems is adversely affected by mismatch between training and testing environmental conditions. In addition to test data from noisy environments, there are scenarios where the training data itself is noisy. Speech enhancement techniques that focus solely on estimating clean speech from the noisy signal are not effective here. Model adaptation techniques may also be ineffective due to the dynamic nature of the environment. In this paper, we propose a method for compensating the mismatch between training and testing environments using the "average eigenspace" approach when the mismatch is non-stationary. No explicit adaptation data is required, as the method operates on the incoming test data to find the compensatory transform. This method differs from traditional signal-noise subspace filtering techniques, in which the dimensionality of the clean signal space is assumed to be less than that of the noise space and noise is assumed to affect all dimensions to the same extent. We evaluate this approach on two corpora collected in real car environments: CU-Move and UTDrive. Using Sphinx, a relative WER reduction of 40.50% is achieved compared to the baseline system. The method also reduces the dimensionality of the feature vectors, allowing for a more compact set of acoustic models in the phoneme space.
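One plausible reading of an "average eigenspace" compensation, sketched below as a minimal illustration rather than the paper's exact algorithm: eigendecompose an average of the feature covariances from the two environments and project both feature sets onto the leading eigenvectors, yielding a shared, lower-dimensional subspace. The function name, the equal-weight averaging, and the choice of subspace size `k` are all assumptions for illustration.

```python
import numpy as np

def average_eigenspace_transform(train_feats, test_feats, k):
    """Illustrative sketch: shared projection from averaged covariances.

    train_feats, test_feats: (n_frames, dim) feature matrices (e.g. MFCCs).
    k: target subspace dimensionality (k < dim).
    Returns a (dim, k) projection matrix with orthonormal columns.
    """
    # Covariance of each environment's features.
    cov_train = np.cov(train_feats, rowvar=False)
    cov_test = np.cov(test_feats, rowvar=False)
    # Average the two covariances (equal weighting assumed here),
    # then eigendecompose the symmetric average.
    cov_avg = 0.5 * (cov_train + cov_test)
    eigvals, eigvecs = np.linalg.eigh(cov_avg)  # ascending eigenvalues
    # Keep the k leading eigenvectors (largest eigenvalues).
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

# Usage: project features from both environments into the shared subspace,
# reducing dimensionality as noted in the abstract.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 13))   # placeholder 13-dim "MFCC" frames
test = rng.normal(size=(300, 13))
P = average_eigenspace_transform(train, test, k=8)
train_proj = train @ P               # (500, 8) compensated features
test_proj = test @ P                 # (300, 8)
```

Because the transform is computed from the incoming test features themselves, no separate adaptation set is needed, consistent with the abstract's claim.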
Bibliographic reference. Kumar, Abhishek / Hansen, John H. L. (2008): "Environment mismatch compensation using average eigenspace for speech recognition", In INTERSPEECH-2008, 1277-1280.