Stereo-based stochastic mapping (SSM) is a technique based on constructing a Gaussian mixture model for the joint distribution of stereo data. This paper considers the use of SSM for noise-robust speech recognition, in which clean and noisy speech features form the stereo data. The Gaussian mixture model, whose parameters are estimated from the observed stereo features during training, is then used at test time to predict the clean speech from its noisy observation. This paper proposes to use the noisy speech observed at test time to update the model parameters, and thus to improve the prediction of the clean speech. Specifically, an expectation-maximization procedure is developed for adapting the model parameters during test time. This adaptation is especially important when there is a mismatch between the training and test sets, or when the training set is relatively small, which results in poor parameter estimates. The proposed method is tested on a noise robustness task and is shown to improve on the performance achieved by SSM.
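To make the prediction step concrete, the following is a minimal sketch of how a clean feature might be predicted from its noisy observation under a joint Gaussian mixture model. It assumes scalar features and a minimum mean-square error (conditional mean) estimator. The abstract does not state which estimator the authors use, and the names `JointComponent` and `predict_clean` are hypothetical. The test-time expectation-maximization adaptation proposed in the paper is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class JointComponent:
    """One Gaussian of an assumed joint model over (clean x, noisy y)."""
    weight: float  # mixture weight
    mean_x: float  # mean of the clean feature
    mean_y: float  # mean of the noisy feature
    var_y: float   # variance of the noisy feature
    cov_xy: float  # cross-covariance between clean and noisy features


def log_gauss(y, mean, var):
    """Log density of a univariate Gaussian evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def predict_clean(y, components):
    """Conditional-mean estimate of the clean feature given noisy y."""
    # Unnormalized log posterior of each component given y (marginal over x).
    log_scores = [
        math.log(c.weight) + log_gauss(y, c.mean_y, c.var_y)
        for c in components
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnorm = [math.exp(s - top) for s in log_scores]
    total = sum(unnorm)

    x_hat = 0.0
    for c, u in zip(components, unnorm):
        # Conditional mean of x given y within this component.
        cond_mean = c.mean_x + (c.cov_xy / c.var_y) * (y - c.mean_y)
        x_hat += (u / total) * cond_mean
    return x_hat
```

With a single component the estimate reduces to a linear regression of the clean feature on the noisy one. With several components it is a posterior-weighted blend of such regressions, which is why the quality of the estimated mixture parameters matters for the prediction.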
Bibliographic reference. Maymon, Shay / Dognin, Pierre / Cui, Xiaodong / Goel, Vaibhava (2013): "Adaptive stereo-based stochastic mapping", In INTERSPEECH-2013, 3517-3521.