10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Variational Model Composition for Robust Speech Recognition with Time-Varying Background Noise

Wooil Kim, John H. L. Hansen

University of Texas at Dallas, USA

This paper proposes a novel model composition method to improve speech recognition performance in time-varying background noise. The method is motivated by the observation that each order of the cepstral coefficients represents a different rate of variation in the envelope of the log-spectrum. Accordingly, variational noise models are generated by selectively applying perturbation factors to a basis model, which yields a collection of varied spectral patterns in the log-spectral domain. The basis noise model is obtained from the silent segments of the input speech. The proposed Variational Model Composition (VMC) method is used to generate multiple environmental models for our previously proposed feature compensation method. Experimental results show that the proposed method is considerably more effective than an existing single-model-based method at improving speech recognition performance in time-varying background noise, with average relative improvements in word error rate of 30.34% and 9.02% for speech babble and background music conditions, respectively.
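The idea of composing variational noise models from a single basis model can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the form of the perturbation, so the multiplicative scaling of selected cepstral orders, the function names `basis_noise_mean` and `variational_means`, and the factor values are all assumptions made for illustration. Only the model mean is perturbed here; a fuller version would also carry variances.

```python
from itertools import product


def basis_noise_mean(silence_frames):
    """Average cepstral vector over frames assumed to contain no speech."""
    n = len(silence_frames)
    return [sum(column) / n for column in zip(*silence_frames)]


def variational_means(basis_mean, orders, factors):
    """Build one perturbed mean per combination of factors.

    Assumption: a perturbation is a multiplicative scaling of the
    selected cepstral orders; the paper may define it differently.
    """
    variants = []
    for combo in product(factors, repeat=len(orders)):
        mean = list(basis_mean)
        for order, factor in zip(orders, combo):
            mean[order] = basis_mean[order] * factor
        variants.append(mean)
    return variants


# Toy cepstral frames (4 coefficients each) standing in for silent segments.
silence = [[10.0, 2.0, -1.0, 0.5], [12.0, 1.0, -3.0, 1.5]]
basis = basis_noise_mean(silence)

# Perturb cepstral orders 1 and 2 with three hypothetical factors each.
models = variational_means(basis, orders=[1, 2], factors=[0.5, 1.0, 1.5])
print(len(models))
```

With two perturbed orders and three factors, the sketch produces nine candidate noise means, one of which is the unperturbed basis. Each candidate would then serve as a separate environmental model for the feature compensation stage.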


Bibliographic reference. Kim, Wooil / Hansen, John H. L. (2009): "Variational model composition for robust speech recognition with time-varying background noise", in INTERSPEECH-2009, 2399-2402.