EUROSPEECH 2003 - INTERSPEECH 2003
This paper reports our recent efforts to develop a unified, non-linear, stochastic model for estimating and removing the effects of additive noise on speech cepstra. The complete system consists of prior models for speech and noise, an observation model, and an inference algorithm. The observation model quantifies the relationship between clean speech, noise, and the noisy observation. Because it is expressed in terms of log Mel-frequency filter-bank features, it is non-linear. The inference algorithm is the procedure by which the clean speech and noise are estimated from the noisy observation. The most critical component of the system is the observation model. This paper derives a new approximation strategy and compares it with two existing approximations. The new approximation is shown to require half the computation of the previous techniques while producing equivalent or improved word accuracy. We present noise-robust recognition results on the standard Aurora 2 task.
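To illustrate why an observation model in the log Mel filter-bank domain is non-linear, the sketch below uses a common simplification from the noise-robustness literature. It is an assumption here, not a form stated in this abstract, and it is not one of the three approximations the paper compares. The simplification takes speech and noise energies to add in the linear domain and ignores the phase term, so that y = ln(exp(x) + exp(n)) for clean speech x, noise n, and noisy observation y in one filter-bank channel. The function name `noisy_log_mel` is hypothetical.

```python
import math


def noisy_log_mel(x, n):
    """Noisy log Mel energy for one channel (illustrative assumption).

    x: clean-speech log energy, n: noise log energy.
    Assumes energies add in the linear domain and ignores the phase term.
    """
    # y = ln(e^x + e^n), computed stably as max + log1p(exp(min - max))
    hi, lo = max(x, n), min(x, n)
    return hi + math.log1p(math.exp(lo - hi))


# High SNR: the observation stays close to the clean-speech value.
print(noisy_log_mel(10.0, 0.0))
# Equal energies: the observation sits ln(2) above either input.
print(noisy_log_mel(5.0, 5.0))
```

Under this assumption the noisy feature tracks whichever of speech or noise dominates, which is why x cannot be recovered from y by a single linear correction.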
Bibliographic reference. Droppo, Jasha / Deng, Li / Acero, Alex (2003): "A comparison of three non-linear observation models for noisy speech features", In EUROSPEECH-2003, 681-684.