In this work, we demonstrate that the most widely used model for the relationship between noisy speech, clean speech and noise in the log-Mel domain is inaccurate because it disregards the phase. Moreover, we show how a more exact model can be derived by averaging over the phase in the log-Mel domain, and how this model can be applied profitably to particle-filter-based sequential noise compensation. Experimental results confirm the superiority of the phase-averaged model, both for clean speech estimation in general and for the particle filter in particular. Relative reductions in word error rate of up to 17% were obtained on a large vocabulary task.
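To illustrate why disregarding the phase introduces a bias, the following minimal sketch compares the phase-ignorant relation y = x + log(1 + exp(n - x)) with a numerical average of the log power over a uniformly distributed relative phase. It is a single-frequency-bin simplification and not the derivation of the paper, which averages in the log-Mel domain where each Mel bin combines several frequency bins. The function names `log_add_model` and `phase_averaged_log_power` and the uniform-phase assumption are illustrative choices, not taken from the paper.

```python
import math


def log_add_model(x, n):
    """Phase-ignorant model: y = log(exp(x) + exp(n)), computed stably."""
    m = max(x, n)
    return m + math.log1p(math.exp(-abs(x - n)))


def phase_averaged_log_power(x, n, steps=20000):
    """Average of log|X + N*exp(j*theta)|^2 over theta uniform on [0, 2*pi).

    x and n are the log powers of clean speech and noise in ONE frequency
    bin (an assumption made for this sketch). The integral is approximated
    with the midpoint rule; an even `steps` avoids sampling theta = pi.
    """
    d = n - x
    total = 0.0
    for k in range(steps):
        theta = 2.0 * math.pi * (k + 0.5) / steps
        # Noisy power relative to the clean speech power, including the
        # cross term that the phase-ignorant model drops.
        rel = 1.0 + math.exp(d) + 2.0 * math.exp(0.5 * d) * math.cos(theta)
        total += math.log(rel)
    return x + total / steps


if __name__ == "__main__":
    for x, n in [(2.0, 0.0), (0.0, 3.0), (1.0, 0.5)]:
        print(x, n, log_add_model(x, n), phase_averaged_log_power(x, n))
```

In this simplified setting the phase-ignorant value is the log of the mean power, while the phase average is the mean of the log power, so by Jensen's inequality the former is always the larger of the two. For a single bin with uniform phase the average reduces to max(x, n), which the numerical integration reproduces.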
Bibliographic reference. Faubel, Friedrich / McDonough, John / Klakow, Dietrich (2008): "A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain", In INTERSPEECH-2008, 553-556.