14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Modified Cepstral Mean Normalization — Transforming to Utterance Specific Non-Zero Mean

Vikas Joshi, N. Vishnu Prasad, S. Umesh

IIT Madras, India

Cepstral Mean Normalization (CMN) is a widely used technique for channel compensation and noise robustness. CMN compensates for noise by transforming both training and test utterances to zero mean, thereby matching the first-order moments of the training and test conditions. Because every utterance is normalized to zero mean, CMN can discard discriminative speech information, especially for short utterances. In this paper, we modify CMN to reduce this loss by transforming each noisy test utterance not to zero mean but to an estimate of its clean-utterance mean, i.e., the mean the utterance would have had if no noise were present. A look-up-table-based approach is proposed to estimate the clean mean of the noisy utterance. The proposed method is particularly relevant for IVR-based applications, where utterances are usually short and noisy. In such cases, techniques like Histogram Equalization (HEQ) do not perform well, and a simple approach like CMN leads to a loss of discrimination. We obtain a 12% relative improvement in WER over CMN on the Aurora-2 database; when we analyze only short utterances, we obtain relative WER improvements of 5% and 25% over CMN and HEQ, respectively.
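To make the contrast between standard CMN and the modified, non-zero-mean variant concrete, the following is a minimal NumPy sketch. The abstract does not say how the look-up table is indexed, so this sketch assumes, purely for illustration, a nearest-neighbour lookup over stored pairs of (noisy utterance mean, clean utterance mean), such as might be collected from stereo clean/noisy training data. The function names and the table layout are hypothetical, not the authors' implementation.

```python
import numpy as np

def cmn(cepstra):
    """Standard CMN: shift every cepstral dimension of the utterance to zero mean.
    cepstra has shape (num_frames, num_coeffs)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

def build_mean_lookup_table(noisy_means, clean_means):
    """Hypothetical table: paired (noisy mean, clean mean) vectors, stored as arrays."""
    return np.asarray(noisy_means, dtype=float), np.asarray(clean_means, dtype=float)

def estimate_clean_mean(noisy_mean, table):
    """Assumed lookup rule: return the clean mean whose paired noisy mean
    is closest (squared Euclidean distance) to the observed noisy mean."""
    keys, values = table
    idx = np.argmin(np.sum((keys - noisy_mean) ** 2, axis=1))
    return values[idx]

def modified_cmn(cepstra, table):
    """Remove the noisy utterance mean, then add the estimated clean mean,
    so the normalized utterance keeps an utterance-specific, non-zero mean."""
    noisy_mean = cepstra.mean(axis=0)
    return cepstra - noisy_mean + estimate_clean_mean(noisy_mean, table)

# Usage with synthetic data: 50 frames of 13 cepstral coefficients around mean 2.0.
rng = np.random.default_rng(0)
utt = rng.normal(loc=2.0, scale=1.0, size=(50, 13))
table = build_mean_lookup_table(
    noisy_means=[np.full(13, 2.0), np.full(13, -1.0)],
    clean_means=[np.full(13, 1.5), np.full(13, -0.5)],
)
print(np.allclose(cmn(utt).mean(axis=0), 0.0))                  # True
print(np.allclose(modified_cmn(utt, table).mean(axis=0), 1.5))  # True
```

Both transforms remove the same within-utterance mean, so the frame-to-frame variation is identical; they differ only in the constant offset. Standard CMN maps every utterance to the same (zero) mean, whereas the modified version restores a per-utterance estimate, which is the information the abstract argues CMN discards for short utterances.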


Bibliographic reference.  Joshi, Vikas / Prasad, N. Vishnu / Umesh, S. (2013): "Modified cepstral mean normalization — transforming to utterance specific non-zero mean", In INTERSPEECH-2013, 881-885.