Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005)

ITRW and COST278 Final Workshop
Aalborg, Denmark
November 10-11, 2005

Combined Spectral Subtraction and Cepstral Normalisation for Robust Speech Recognition

Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg

SMC-Speech and Multimedia Communication, Department of Communication Technology, Aalborg University, Denmark

This paper presents an effective feature-processing algorithm for robust speech recognition, based on combined spectral and cepstral processing. The spectral processing consists of Full-Wave Rectification Spectral Subtraction (FWR-SS) and Likelihood Controlled Instantaneous Noise Estimation (LCINE), while the cepstral processing is based on mean and variance normalisation.

The combination is motivated by the fact that spectral subtraction, which usually operates on a single frame at a time, introduces large statistical mismatches between clean and enhanced noisy speech in the cepstral domain, resulting in a degradation of recognition performance. The introduced cepstral processing is able, to some extent, to mitigate these mismatches, and in this sense the two methods are not just combined but shown to be complementary. Statistical analyses as well as recognition experiments are conducted on the Aurora 2 database, and a performance comparable to that of the much more complex ETSI advanced front-end is achieved.
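The two processing stages can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that full-wave rectification means taking the absolute value of the subtracted magnitude spectrum (rather than flooring negative values at zero), and that the normalisation is applied per cepstral coefficient over one utterance. The function names `fwr_spectral_subtraction` and `cmvn` and the small constant `eps` are hypothetical, and the noise estimate is simply passed in, so LCINE itself is not modelled.

```python
import math


def fwr_spectral_subtraction(noisy_mag, noise_mag):
    """Subtract a noise magnitude estimate from one frame's magnitude
    spectrum and full-wave rectify the result (assumed: absolute value)."""
    return [abs(y - n) for y, n in zip(noisy_mag, noise_mag)]


def cmvn(frames, eps=1e-12):
    """Normalise each cepstral coefficient to zero mean and unit variance
    over all frames of an utterance. `eps` (an assumption) avoids division
    by zero for constant coefficients."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n_frames)
        for k in range(n_coeffs)
    ]
    return [
        [(f[k] - means[k]) / (stds[k] + eps) for k in range(n_coeffs)]
        for f in frames
    ]
```

In a complete front-end the subtraction would run on every frame before the cepstral coefficients are computed, and the normalisation would then be applied to the resulting cepstral sequence; the conversion from spectrum to cepstrum is omitted here.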

Bibliographic reference. Xu, Haitian / Tan, Zheng-Hua / Dalsgaard, Paul / Lindberg, Børge (2005): "Combined spectral subtraction and cepstral normalisation for robust speech recognition", in ASIDE-2005, paper 30.