This paper presents an effective feature processing algorithm for robust speech recognition, based on combined spectral and cepstral processing. The spectral processing consists of Full-Wave Rectification Spectral Subtraction (FWR-SS) and Likelihood Controlled Instantaneous Noise Estimation (LCINE), while the cepstral processing is based on mean and variance normalisation.
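The only difference between full-wave and conventional half-wave rectification is what happens to bins where the noise estimate exceeds the noisy magnitude. The minimal sketch below contrasts the two. It assumes magnitude-domain subtraction with an over-subtraction factor of 1, and it treats the noise estimate as given. LCINE is not reproduced here, so `noise_mag` stands in for whatever estimate it would supply. The function names are hypothetical.

```python
import numpy as np

def half_wave_ss(noisy_mag, noise_mag):
    # Conventional half-wave rectification: negative differences are clipped to zero.
    # Arrays have shape (frames, bins); noise_mag is an assumed, externally supplied
    # estimate (a stand-in for LCINE, which is not implemented here).
    return np.maximum(noisy_mag - noise_mag, 0.0)

def full_wave_ss(noisy_mag, noise_mag):
    # Full-wave rectification: negative differences are mirrored (absolute value)
    # instead of being clipped, so over-subtracted bins keep a non-zero magnitude.
    return np.abs(noisy_mag - noise_mag)

noisy = np.array([[4.0, 1.0, 2.0]])
noise = np.array([[1.0, 3.0, 2.0]])
print(half_wave_ss(noisy, noise))  # second bin clipped to 0
print(full_wave_ss(noisy, noise))  # second bin becomes |1 - 3| = 2
```

The contrast matters downstream. Cepstra are computed from the logarithm of the spectrum, so bins clipped to exactly zero by half-wave rectification would otherwise need an explicit floor before the log.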
The combination is motivated by the fact that the (usually) single-frame-based spectral subtraction introduces large statistical mismatches between clean and enhanced noisy speech in the cepstral domain, which degrade recognition performance. The introduced cepstral processing can, to some extent, mitigate these mismatches; in this sense, the two methods are not merely combined but shown to be complementary. Statistical analyses as well as recognition experiments are conducted on the Aurora 2 database, and performance comparable to that of the much more complex ETSI advanced front-end is achieved.
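Per-utterance mean and variance normalisation addresses mismatches in the first two moments of each cepstral coefficient. The sketch below illustrates this under a deliberately simplified assumption: the mismatch between clean and enhanced cepstra is modelled as a per-coefficient scale and offset, using synthetic random data rather than real Aurora 2 features. After normalisation, the two feature sets coincide up to the small `eps` guard. Real spectral-subtraction distortions are not purely affine, which is why normalisation only mitigates them "to some extent". The function name `cmvn` and all data here are illustrative.

```python
import numpy as np

def cmvn(cepstra, eps=1e-8):
    """Per-utterance cepstral mean and variance normalisation.

    cepstra: array of shape (frames, coeffs). Each coefficient track is shifted
    to zero mean and scaled to unit standard deviation over the utterance;
    eps guards against division by zero for constant tracks.
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0)
    return (cepstra - mean) / (std + eps)

# Synthetic data under an assumed affine mismatch model:
# "enhanced" = clean cepstra with a per-coefficient scale and offset.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))
scale = rng.uniform(0.5, 2.0, size=13)
offset = rng.normal(size=13)
enhanced = clean * scale + offset

# Gap between per-coefficient means before normalisation...
print(np.abs(clean.mean(axis=0) - enhanced.mean(axis=0)).max())
# ...and the maximum feature difference after normalisation.
print(np.abs(cmvn(clean) - cmvn(enhanced)).max())
```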
Cite as: Xu, H., Tan, Z.-H., Dalsgaard, P., Lindberg, B. (2005) Combined spectral subtraction and cepstral normalisation for robust speech recognition. Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005), paper 30
@inproceedings{xu05_aside, author={Haitian Xu and Zheng-Hua Tan and Paul Dalsgaard and Børge Lindberg}, title={{Combined spectral subtraction and cepstral normalisation for robust speech recognition}}, year=2005, booktitle={Proc. Applied Spoken Language Interaction in Distributed Environments (ASIDE 2005)}, pages={paper 30} }