8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Robust Speech Recognition Using Non-Linear Spectral Smoothing

Michael J. Carey

University of Bristol, U.K.

A simple but robust new method of front-end analysis, non-linear spectral smoothing (NLSS), is proposed. NLSS uses rank-order filtering to replace noisy low-level speech spectrum coefficients with values computed from adjacent spectral peaks. The resulting transformation bears significant similarities to masking in the auditory system. NLSS can be used as an intermediate processing stage between the FFT and the filter-bank analyzer. It also produces features that can be cosine transformed and used by a pattern matcher. NLSS gives significant improvements in the performance of speech recognition systems in the presence of stationary noise: in an isolated digit test on the Noisex database, it typically reduces the error rate by 50% or increases the tolerance to noise by 3 dB for the same error rate. Results on female speech were superior to those on male speech: female speech gave a recognition error rate of 1.1% at a 0 dB signal-to-noise ratio.
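
The abstract does not give the filter details, so the following is only an illustrative sketch of the general idea, not the author's algorithm. It assumes a log-magnitude spectrum in dB, a sliding window across frequency bins, and a rule that raises each bin to a floor set a fixed number of dB below the rank-order filter output. The names `rank_order_filter` and `nlss_sketch` and the parameters `half_width`, `rank`, and `drop_db` are hypothetical and chosen for illustration.

```python
def rank_order_filter(values, half_width, rank):
    """Sliding-window rank-order filter over a 1-D sequence.

    For each position, the window [i - half_width, i + half_width]
    (truncated at the edges) is sorted and the element at the requested
    rank is returned: 0.0 is the minimum, 1.0 the maximum.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = sorted(values[lo:hi])
        idx = int(round(rank * (len(window) - 1)))
        out.append(window[idx])
    return out


def nlss_sketch(log_spectrum_db, half_width=2, rank=1.0, drop_db=10.0):
    """Assumed smoothing rule (not taken from the paper).

    Each bin is raised to at least (neighbourhood rank value - drop_db),
    so low-level bins next to a strong peak are replaced by a value
    derived from that peak, while the peaks themselves are unchanged.
    """
    neighbourhood = rank_order_filter(log_spectrum_db, half_width, rank)
    return [max(x, peak - drop_db)
            for x, peak in zip(log_spectrum_db, neighbourhood)]


# Toy frame in dB: two peaks (30 and 28) surrounded by low-level bins.
frame = [30.0, 5.0, 2.0, 28.0, 4.0, 1.0, 0.0, 3.0]
smoothed = nlss_sketch(frame)
print(smoothed)
```

In this toy frame the bins within two positions of a peak are lifted to 10 dB below that peak, loosely resembling a masking threshold, while bins far from any peak are left as they are. The window width, rank, and offset would need to be tuned, and the paper itself should be consulted for the actual procedure.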


Bibliographic reference.  Carey, Michael J. (2003): "Robust speech recognition using non-linear spectral smoothing", In EUROSPEECH-2003, 3045-3048.