Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

Encoded Speech Recognition Accuracy Improvement in Adverse Environments by Enhancing Formant Spectral Bands

Shubha Kadambe, Ron Burns

HRL Laboratories, LLC, Malibu, CA, USA

Spoken dialogue information retrieval applications are the future trend for mobile users in automobiles, on cellular phones, and on similar platforms. Because resources on these platforms are limited, it may be advantageous to extract speech features, compress them, and transmit them to a central hub where computation-intensive tasks such as speech recognition and speech understanding can be performed. Speech recognition accuracy generally degrades when the decoded speech signal (obtained by re-synthesizing the signal from the compressed features) is used. In addition, the background noise present in these mobile systems further reduces recognition accuracy. To improve recognition accuracy, it is therefore essential to extract robust features that jointly optimize compression and recognition. In this paper, we describe a technique that improves the recognition accuracy of noisy encoded speech signals by performing spectral correction and spectral formant band enhancement before synthesizing the speech signal from the compressed features. We conducted experiments on 1831 telephone speech utterances from 1831 speakers. To these utterances we added, at various signal-to-noise ratios (SNRs), (a) in-vehicle noise recorded in a Volvo car moving on an asphalt road at 134 km/h, (b) factory noise recorded in a factory, and (c) speech (babble) noise recorded in a cafeteria. Our experimental results indicate recognition accuracy improvements of up to 10% at 0 dB SNR.
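
The abstract does not say how the noise recordings were scaled before being added to the utterances. As an illustration only, the sketch below shows one common way to mix a noise signal into a speech signal at a target SNR: scale the noise so that the ratio of average speech power to average noise power matches the requested value in dB. The function names (`mean_power`, `mix_at_snr`) and the synthetic signals in the usage note are hypothetical and are not taken from the paper.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence of floats."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the mixture has the requested SNR (in dB).

    Assumption: SNR is defined over the whole utterance as
    10 * log10(speech_power / scaled_noise_power).
    Returns the noisy signal and the gain applied to the noise.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the utterance length

    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")

    # Solve snr_db = 10*log10(p_speech / (gain**2 * p_noise)) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    noisy = [s + gain * n for s, n in zip(speech, noise)]
    return noisy, gain
```

For example, with a synthetic 200 Hz tone standing in for speech and a deterministic pseudo-noise sequence standing in for a noise recording, `mix_at_snr(speech, noise, 0.0)` scales the noise so that its average power equals that of the tone, which corresponds to the 0 dB condition mentioned in the abstract.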



Bibliographic reference. Kadambe, Shubha / Burns, Ron (2000): "Encoded speech recognition accuracy improvement in adverse environments by enhancing formant spectral bands", in ICSLP-2000, 365-368.