Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Decoding Speech in the Presence of Other Sound Sources

Jon Barker (1), Martin Cooke (1), Daniel P. W. Ellis (2)

(1) Department of Computer Science, University of Sheffield, UK
(2) International Computer Science Institute, Berkeley, CA, USA

Conventional speech recognition is notoriously vulnerable to additive noise, and even the best compensation methods are defeated if the noise is nonstationary. To address this problem, we propose a new integration of bottom-up techniques that identify ‘coherent fragments’ of spectro-temporal energy (based on local features) with the top-down hypothesis search of conventional speech recognition, extended to search also over the possible assignments of each fragment as speech or interference. Initial tests demonstrate the feasibility of this approach, achieving a relative reduction in word error rate of more than 25% at 5 dB SNR over stationary-noise missing-data recognition.
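The core idea of the decoder, searching jointly over word hypotheses and over speech/interference labellings of spectro-temporal fragments, can be illustrated with a minimal sketch. Everything below is hypothetical: a toy four-cell spectrogram, hand-picked fragments, and simple squared-error models in place of the HMM state likelihoods; the real method marginalises background cells under missing-data assumptions rather than scoring them against an explicit background model.

```python
from itertools import product

# Toy spectrogram: four time-frequency cells with observed energies
# (hypothetical values; real fragments are coherent spectro-temporal regions
# found bottom-up from local features).
observation = [3.0, 9.0, 2.5, 8.5]

# Fragments: index sets of cells judged to belong to a single sound source.
fragments = [[0, 2], [1, 3]]

# Toy models: expected energy per cell under "speech" and "background"
# (stand-ins for the acoustic-model and noise-floor likelihoods).
speech_model = [3.0, 3.0, 3.0, 3.0]
background_model = [9.0, 9.0, 9.0, 9.0]

def score(assignment):
    """Score one labelling of the fragments: cells in fragments labelled
    'speech' are matched against the speech model, the rest against the
    background model (negative squared error as a toy log-likelihood)."""
    s = 0.0
    for frag, label in zip(fragments, assignment):
        model = speech_model if label == "speech" else background_model
        for i in frag:
            s -= (observation[i] - model[i]) ** 2
    return s

# Exhaustive top-down search over every speech/background labelling;
# the full decoder interleaves this with the Viterbi word search.
best = max(product(["speech", "background"], repeat=len(fragments)),
           key=score)
print(best)  # fragment [0, 2] fits the speech model; [1, 3] does not
```

With two fragments the search space has only 2^2 = 4 labellings; the paper's contribution is making this joint search tractable inside a conventional recogniser rather than enumerating labellings exhaustively.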


Bibliographic reference.  Barker, Jon / Cooke, Martin / Ellis, Daniel P. W. (2000): "Decoding speech in the presence of other sound sources", in ICSLP-2000, vol. 4, 270-273.