Sixth International Conference on Spoken Language Processing
Conventional speech recognition is notoriously vulnerable to additive noise, and even the best compensation methods are defeated if the noise is nonstationary. To address this problem, we propose a new integration of two components: bottom-up techniques that use local features to identify ‘coherent fragments’ of spectro-temporal energy, and the top-down hypothesis search of conventional speech recognition, extended to search also across the possible assignments of each fragment as speech or interference. Initial tests demonstrate the feasibility of this approach and achieve a relative reduction in word error rate of more than 25% at 5 dB SNR compared with missing-data recognition that assumes stationary noise.
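The idea of searching across fragment assignments can be illustrated with a minimal sketch. It is not the decoder described in the paper: it replaces the joint search over word hypotheses and fragment labels with brute-force enumeration of every speech/interference labelling, and it scores each resulting mask with a toy function. The names `masks_for_fragments`, `best_assignment`, and `toy_score`, and the representation of a fragment as a set of (time, frequency) cells, are assumptions made for illustration only.

```python
from itertools import product


def masks_for_fragments(fragments):
    """Yield (labels, mask) for every speech/interference labelling.

    fragments: list of sets of (time, freq) cells (hypothetical representation).
    labels[i] is True when fragment i is treated as speech; the mask is the
    union of all cells belonging to speech-labelled fragments.
    """
    for labels in product((False, True), repeat=len(fragments)):
        mask = set()
        for is_speech, cells in zip(labels, fragments):
            if is_speech:
                mask |= cells
        yield labels, frozenset(mask)


def best_assignment(fragments, score):
    """Return the (labels, mask) pair that maximizes score(mask).

    score stands in for a recognizer's likelihood of the best word sequence
    given the mask; here it is any callable taking a frozenset of cells.
    """
    return max(masks_for_fragments(fragments), key=lambda pair: score(pair[1]))


# Toy data: three disjoint fragments, of which the first and third are speech.
fragments = [{(0, 0), (0, 1)}, {(1, 2)}, {(2, 0), (2, 1)}]
target = {(0, 0), (0, 1), (2, 0), (2, 1)}


def toy_score(mask):
    # Penalize every cell that differs from the (normally unknown) target mask.
    return -len(mask ^ target)


labels, mask = best_assignment(fragments, toy_score)
print(labels)  # (True, False, True)
```

Exhaustive enumeration grows as 2^N in the number of fragments, which is why the sketch is only practical for toy inputs; a practical system would need to fold the labelling search into the decoder's hypothesis search instead.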
Bibliographic reference. Barker, Jon / Cooke, Martin / Ellis, Daniel P. W. (2000): "Decoding speech in the presence of other sound sources", In ICSLP-2000, vol.4, 270-273.