ISCA Archive ICSLP 2000

Decoding speech in the presence of other sound sources

Jon Barker, Martin Cooke, Daniel P. W. Ellis

Conventional speech recognition is notoriously vulnerable to additive noise, and even the best compensation methods are defeated if the noise is nonstationary. To address this problem, we propose a new integration of bottom-up techniques, which identify ‘coherent fragments’ of spectro-temporal energy from local features, with the top-down hypothesis search of conventional speech recognition, extended to search also across possible assignments of each fragment as speech or interference. Initial tests demonstrate the feasibility of this approach and achieve a relative reduction in word error rate of more than 25% at 5 dB SNR compared with stationary-noise missing-data recognition.
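The idea of searching across assignments of each fragment as speech or interference can be illustrated with a small sketch. This is not the authors' decoder: it enumerates every labelling by brute force (exponential in the number of fragments) and scores each one against a single fixed template. The names `enumerate_masks`, `toy_score`, and `best_assignment`, the integer-labelled `fragment_map` (0 for background), and the per-cell match bonus of 1.0 are all assumptions made for illustration.

```python
from itertools import product

import numpy as np


def enumerate_masks(fragment_map, n_fragments):
    """Yield (labels, mask) for every speech/interference labelling.

    fragment_map holds integer fragment ids (1..n_fragments, 0 = background).
    labels[i] is True when fragment i+1 is treated as speech.
    """
    for labels in product([False, True], repeat=n_fragments):
        mask = np.zeros(fragment_map.shape, dtype=bool)
        for frag_id, is_speech in enumerate(labels, start=1):
            if is_speech:
                mask |= fragment_map == frag_id
        yield labels, mask


def toy_score(spectrogram, mask, template):
    """Hypothetical stand-in for a missing-data likelihood.

    Cells labelled speech earn a bonus of 1.0 minus the squared error to the
    template. All other cells are only penalised when the template predicts
    more energy than was observed (speech cannot exceed the mixture).
    """
    err = (spectrogram - template) ** 2
    reliable = np.sum((1.0 - err)[mask])
    excess = np.maximum(template - spectrogram, 0.0) ** 2
    masked_out = -np.sum(excess[~mask])
    return reliable + masked_out


def best_assignment(spectrogram, fragment_map, n_fragments, template):
    """Return the highest-scoring (labels, score) pair."""
    scored = (
        (labels, toy_score(spectrogram, mask, template))
        for labels, mask in enumerate_masks(fragment_map, n_fragments)
    )
    return max(scored, key=lambda pair: pair[1])


# Toy data: fragment 1 matches the template, fragment 2 is louder interference.
template = np.array([[5.0, 5.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0, 1.0]])
fragment_map = np.array([[1, 1, 0, 0],
                         [0, 0, 2, 2]])
spectrogram = template.copy()
spectrogram[1, 2:] = 6.0

labels, score = best_assignment(spectrogram, fragment_map, 2, template)
print(labels, score)
```

In this toy case the labelling that treats fragment 1 as speech and fragment 2 as interference scores highest. A practical system would fold the assignment search into the recogniser's hypothesis search rather than enumerate all labellings.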


Cite as: Barker, J., Cooke, M., Ellis, D.P.W. (2000) Decoding speech in the presence of other sound sources. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 270-273

@inproceedings{barker00b_icslp,
  author={Jon Barker and Martin Cooke and Daniel P. W. Ellis},
  title={{Decoding speech in the presence of other sound sources}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={270--273}
}