ISCA Archive ICSLP 1994

Handling missing data in speech recognition

Martin Cooke, Phil Green, Malcolm Crawford

In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete since some spectro-temporal regions will be dominated by the other sources. Here, we address the problem of recognising such 'occluded' speech. Two investigations are reported: the first applies unsupervised learning and subsequent recognition to spectral vectors with missing components. The second adapts the Viterbi algorithm for HMM-based ASR to the occluded speech case. Both techniques are encouragingly robust: for instance, more than half of the observation vector can be obscured without appreciable deterioration in recognition performance. Additionally, our demonstration that it is possible to learn to recognise speech from partial information suggests a model for the formation of auditory-phonetic representations by infants in natural (i.e. cluttered) acoustic environments.
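The abstract does not spell out how the Viterbi algorithm is adapted, so the following is only a minimal sketch of one plausible reading. It assumes diagonal-covariance Gaussian state outputs and a binary mask marking which spectral components are reliable, and it scores each frame on the reliable components alone, which for a diagonal Gaussian amounts to integrating the missing ones out. The names `masked_log_likelihood` and `viterbi_missing`, and all parameters, are hypothetical and do not come from the paper.

```python
import math


def masked_log_likelihood(x, mask, mean, var):
    """Log-likelihood of the reliable components of x under a diagonal Gaussian.

    Components whose mask entry is False are treated as missing and skipped.
    (Assumption: with a diagonal covariance, skipping a component is the same
    as marginalising it out.)
    """
    total = 0.0
    for xi, present, mu, v in zip(x, mask, mean, var):
        if present:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def viterbi_missing(obs, masks, means, variances, log_trans, log_init):
    """Viterbi decoding in which every frame is scored on reliable components only.

    obs, masks       : one vector and one boolean mask per frame
    means, variances : one vector per HMM state
    log_trans[p][s]  : log probability of moving from state p to state s
    log_init[s]      : log probability of starting in state s
    Returns the most likely state sequence as a list of state indices.
    """
    n_states = len(means)

    def emit(t, s):
        return masked_log_likelihood(obs[t], masks[t], means[s], variances[s])

    # Initialise with the first frame.
    delta = [log_init[s] + emit(0, s) for s in range(n_states)]
    backpointers = []

    # Recursion: keep the best predecessor for every state at every frame.
    for t in range(1, len(obs)):
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: delta[p] + log_trans[p][s])
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + emit(t, s))
            pointers.append(best_prev)
        delta = new_delta
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy usage: two states, three spectral channels, one occluded channel per frame.
means = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
variances = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
log_half = math.log(0.5)
log_trans = [[log_half, log_half], [log_half, log_half]]
log_init = [log_half, log_half]
obs = [[0.1, 99.0, -0.2], [5.1, 4.9, -50.0]]   # 99.0 and -50.0 are intrusions
masks = [[True, False, True], [True, True, False]]
path = viterbi_missing(obs, masks, means, variances, log_trans, log_init)
```

In this toy case the occluded channels hold values far from either state's mean, yet they do not affect the decoded path because the mask excludes them from the score. How the mask is obtained (in the paper's setting, from auditory scene analysis) is outside the scope of the sketch.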

Cite as: Cooke, M., Green, P., Crawford, M. (1994) Handling missing data in speech recognition. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 1555-1558
