Third International Conference on Spoken Language Processing (ICSLP 94)
In this paper, we propose a new paradigm for robust ASR based on auditory scene analysis. In previous work, we have shown how models of auditory processing and grouping principles can be used to separate the evidence for a speech signal from arbitrary intrusions. However, this evidence will generally be incomplete, since some spectro-temporal regions will be dominated by the other sources. Here, we address the problem of recognising such 'occluded' speech. Two investigations are reported. The first applies unsupervised learning and subsequent recognition to spectral vectors with missing components; the second adapts the Viterbi algorithm for HMM-based ASR to the occluded speech case. Both techniques are encouragingly robust: for instance, more than half of the observation vector can be obscured without appreciable deterioration in recognition performance. Additionally, our demonstration that it is possible to learn to recognise speech from partial information suggests a model for the formation of auditory-phonetic representations by infants in natural (i.e. cluttered) acoustic environments.
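The abstract does not spell out how the Viterbi algorithm is adapted, so the following is only an illustrative sketch of one common way to score occluded frames, not the authors' implementation. It assumes that each HMM state has a diagonal-covariance Gaussian observation model and that a binary reliability mask is already available for every frame (for example, from an earlier grouping stage). Under those assumptions, marginalising out the missing components reduces to dropping their terms from the log-likelihood sum. The names `masked_log_likelihood` and `viterbi_missing`, and all of the toy data, are hypothetical.

```python
import math


def masked_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-density over the reliable components only.

    Assumption: with a diagonal covariance, marginalising the missing
    components simply removes their terms from the sum.
    """
    total = 0.0
    for xi, reliable, m, v in zip(x, mask, mean, var):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
    return total


def viterbi_missing(obs, masks, log_init, log_trans, means, variances):
    """Viterbi decoding in which each frame is scored on its reliable part."""
    n_states = len(log_init)
    delta = [
        log_init[s] + masked_log_likelihood(obs[0], masks[0], means[s], variances[s])
        for s in range(n_states)
    ]
    backptrs = []
    for t in range(1, len(obs)):
        new_delta, ptrs = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            emit = masked_log_likelihood(obs[t], masks[t], means[s], variances[s])
            new_delta.append(delta[best_prev] + log_trans[best_prev][s] + emit)
            ptrs.append(best_prev)
        delta = new_delta
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


# Toy data (hypothetical): two states centred on 0 and 5, four spectral channels.
means = [[0.0] * 4, [5.0] * 4]
variances = [[1.0] * 4, [1.0] * 4]
log_init = [math.log(0.5)] * 2
log_trans = [[math.log(0.5)] * 2 for _ in range(2)]
# Half of every frame is occluded (value 100.0) and flagged unreliable (0).
obs = [
    [0.1, 100.0, -0.2, 100.0],
    [100.0, 0.0, 100.0, 0.3],
    [5.1, 100.0, 100.0, 4.8],
    [100.0, 100.0, 5.2, 4.9],
]
masks = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
path = viterbi_missing(obs, masks, log_init, log_trans, means, variances)
print(path)  # [0, 0, 1, 1]
```

In this toy case the occluded channels would swamp an unmasked likelihood, yet the state sequence is still recovered from the remaining half of each frame. A full-covariance model, or one that also uses the occluded values as upper bounds on the speech energy, would need a different treatment from the one assumed here.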
Bibliographic reference. Cooke, Martin / Green, Phil / Crawford, Malcolm (1994): "Handling missing data in speech recognition", In ICSLP-1994, 1555-1558.