Sixth European Conference on Speech Communication and Technology
The performance of most ASR systems degrades rapidly when the test data is mismatched with the data used in training. Under many realistic noise conditions, a significant proportion of the spectral representation of a speech signal, which is highly redundant, remains uncorrupted. In the "missing feature" approach to this problem, mismatching data is simply ignored, but the need to base recognition on unorthogonalised spectral features results in reduced performance in clean speech. In multiband ASR, the results from independent recognition on a number of within-band orthogonalised sub-bands are combined. This approach more accurately reflects the uncertainty in mismatch detection, but the loss of joint information due to independent sub-band processing can also result in reduced performance with clean speech. This article presents the "full combination" approach to noise robust ASR, in which multiple data streams are associated not with individual sub-bands but with sub-band combinations. In this way, no assumption of sub-band independence is required. Initial tests show some improved robustness to noise with no significant loss of performance with clean speech.
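To make the idea of streams over sub-band combinations concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each combination of sub-bands has its own recogniser producing class posteriors, and that the per-combination posteriors are merged by a weighted sum whose weights express how likely each combination is to be the uncorrupted one. The function names, the dictionary layout, and the weighted-sum rule are illustrative assumptions; the abstract itself does not specify how the streams are combined.

```python
from itertools import combinations


def subband_combinations(num_bands):
    """Return every subset of sub-band indices, including the empty set.

    For d sub-bands this yields 2**d combinations, ordered by size.
    """
    bands = range(num_bands)
    return [combo
            for size in range(num_bands + 1)
            for combo in combinations(bands, size)]


def combine_posteriors(posteriors, weights):
    """Weighted sum of per-combination class posteriors (assumed rule).

    posteriors: dict mapping a combination tuple to a list of class
                probabilities from that combination's recogniser.
    weights:    dict mapping a combination tuple to its reliability
                weight; the weights are assumed to sum to 1.
    """
    num_classes = len(next(iter(posteriors.values())))
    combined = [0.0] * num_classes
    for combo, probs in posteriors.items():
        for k, p in enumerate(probs):
            combined[k] += weights[combo] * p
    return combined
```

With 4 sub-bands, `subband_combinations(4)` returns 16 combinations, so the number of streams grows exponentially with the number of sub-bands. This is one reason such a scheme is only practical for a small number of bands.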
Bibliographic reference. Morris, Andrew / Hagen, Astrid / Bourlard, Hervé (1999): "The full combination sub-bands approach to noise robust HMM/ANN based ASR", In EUROSPEECH'99, 599-602.