5th International Conference on Spoken Language Processing
We have previously developed an adaptive speech enhancement scheme. The scheme models speech and noise using perceptual frequency, or "warped", autoregressive HMMs (AR-HMMs) and estimates the clean speech and noise parameters within this framework. In this paper, we investigate the use of our system as a front-end to an MFCC recognition system trained on clean speech. We make two main modifications to our scheme. First, we use MMSE spectral rather than time-domain estimators for enhancement. Second, for computational reasons, we form the estimators using non-warped AR-HMMs. To avoid mismatch when converting between warped and non-warped models, we use parallel models. Results are presented for a small and a medium vocabulary task. On the small vocabulary task, we approach the performance of a matched system when language model information is included. On the medium vocabulary task, we are unable to incorporate a language model due to modelling deficiencies in AR-HMMs. However, we still demonstrate substantial improvements over baseline results.
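As an illustration of what a spectral estimator built from non-warped AR models can look like, the sketch below computes the power spectra implied by one speech AR model and one noise AR model and combines them into a per-frequency Wiener gain, which is the MMSE spectral estimate under Gaussian assumptions for a single fixed pair of states. This is not taken from the paper: the function names, the AR coefficients, and the single-state simplification are all assumptions made for illustration. An HMM-based system would typically weight such gains by state posteriors, and the paper's exact estimator may differ.

```python
import cmath
import math


def ar_power_spectrum(ar_coeffs, gain, n_bins):
    """Power spectrum gain / |A(e^{jw})|^2 of an AR model.

    A(z) = 1 + sum_k ar_coeffs[k-1] * z^-k, evaluated at n_bins
    frequencies evenly spaced over [0, pi]. Requires n_bins >= 2.
    (Hypothetical helper, not from the paper.)
    """
    spectrum = []
    for i in range(n_bins):
        omega = math.pi * i / (n_bins - 1)
        a = 1 + sum(c * cmath.exp(-1j * omega * (k + 1))
                    for k, c in enumerate(ar_coeffs))
        spectrum.append(gain / abs(a) ** 2)
    return spectrum


def wiener_gain(speech_psd, noise_psd):
    """Per-frequency Wiener gain P_s / (P_s + P_n)."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


# Illustrative (assumed) models: a first-order low-pass "speech" model
# and white noise, which is an AR model with no coefficients.
speech_psd = ar_power_spectrum([-0.9], gain=1.0, n_bins=5)
noise_psd = ar_power_spectrum([], gain=1.0, n_bins=5)
gains = wiener_gain(speech_psd, noise_psd)
```

Multiplying the noisy spectrum of a frame by `gains` attenuates the frequencies where the noise model dominates. In this example the gain is close to 1 at low frequencies, where the assumed speech model carries most of its energy, and falls towards the high end of the band.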
Bibliographic reference. Logan, Beth / Robinson, Tony (1998): "A practical perceptual frequency autoregressive HMM enhancement system", In ICSLP-1998, paper 1083.