EUROSPEECH 2001 Scandinavia
In this paper, a feature transformation method is presented for distant speech recognition in reverberant and noisy environments. Within the Maximum Likelihood framework, the optimum bias parameters are obtained on-line, using a small number of successive speech frames. The stochastic matching is achieved by assuming a mixture-of-Gaussians pdf for the clean speech features. The proposed method was evaluated on Mel-frequency cepstral coefficient (MFCC) features, as well as on MFCC features after cepstral mean subtraction and after RASTA filtering. The experiments, carried out in several adverse conditions including room acoustics and additive factory noise for stationary and moving speakers, show a significant improvement in recognition accuracy for isolated-word speech recognition. For a stationary speaker, the proposed method improves the recognition score by more than 50% when the SNR is higher than 10 dB. For a moving speaker, the improvement is 8.6% using MFCC features, while the score reaches 91.05% using RASTA features.
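The abstract does not give the estimation formulas, so the following is only a minimal sketch of one common form of maximum likelihood bias estimation, not the authors' implementation. It assumes an additive bias in the feature domain (observed frame = clean frame + bias), a diagonal-covariance Gaussian mixture for the clean features, and an EM update over a short window of frames. The function name `estimate_bias` and all parameter names are hypothetical.

```python
import math


def estimate_bias(frames, weights, means, variances, n_iter=10):
    """EM estimate of an additive bias b under the assumed model y_t = x_t + b.

    frames:    list of T observed feature vectors, each of length D
    weights:   K mixture weights of the clean-speech Gaussian mixture
    means:     K mean vectors, each of length D
    variances: K diagonal variance vectors, each of length D
    """
    dim = len(frames[0])
    bias = [0.0] * dim
    for _ in range(n_iter):
        num = [0.0] * dim
        den = [0.0] * dim
        for y in frames:
            # E-step: log-likelihood of the compensated frame (y - b)
            # under each mixture component.
            log_p = []
            for w, mu, var in zip(weights, means, variances):
                lp = math.log(w)
                for d in range(dim):
                    diff = y[d] - bias[d] - mu[d]
                    lp -= 0.5 * (math.log(2 * math.pi * var[d]) + diff * diff / var[d])
                log_p.append(lp)
            # Normalise in the log domain to avoid underflow.
            top = max(log_p)
            post = [math.exp(lp - top) for lp in log_p]
            total = sum(post)
            # M-step accumulators: posterior- and precision-weighted residuals.
            for g, mu, var in zip(post, means, variances):
                g /= total
                for d in range(dim):
                    num[d] += g * (y[d] - mu[d]) / var[d]
                    den[d] += g / var[d]
        bias = [n / d for n, d in zip(num, den)]
    return bias
```

Under these assumptions, a frame `y` would be compensated before recognition as `[y[d] - bias[d] for d in range(len(y))]`, with the bias re-estimated on each new window of frames to follow a moving speaker.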
Bibliographic reference. Nokas, George / Dermatas, Evangelos / Kokkinakis, George (2001): "Maximum likelihood adaptation for distant speech recognition of stationary and moving speakers in reverberant environments", In EUROSPEECH-2001, 2631-2634.