5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Frequency Domain Binaural Model as the Front End of Speech Recognition System

Tsuyoshi Usagawa, Kenji Sakai, Masanao Ebata

Kumamoto University, Japan

This paper introduces a frequency domain binaural model. The proposed model is a revision of an earlier time domain model that calculates the interaural crosscorrelation. The new model requires less computation and achieves comparable performance. It is based on FFT analysis and uses the cross-power spectrum to obtain the interaural phase difference. The performance of the models is examined not only in an isolated word speech recognition task but also in a speech enhancement task. In the experiments, the improvement in robustness in the speech recognition task corresponds to about 15-20 dB when the surrounding noise is white noise, a few decibels better than that obtained with the time domain model. However, when the surrounding noise is speech, the improvement decreases to 10-15 dB. In addition, the proposed model can reproduce the signal component arriving from a specified direction as a binaural signal.
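The core idea of obtaining the interaural phase difference from the cross-power spectrum can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify frame length, windowing, frequency grouping, or how a direction is selected, so the function names (`fft`, `interaural_phase_difference`) and the toy signal (a 1 kHz tone sampled at 16 kHz, with the right channel delayed by 2 samples) are assumptions chosen only for illustration.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def interaural_phase_difference(left, right):
    """Per-bin phase (radians) of the cross-power spectrum L(k) * conj(R(k)).

    A positive value at a bin means the right channel lags the left one.
    Only bins below the Nyquist frequency are returned.
    """
    spec_l = fft(left)
    spec_r = fft(right)
    half = len(left) // 2
    return [cmath.phase(spec_l[k] * spec_r[k].conjugate()) for k in range(half)]


# Hypothetical toy input: the tone falls exactly on one FFT bin.
fs = 16000      # sampling rate in Hz (assumed)
n = 256         # frame length in samples (assumed)
f0 = 1000       # tone frequency in Hz
delay = 2       # right-channel delay in samples

left = [math.sin(2 * math.pi * f0 * t / fs) for t in range(n)]
right = [math.sin(2 * math.pi * f0 * (t - delay) / fs) for t in range(n)]

ipd = interaural_phase_difference(left, right)
k0 = f0 * n // fs                       # FFT bin holding the tone
estimated_delay = ipd[k0] * fs / (2 * math.pi * f0)
print(f"IPD at {f0} Hz: {ipd[k0]:.4f} rad, delay: {estimated_delay:.2f} samples")
```

In this toy case the phase at the tone's bin maps back to the 2-sample delay. A practical front end would presumably use an optimized FFT routine, windowed overlapping frames, and some handling of phase wrapping at higher frequencies; those details are outside what the abstract states.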

Full Paper
Sound Example 6a   Sound Example 6b   Sound Example 6c  

Bibliographic reference.  Usagawa, Tsuyoshi / Sakai, Kenji / Ebata, Masanao (1998): "Frequency domain binaural model as the front end of speech recognition system", In ICSLP-1998, paper 0190.