5th International Conference on Spoken Language Processing
This paper introduces a frequency-domain binaural model. The proposed model is a revision of an earlier time-domain model that calculates the interaural cross-correlation. The new model requires less computation while offering comparable performance. It is based on FFT analysis and uses the cross-power spectrum to obtain the interaural phase difference. The performance of the models is examined not only in an isolated-word speech recognition task but also in a speech enhancement task. In the experiments, the improvement in robustness in the speech recognition task corresponds to about 15-20 dB when the surrounding noise is white noise, a few decibels better than that obtained with the time-domain model. However, when the surrounding noise is speech, the improvement decreases to 10-15 dB. In addition, the proposed model can reproduce the signal component arriving from a specified direction as a binaural signal.
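As an illustration of the core idea (FFT analysis followed by a cross-power spectrum whose phase gives the interaural phase difference), the following is a minimal Python sketch. It is not the authors' implementation: the function names, the Hann window, the frame length, the sign convention (positive phase when the right channel lags) and the simple phase-tolerance mask for selecting a direction are all assumptions made for this example.

```python
import numpy as np

def interaural_phase_difference(left, right, frame_len=512):
    """Per-bin interaural phase difference (radians) for one frame.

    Assumed convention: the phase is that of L(f) * conj(R(f)), so it is
    positive when the right-ear signal lags the left-ear signal.
    """
    window = np.hanning(frame_len)            # assumed analysis window
    spec_l = np.fft.rfft(left[:frame_len] * window)
    spec_r = np.fft.rfft(right[:frame_len] * window)
    cross_power = spec_l * np.conj(spec_r)    # cross-power spectrum
    return np.angle(cross_power)

def direction_mask(ipd, freqs, target_delay=0.0, tol=0.5):
    """Hypothetical selection rule: keep bins whose phase difference is
    within `tol` radians of the phase expected for `target_delay` seconds."""
    expected = 2.0 * np.pi * freqs * target_delay
    # Wrap the deviation into (-pi, pi] before comparing.
    deviation = np.angle(np.exp(1j * (ipd - expected)))
    return np.abs(deviation) < tol

if __name__ == "__main__":
    fs, n, k, d = 16000, 512, 16, 4           # sample rate, frame, bin, delay in samples
    t = np.arange(n)
    left = np.cos(2 * np.pi * k * t / n)
    right = np.cos(2 * np.pi * k * (t - d) / n)   # right ear lags by d samples
    ipd = interaural_phase_difference(left, right, n)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    mask = direction_mask(ipd, freqs, target_delay=d / fs)
    print(ipd[k], mask[k])
```

Working per frequency bin in this way avoids evaluating a cross-correlation over many lags in the time domain, which is consistent with the lower computational load stated in the abstract. A signal from the selected direction could then be resynthesized by applying the mask to both ear spectra and inverting the FFT, although that step is not shown here.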
Bibliographic reference. Usagawa, Tsuyoshi / Sakai, Kenji / Ebata, Masanao (1998): "Frequency domain binaural model as the front end of speech recognition system", In ICSLP-1998, paper 0190.