This paper introduces a frequency domain binaural model. The proposed model is a revision of an earlier time domain model that computes the interaural cross-correlation. The new model requires less computation while achieving comparable performance. It is based on FFT analysis and uses the cross-power spectrum to obtain the interaural phase difference. The performance of both models is examined not only in an isolated-word speech recognition task but also in a speech enhancement task. Experimental results show that the improvement in robustness for the speech recognition task corresponds to about 15-20 dB when the surrounding noise is white noise, a few decibels better than that obtained with the time domain model. However, when the surrounding noise is speech, the improvement decreases to 10-15 dB. In addition, the proposed model can reproduce the signal component arriving from a specified direction as a binaural signal.
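As an illustration of how a cross-power spectrum can yield an interaural phase difference, the following minimal Python sketch may help. It is not the authors' implementation: the function names `dft` and `interaural_phase_difference`, the single-frame processing without windowing, and the sign convention (left spectrum times the conjugate of the right spectrum) are assumptions made for this example, and a naive DFT stands in for the FFT only to keep the code free of dependencies.

```python
import cmath
import math


def dft(frame):
    """Naive one-sided DFT of a real frame (stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def interaural_phase_difference(left, right):
    """Phase of the cross-power spectrum L(k) * conj(R(k)) for each bin k.

    With this (assumed) sign convention, a positive phase means the
    right-ear signal lags the left-ear signal at that frequency.
    """
    spec_l = dft(left)
    spec_r = dft(right)
    return [cmath.phase(l * r.conjugate()) for l, r in zip(spec_l, spec_r)]


# Hypothetical test signal: a sinusoid at bin 4 of a 64-sample frame,
# with the right channel delayed by 2 samples relative to the left.
N, K, DELAY = 64, 4, 2
left = [math.cos(2 * math.pi * K * t / N) for t in range(N)]
right = [math.cos(2 * math.pi * K * (t - DELAY) / N) for t in range(N)]

ipd = interaural_phase_difference(left, right)

# Convert the phase difference at bin K back to a delay in samples.
estimated_delay = ipd[K] * N / (2 * math.pi * K)
print(ipd[K], estimated_delay)
```

Only the bin that carries signal energy gives a meaningful phase here; in bins with negligible energy the phase of the cross-power spectrum is dominated by numerical noise, so a practical front end would presumably weight or gate bins by magnitude.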
Cite as: Usagawa, T., Sakai, K., Ebata, M. (1998) Frequency domain binaural model as the front end of speech recognition system. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0190, doi: 10.21437/ICSLP.1998-343
@inproceedings{usagawa98_icslp, author={Tsuyoshi Usagawa and Kenji Sakai and Masanao Ebata}, title={{Frequency domain binaural model as the front end of speech recognition system}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0190}, doi={10.21437/ICSLP.1998-343} }