Subband Selection for Binaural Speech Source Localization

Girija Ramesan Karthik, Prasanta Kumar Ghosh


We consider the task of speech source localization using binaural cues, namely interaural time and level differences (ITD and ILD). A typical approach is to process binaural speech with a gammatone filterbank and compute frame-level ITD and ILD in each subband. During training, the ITD, the ILD and their combination (ITLD) in each subband are statistically modelled for every direction using Gaussian mixture models. Given binaural test speech, the source is localized using the maximum-likelihood criterion, assuming that the binaural cues in different subbands are independent. In this work, we investigate the robustness of each subband for localization and compare its performance against that of the full-band scheme with 32 gammatone filters. We propose a subband selection procedure in which subbands are rank-ordered by their localization performance on the training data. Experiments on Subject 003 from the CIPIC database reveal that, at high SNRs, the ITD and ITLD of just one subband, centered at 296 Hz, are sufficient to yield localization accuracy identical to that of the full-band scheme with 1 s of test speech. At low SNRs, in the case of ITD, the selected subbands are found to perform better than the full-band scheme.
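A minimal Python sketch of the maximum-likelihood decision and the subband ranking described above follows. It is not the authors' implementation. It assumes that per-subband cues (e.g. ITD) have already been extracted frame by frame from the gammatone subbands, and that each direction/subband pair has a 1-D GMM given as (weight, mean, variance) triples. It also assumes that frames, as well as subbands, are treated as independent, so log-likelihoods simply add. Ranking subbands by their single-subband accuracy on the training data is one plausible reading of the selection procedure. All function names and toy numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical 1-D GMM representation: list of (weight, mean, variance) components.
GMM = List[Tuple[float, float, float]]


def gmm_log_likelihood(x: float, gmm: GMM) -> float:
    """Log-likelihood of a scalar cue value x under a 1-D GMM (log-sum-exp for stability)."""
    logs = [
        math.log(w) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for w, mu, var in gmm
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))


def localize(
    frames: Sequence[Sequence[float]],   # frames[t][k] = cue in subband k at frame t
    models: Dict[int, Dict[int, GMM]],   # models[direction][k] = GMM for subband k
    subbands: Sequence[int],             # all subbands, or a selected subset
) -> int:
    """Return the direction with the highest total log-likelihood.

    Summing over subbands encodes the independence assumption; summing over
    frames is an additional assumption of this sketch.
    """
    def total_log_likelihood(direction: int) -> float:
        per_band = models[direction]
        return sum(
            gmm_log_likelihood(frame[k], per_band[k])
            for frame in frames
            for k in subbands
        )

    # max() returns the first direction encountered when scores tie.
    return max(models, key=total_log_likelihood)


def rank_subbands(
    train_set: Sequence[Tuple[Sequence[Sequence[float]], int]],  # (frames, true direction)
    models: Dict[int, Dict[int, GMM]],
    num_subbands: int,
) -> List[int]:
    """Rank subbands (best first) by single-subband localization accuracy on training data."""
    scored = []
    for k in range(num_subbands):
        correct = sum(localize(frames, models, [k]) == d for frames, d in train_set)
        scored.append((correct / len(train_set), k))
    # Sort by accuracy, highest first; ties keep the lower subband index first.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [k for _, k in scored]


# Toy example (hypothetical numbers): two directions, two subbands.
# Subband 0 separates the directions; subband 1 has the same model for both.
shared: GMM = [(1.0, 0.0, 0.1)]
toy_models = {
    0: {0: [(1.0, -0.5, 0.1)], 1: shared},
    1: {0: [(1.0, 0.5, 0.1)], 1: shared},
}
toy_train = [([[-0.5, 0.1]], 0), ([[0.5, -0.1]], 1)]
```

Here `rank_subbands(toy_train, toy_models, 2)` puts the informative subband 0 ahead of subband 1. Passing the top-n entries of that ranking as `subbands` restricts localization to the selected subbands, while passing all 32 indices corresponds to the full-band scheme. The ITLD case would need a joint ITD/ILD model per subband, which this sketch omits.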


DOI: 10.21437/Interspeech.2017-954

Cite as: Karthik, G.R., Ghosh, P.K. (2017) Subband Selection for Binaural Speech Source Localization. Proc. Interspeech 2017, 1929-1933, DOI: 10.21437/Interspeech.2017-954.


@inproceedings{Karthik2017,
  author={Girija Ramesan Karthik and Prasanta Kumar Ghosh},
  title={Subband Selection for Binaural Speech Source Localization},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1929--1933},
  doi={10.21437/Interspeech.2017-954},
  url={http://dx.doi.org/10.21437/Interspeech.2017-954}
}