ISCA Archive Interspeech 2013

Learning binaural spectrogram features for azimuthal speaker localization

Wiktor Młynarski

Spatial localization of speech and other natural sounds with rich spectro-temporal structure is a computationally challenging task. It requires extracting features that are informative about the speaker's position yet invariant to sound level and to the spectral modulation present in the signal. This paper demonstrates that this can be achieved with Independent Component Analysis (ICA) applied to binaural speech spectrograms. A small subset of the learned Independent Components (ICs) captures the signal structure imposed by the outer ears. A Gaussian classifier trained on those features performs accurate localization on the azimuthal plane. The remaining majority of ICs have position-invariant distributions and can be used to reconstruct the spectrogram of the original sound source.
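The classification step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a generic Gaussian classifier with one full-covariance Gaussian per azimuth, and it uses synthetic 4-dimensional features as a stand-in for the position-dependent IC activations. The function names, the three azimuth labels, the feature dimension, and the `ridge` regularizer are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_gaussian_classifier(features, labels, ridge=1e-6):
    """Fit one full-covariance Gaussian per class (here: per azimuth label)."""
    model = {}
    for label in np.unique(labels):
        x = features[labels == label]
        mean = x.mean(axis=0)
        # Small ridge term (an assumption) keeps the covariance invertible.
        cov = np.cov(x, rowvar=False) + ridge * np.eye(x.shape[1])
        model[label] = (mean, cov)
    return model

def log_likelihood(x, mean, cov):
    """Log density of each row of x under a multivariate Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    solved = np.linalg.solve(cov, diff.T).T
    mahalanobis = np.sum(diff * solved, axis=1)
    dim = x.shape[1]
    return -0.5 * (mahalanobis + logdet + dim * np.log(2 * np.pi))

def predict(model, features):
    """Assign each feature vector to the class with the highest likelihood."""
    classes = np.array(sorted(model))
    scores = np.stack(
        [log_likelihood(features, *model[c]) for c in classes], axis=1
    )
    return classes[np.argmax(scores, axis=1)]

# Synthetic stand-in for IC activations: the class mean shifts with azimuth.
rng = np.random.default_rng(0)
azimuths = np.array([-45, 0, 45])  # hypothetical azimuth labels in degrees

def sample(n_per_class):
    feats = np.concatenate(
        [rng.normal(az / 15.0, 1.0, size=(n_per_class, 4)) for az in azimuths]
    )
    labs = np.repeat(azimuths, n_per_class)
    return feats, labs

train_features, train_labels = sample(200)
test_features, test_labels = sample(100)

model = fit_gaussian_classifier(train_features, train_labels)
predicted = predict(model, test_features)
accuracy = float(np.mean(predicted == test_labels))
print(f"accuracy on synthetic data: {accuracy:.3f}")
```

With real data, the synthetic features would be replaced by the activations of the position-dependent ICs computed from binaural spectrograms; how those components are selected is described in the paper itself and is not reproduced here.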


doi: 10.21437/Interspeech.2013-656

Cite as: Młynarski, W. (2013) Learning binaural spectrogram features for azimuthal speaker localization. Proc. Interspeech 2013, 2939-2942, doi: 10.21437/Interspeech.2013-656

@inproceedings{mynarski13_interspeech,
  author={Wiktor Młynarski},
  title={{Learning binaural spectrogram features for azimuthal speaker localization}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={2939--2942},
  doi={10.21437/Interspeech.2013-656}
}