Human sound localization relies on binaural cues (interaural level and time differences) as well as monaural cues. The latter are captured by head-related transfer functions (HRTFs), which describe the direction-dependent spectral shaping of the incident sound wave and can be exploited to determine its direction of arrival. This paper presents an accurate talker localization strategy for the horizontal plane that uses the signal of only one microphone. The method is based on a set of HRTF measurements taken from a dummy head and on a statistical model of speech. High-dimensional spectral features (STFT coefficients) are extracted, and the direction of the sound source is estimated with Gaussian mixture models (GMMs) in a maximum likelihood (ML) framework. An evaluation in a synthetic test environment yields excellent localization results, making the approach a promising candidate for further research.
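The ML decision over candidate directions can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: features are log-magnitude spectra, so that HRTF filtering becomes an additive offset; the speech GMM has diagonal covariances; and the direction-dependent model is obtained by shifting the speech GMM means by the log-magnitude HRTF of each candidate azimuth. The HRTFs and the speech model below are random placeholders, and all names (`gmm_loglik`, `localize`, `log_hrtfs`) are hypothetical.

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (T, D) under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]                          # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff**2 / variances, axis=2)   # (T, K)
    log_joint = log_comp + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    m = log_joint.max(axis=1, keepdims=True)
    per_frame = m[:, 0] + np.log(np.sum(np.exp(log_joint - m), axis=1))
    return float(np.sum(per_frame))

def localize(X, speech_gmm, log_hrtfs):
    """Return the azimuth whose HRTF-shifted speech GMM maximizes the likelihood."""
    weights, means, variances = speech_gmm
    scores = {
        az: gmm_loglik(X, weights, means + h, variances)  # assumed additive log-HRTF
        for az, h in log_hrtfs.items()
    }
    return max(scores, key=scores.get), scores

# Synthetic placeholder data (assumption: no measured HRTFs or real speech here).
rng = np.random.default_rng(0)
D, K, T = 16, 4, 200                        # frequency bins, mixture components, frames
weights = np.full(K, 1.0 / K)
means = rng.normal(0.0, 2.0, (K, D))
variances = np.full((K, D), 0.25)
azimuths = [0, 90, 180, 270]
log_hrtfs = {az: rng.normal(0.0, 1.0, D) for az in azimuths}

# Draw "speech" frames from the GMM, then apply the HRTF of the true direction.
true_az = 90
comp = rng.choice(K, size=T, p=weights)
clean = means[comp] + rng.normal(0.0, 0.5, (T, D))
observed = clean + log_hrtfs[true_az]

estimate, scores = localize(observed, (weights, means, variances), log_hrtfs)
print("estimated azimuth:", estimate)
```

In this toy setting the candidate directions differ strongly in their spectral offsets, so the likelihoods separate clearly. With measured HRTFs and real speech the separation would depend on how distinct neighbouring directions are and on how well the speech model fits the talker.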
Bibliographic reference. Fuchs, Anna Katharina / Feldbauer, Christian / Stark, Michael (2011): "Monaural sound localization", In INTERSPEECH-2011, 2521-2524.