Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source

Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li


We study the estimation of time difference of arrival (TDOA) under noisy and reverberant conditions. Conventional TDOA estimation methods such as MUltiple SIgnal Classification (MUSIC) are not robust to noise and reverberation, because both distort the spatial covariance matrix (SCM). To address this issue, this paper proposes a robust SCM estimation method called weighted SCM (WSCM). In WSCM estimation, each time-frequency (TF) bin of the input signal is weighted by a TF mask that, in the ideal case, is 0 for non-speech TF bins and 1 for speech TF bins. In practice, the mask takes values between 0 and 1 and is predicted by a long short-term memory (LSTM) network trained on a large amount of simulated noisy and reverberant data. The mask weights significantly reduce the contribution of low-SNR TF bins to the SCM estimate and hence improve the robustness of MUSIC. Experimental results on both simulated and real data show that the weighted SCM significantly improves the robustness of MUSIC.
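The abstract describes the weighted SCM only in words. The following is a minimal NumPy sketch of the idea, not the authors' implementation, built on several assumptions that the abstract does not state: a two-microphone pair with steering vector a(f, τ) = [1, exp(-j2πfτ)], a per-frequency SCM normalized by the mask sum, R(f) = Σ_t m(t,f) x(t,f) x(t,f)^H / Σ_t m(t,f), a narrowband MUSIC pseudo-spectrum summed across frequencies, and an ideal binary mask standing in for the LSTM-predicted one. The function names `weighted_scm` and `music_tdoa` are hypothetical, and the demo data are synthetic.

```python
import numpy as np


def weighted_scm(X, mask, eps=1e-12):
    """Mask-weighted spatial covariance matrix, one M x M matrix per frequency.

    X:    complex STFT of the array signal, shape (M, F, T)
    mask: speech-presence weights in [0, 1], shape (F, T)
    Returns R with shape (F, M, M).
    """
    # Sum over frames of m(t,f) * x(t,f) x(t,f)^H, for every frequency f
    num = np.einsum("ft,mft,nft->fmn", mask, X, X.conj())
    # Normalize by the total mask weight per frequency (assumed normalization)
    den = mask.sum(axis=1)[:, None, None] + eps
    return num / den


def music_tdoa(R, freqs, taus, n_src=1):
    """Broadband MUSIC pseudo-spectrum over candidate delays for a 2-mic pair.

    Assumes mic 2 receives the source delayed by tau relative to mic 1, so the
    steering vector is a(f, tau) = [1, exp(-j 2 pi f tau)].
    """
    spectrum = np.zeros(len(taus))
    for R_f, f in zip(R, freqs):
        # eigh returns eigenvalues in ascending order for Hermitian input
        _, vecs = np.linalg.eigh(R_f)
        En = vecs[:, : R_f.shape[0] - n_src]  # noise subspace
        A = np.vstack([np.ones(len(taus)), np.exp(-2j * np.pi * f * taus)])
        # ||En^H a(tau)||^2 is small when a(tau) lies in the signal subspace
        proj = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
        spectrum += 1.0 / (proj + 1e-12)
    return spectrum


# --- Toy demo on synthetic data (not data from the paper) ---
rng = np.random.default_rng(0)
F, T = 39, 200
freqs = np.linspace(200.0, 4000.0, F)  # Hz
taus = np.linspace(-1e-4, 1e-4, 401)   # candidate delays, s
tau_target, tau_interf = 5e-5, -5e-5


def cgauss(shape):
    # Circular complex Gaussian samples with unit variance
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def two_mic(src, tau):
    # Mic 2 receives the source delayed by tau: multiply by exp(-j 2 pi f tau)
    return np.stack([src, src * np.exp(-2j * np.pi * freqs * tau)[:, None]])


half = T // 2
X = np.zeros((2, F, T), dtype=complex)
X[:, :, :half] = two_mic(cgauss((F, half)), tau_target)            # speech frames
X[:, :, half:] = two_mic(3.0 * cgauss((F, T - half)), tau_interf)  # louder interferer
X += 0.05 * cgauss((2, F, T))                                      # spatially white noise

mask = np.zeros((F, T))
mask[:, :half] = 1.0  # ideal mask: 1 on speech bins, 0 elsewhere

est_unweighted = taus[np.argmax(music_tdoa(weighted_scm(X, np.ones((F, T))), freqs, taus))]
est_weighted = taus[np.argmax(music_tdoa(weighted_scm(X, mask), freqs, taus))]
print(f"unweighted: {est_unweighted:.2e} s, weighted: {est_weighted:.2e} s")
```

With an all-ones mask, `weighted_scm` reduces to the conventional SCM, so the two estimates differ only in the weighting. In this toy setup the louder interferer should pull the unweighted estimate toward its own delay, while the masked SCM keeps only the speech frames. A soft mask with values in (0, 1) can be passed in unchanged.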


DOI: 10.21437/Interspeech.2017-199

Cite as: Xu, C., Xiao, X., Sun, S., Rao, W., Chng, E.S., Li, H. (2017) Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source. Proc. Interspeech 2017, 1894-1898, DOI: 10.21437/Interspeech.2017-199.


@inproceedings{Xu2017,
  author={Chenglin Xu and Xiong Xiao and Sining Sun and Wei Rao and Eng Siong Chng and Haizhou Li},
  title={Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1894--1898},
  doi={10.21437/Interspeech.2017-199},
  url={http://dx.doi.org/10.21437/Interspeech.2017-199}
}