Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks

Zhong-Qiu Wang, Xueliang Zhang, DeLiang Wang


Deep learning based time-frequency (T-F) masking has dramatically advanced monaural speech separation and enhancement. This study investigates its potential for robust time difference of arrival (TDOA) estimation in noisy and reverberant environments. Three novel algorithms are proposed to improve the robustness of conventional cross-correlation-, beamforming- and subspace-based algorithms for speaker localization. The key idea is to leverage the power of deep neural networks (DNNs) to accurately identify T-F units that are relatively clean for TDOA estimation. All of the proposed algorithms exhibit strong robustness for TDOA estimation in environments with low input SNR, high reverberation and low direct-to-reverberant energy ratio.
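
The abstract does not spell out the three algorithms, so the following is only a minimal sketch of the general idea for the cross-correlation case, not the authors' method. It weights a standard GCC-PHAT score by a T-F mask so that units judged clean contribute most to the TDOA estimate. The function name `masked_gcc_phat_tdoa`, its parameters, and the assumption that a mask in [0, 1] is already available (for example, from a DNN) are all hypothetical.

```python
import numpy as np


def masked_gcc_phat_tdoa(X1, X2, mask, fs, n_fft, max_delay_s):
    """Estimate the delay of microphone 2 relative to microphone 1, in seconds.

    X1, X2 : complex STFTs, shape (n_frames, n_fft // 2 + 1)
    mask   : real weights in [0, 1], same shape (assumed given, e.g. by a DNN)
    """
    n_bins = X1.shape[1]
    freqs = np.arange(n_bins) * fs / n_fft            # bin centre frequencies in Hz

    # Candidate delays on a one-sample grid within +/- max_delay_s.
    max_lag = int(round(max_delay_s * fs))
    taus = np.arange(-max_lag, max_lag + 1) / fs

    # PHAT-normalised cross-spectrum: keep the phase, discard the magnitude.
    cross = X1 * np.conj(X2)
    phat = cross / np.maximum(np.abs(cross), 1e-12)

    # Mask-weighted sum over frames: unreliable T-F units contribute little.
    weighted = (mask * phat).sum(axis=0)              # shape (n_bins,)

    # Score each candidate delay by phase alignment across frequency.
    steer = np.exp(-2j * np.pi * np.outer(taus, freqs))
    score = np.real(steer @ weighted)                 # shape (n_taus,)

    return taus[np.argmax(score)]
```

The STFTs could come from any front end, for instance `scipy.signal.stft`, whose output would need transposing to the (frames, bins) layout assumed here. With an all-ones mask the function reduces to plain GCC-PHAT on a one-sample delay grid.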


DOI: 10.21437/Interspeech.2018-1652

Cite as: Wang, Z., Zhang, X., Wang, D. (2018) Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks. Proc. Interspeech 2018, 322-326, DOI: 10.21437/Interspeech.2018-1652.


@inproceedings{Wang2018,
  author={Zhong-Qiu Wang and Xueliang Zhang and DeLiang Wang},
  title={Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={322--326},
  doi={10.21437/Interspeech.2018-1652},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1652}
}