Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking

Wangyou Zhang, Ying Zhou, Yanmin Qian


In noisy and reverberant scenarios, the performance of current methods for direction of arrival (DOA) estimation usually degrades significantly. Inspired by the success of time-frequency masking in speech enhancement and speech separation, this paper proposes new methods that make better use of time-frequency masking in convolutional neural networks to improve the robustness of localization. First, a mask estimation network is developed to assist DOA estimation by either appending the estimated masks to the original input feature or multiplying the input feature by them. We then propose a multi-task learning architecture that optimizes the mask and DOA estimation networks jointly, and two modes are designed and compared. Experiments show that all the proposed methods are more robust and generalize better in noisy and reverberant conditions than the conventional methods, and that the multi-task methods perform best among all approaches.
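
The two ways of combining an estimated mask with the input feature, and the idea of a joint objective, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the feature type, tensor shapes, or loss weighting, so the `(channels, frames, freq_bins)` layout, the function names `fuse_mask` and `joint_loss`, and the weight `alpha` are all assumptions made for illustration.

```python
import numpy as np


def fuse_mask(features, mask, mode):
    """Combine a time-frequency mask with multi-channel input features.

    features: array of shape (channels, frames, freq_bins)  (assumed layout)
    mask:     array of shape (frames, freq_bins), values in [0, 1]
    """
    if mode == "multiply":
        # Broadcast the mask over channels so unreliable T-F bins are down-weighted.
        return features * mask[np.newaxis, :, :]
    if mode == "append":
        # Stack the mask as one extra input channel next to the original features.
        return np.concatenate([features, mask[np.newaxis, :, :]], axis=0)
    raise ValueError(f"unknown mode: {mode}")


def joint_loss(doa_loss, mask_loss, alpha=0.5):
    """Hypothetical multi-task objective: DOA loss plus a weighted mask loss."""
    return doa_loss + alpha * mask_loss


# Random stand-ins for real features and masks (illustrative values only).
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 10, 257))
mask = rng.uniform(0.0, 1.0, size=(10, 257))

multiplied = fuse_mask(features, mask, "multiply")  # same shape as features
appended = fuse_mask(features, mask, "append")      # one extra channel
```

In this sketch, multiplication keeps the input shape unchanged, while appending adds one channel, so a downstream network would need its first layer sized accordingly.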


DOI: 10.21437/Interspeech.2019-3158

Cite as: Zhang, W., Zhou, Y., Qian, Y. (2019) Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking. Proc. Interspeech 2019, 2703-2707, DOI: 10.21437/Interspeech.2019-3158.


@inproceedings{Zhang2019,
  author={Wangyou Zhang and Ying Zhou and Yanmin Qian},
  title={{Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2703--2707},
  doi={10.21437/Interspeech.2019-3158},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3158}
}