Multipitch tracking is a challenging problem in speech and signal processing. In this paper, we use deep neural networks (DNNs) to model the probabilistic pitch states of two simultaneous speakers. To closely capture speaker-dependent information and improve the accuracy of speaker assignment, we train a DNN for each enrolled speaker (speaker-dependent DNN). We also explore the feasibility of training a DNN for each speaker pair in the system (speaker-pair-dependent DNN). A factorial hidden Markov model (FHMM) then integrates the pitch probabilities and generates the most likely pitch contours with a junction tree algorithm. We evaluate our system on the GRID corpus. Experiments show that our approach substantially outperforms state-of-the-art multipitch trackers on both same-gender and different-gender two-talker mixtures.
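The decoding step can be illustrated with a minimal sketch. It is not the authors' junction tree implementation: it swaps in a brute-force Viterbi search over the joint state space of two Markov chains, which for a model this small should recover the same kind of most likely state sequence. The function name `fhmm_viterbi`, the number of pitch states, the uniform transition matrices, and the emission score built as a sum of two per-speaker log posteriors are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

def fhmm_viterbi(log_emit, log_trans_a, log_trans_b, log_init_a, log_init_b):
    """Most likely joint state path for two independent Markov chains.

    log_emit:    (T, S, S) log score of (state_a, state_b) at each frame
                 (hypothetical stand-in for DNN-derived pitch probabilities).
    log_trans_*: (S, S) log transition matrices, row = previous state.
    log_init_*:  (S,) log initial state probabilities.
    Returns an integer array of shape (T, 2): one state per speaker per frame.
    """
    T, S, _ = log_emit.shape
    delta = log_init_a[:, None] + log_init_b[None, :] + log_emit[0]
    back = np.zeros((T, S, S, 2), dtype=int)
    for t in range(1, T):
        # scores[i, j, k, l]: best score ending in (i, j), then moving to (k, l)
        scores = (delta[:, :, None, None]
                  + log_trans_a[:, None, :, None]
                  + log_trans_b[None, :, None, :])
        flat = scores.reshape(S * S, S, S)
        best_prev = flat.argmax(axis=0)  # best previous joint state, shape (S, S)
        back[t, :, :, 0], back[t, :, :, 1] = np.unravel_index(best_prev, (S, S))
        delta = flat.max(axis=0) + log_emit[t]
    # Backtrace from the best final joint state.
    a, b = np.unravel_index(delta.argmax(), (S, S))
    path = [(a, b)]
    for t in range(T - 1, 0, -1):
        a, b = back[t, a, b]
        path.append((a, b))
    path.reverse()
    return np.array(path)

# Toy example: 3 pitch states, 4 frames, uniform transitions (all assumed).
S, T = 3, 4
uniform_trans = np.full((S, S), -np.log(S))
uniform_init = np.full(S, -np.log(S))
post_a = np.full((T, S), 0.1)
post_b = np.full((T, S), 0.1)
post_a[np.arange(T), [0, 1, 1, 2]] = 0.8  # speaker A's dominant state per frame
post_b[np.arange(T), [2, 2, 0, 0]] = 0.8  # speaker B's dominant state per frame
log_emit = np.log(post_a)[:, :, None] + np.log(post_b)[:, None, :]
path = fhmm_viterbi(log_emit, uniform_trans, uniform_trans,
                    uniform_init, uniform_init)
print(path)
```

The joint search costs on the order of S^4 operations per frame, so it is only practical for small state counts; exploiting the factorized transitions, as a junction tree algorithm does, reduces that cost.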
Cite as: Liu, Y., Wang, D. (2015) Speaker-dependent multipitch tracking using deep neural networks. Proc. Interspeech 2015, 3279-3283, doi: 10.21437/Interspeech.2015-660
@inproceedings{liu15k_interspeech,
  author={Yuzhou Liu and DeLiang Wang},
  title={{Speaker-dependent multipitch tracking using deep neural networks}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3279--3283},
  doi={10.21437/Interspeech.2015-660}
}