Multipitch tracking is a challenging problem in speech and signal processing. In this paper, we use deep neural networks (DNNs) to model the probabilistic pitch states of two simultaneous speakers. To closely capture speaker-dependent information and improve the accuracy of speaker assignment, we train a DNN for each enrolled speaker (speaker-dependent DNN). We also explore the feasibility of training a DNN for each speaker pair in the system (speaker-pair-dependent DNN). A factorial hidden Markov model (FHMM) then integrates the pitch probabilities and generates the most likely pitch contours using a junction tree algorithm. We evaluate our system on the GRID corpus. Experiments show that our approach substantially outperforms state-of-the-art multipitch trackers on both same-gender and different-gender two-talker mixtures.
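As a rough illustration of the decoding step only, the sketch below finds the jointly most likely pair of pitch-state paths for two speakers. It is not the authors' implementation: it assumes each speaker's DNN yields a per-frame probability over a small set of discrete pitch states, that the two per-frame scores can simply be multiplied, and that each speaker has its own transition matrix. It also swaps the junction tree algorithm for plain Viterbi over the product state space, which is feasible only when the number of pitch states is small. All names (joint_viterbi, post1, trans1, prior1, and so on) are hypothetical.

```python
import numpy as np


def joint_viterbi(post1, post2, trans1, trans2, prior1, prior2):
    """Jointly decode two Markov chains over their product state space.

    post1, post2   : (T, S1) and (T, S2) per-frame pitch-state probabilities
                     (assumed here to come from one DNN per speaker).
    trans1, trans2 : (S1, S1) and (S2, S2) row-stochastic transition matrices.
    prior1, prior2 : (S1,) and (S2,) initial state distributions.
    Returns two integer arrays of length T, one state path per speaker.
    """
    eps = 1e-12  # avoids log(0)
    T, S2 = post1.shape[0], post2.shape[1]

    # Joint state index is i * S2 + j for speaker-1 state i, speaker-2 state j.
    # The chains are assumed to evolve independently, so the joint transition
    # matrix and joint prior are Kronecker products of the per-speaker ones.
    log_trans = np.log(np.kron(trans1, trans2) + eps)
    log_prior = np.log(np.kron(prior1, prior2) + eps)

    # Assumption: the joint per-frame score is the product of the two scores.
    log_obs = np.log(post1 + eps)[:, :, None] + np.log(post2 + eps)[:, None, :]
    log_obs = log_obs.reshape(T, -1)

    # Standard Viterbi recursion in the log domain.
    delta = log_prior + log_obs[0]
    back = np.zeros((T, log_obs.shape[1]), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # rows: previous, cols: current
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_obs[t]

    # Backtrace from the best final joint state.
    states = np.zeros(T, dtype=int)
    states[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]

    # Split the joint index back into one path per speaker.
    return states // S2, states % S2
```

The product state space grows as S1 × S2 and the transition matrix as its square, which is why exact inference schemes such as the junction tree algorithm are used for realistic numbers of pitch states.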
Bibliographic reference. Liu, Yuzhou / Wang, DeLiang (2015): "Speaker-dependent multipitch tracking using deep neural networks", In INTERSPEECH-2015, 3279-3283.