ISCA Archive Interspeech 2015

Speaker-dependent multipitch tracking using deep neural networks

Yuzhou Liu, DeLiang Wang

Multipitch tracking is a challenging problem in speech and signal processing. In this paper, we use deep neural networks (DNNs) to model the probabilistic pitch states of two simultaneous speakers. To closely capture speaker-dependent information and improve the accuracy of speaker assignment, we train a DNN for each enrolled speaker (speaker-dependent DNN). We also explore the feasibility of training a DNN for each speaker pair in the system (speaker-pair-dependent DNN). A factorial hidden Markov model (FHMM) then integrates the pitch probabilities and generates the most likely pitch contours with a junction tree algorithm. We evaluate our system on the GRID corpus. Experiments show that our approach substantially outperforms state-of-the-art multipitch trackers on both same-gender and different-gender two-talker mixtures.
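
To illustrate the decoding stage described in the abstract, the sketch below is a minimal illustration only, not the paper's implementation: it assumes each speaker-dependent DNN has already produced per-frame posteriors over quantized pitch states (the arrays post_a and post_b are hypothetical placeholders), and it decodes the joint pitch track with a plain Viterbi search over the product state space instead of the junction tree algorithm used in the paper. The function name joint_viterbi and the transition/prior inputs are likewise assumptions for illustration.

import numpy as np

def joint_viterbi(post_a, post_b, trans, prior):
    """Decode the most likely joint pitch-state sequence for two speakers.

    post_a, post_b : (T, S) per-frame pitch-state posteriors, assumed to come
                     from the two speaker-dependent DNNs (hypothetical arrays).
    trans          : (S, S) per-speaker pitch-state transition matrix.
    prior          : (S,)  initial pitch-state distribution.

    The two pitch chains are treated as a priori independent, so the joint
    transition factorizes; decoding runs over the product state space, a
    simplification of the junction tree inference used in the paper.
    """
    T, S = post_a.shape
    eps = 1e-12
    log_trans = np.log(trans + eps)
    # Joint emission score: product of the two speakers' DNN posteriors.
    log_emit = np.log(post_a[:, :, None] * post_b[:, None, :] + eps)   # (T, S, S)

    delta = np.log(np.outer(prior, prior) + eps) + log_emit[0]         # (S, S)
    back = np.zeros((T, S, S, 2), dtype=int)
    for t in range(1, T):
        # scores[i, j, k, l] = delta[i, j] + log P(k|i) + log P(l|j)
        scores = (delta[:, :, None, None]
                  + log_trans[:, None, :, None]
                  + log_trans[None, :, None, :])
        flat = scores.reshape(S * S, S, S)
        best = flat.argmax(axis=0)                                     # (S, S)
        back[t, :, :, 0], back[t, :, :, 1] = np.unravel_index(best, (S, S))
        delta = flat.max(axis=0) + log_emit[t]

    # Backtrack from the best final joint state.
    path = np.zeros((T, 2), dtype=int)
    path[-1] = np.unravel_index(delta.argmax(), (S, S))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t, 0], path[t, 1]]
    return path   # path[t] = (pitch state of speaker 1, pitch state of speaker 2)

# Toy usage with random posteriors standing in for DNN outputs.
T, S = 50, 20
rng = np.random.default_rng(0)
post_a = rng.dirichlet(np.ones(S), size=T)
post_b = rng.dirichlet(np.ones(S), size=T)
trans = rng.dirichlet(np.ones(S), size=S)
prior = np.ones(S) / S
print(joint_viterbi(post_a, post_b, trans, prior)[:5])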


doi: 10.21437/Interspeech.2015-660

Cite as: Liu, Y., Wang, D. (2015) Speaker-dependent multipitch tracking using deep neural networks. Proc. Interspeech 2015, 3279-3283, doi: 10.21437/Interspeech.2015-660

@inproceedings{liu15k_interspeech,
  author={Yuzhou Liu and DeLiang Wang},
  title={{Speaker-dependent multipitch tracking using deep neural networks}},
  year={2015},
  booktitle={Proc. Interspeech 2015},
  pages={3279--3283},
  doi={10.21437/Interspeech.2015-660}
}