ISCA Archive Interspeech 2009

Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model

Michael Wohlmayr, Franz Pernkopf

In this paper, we present a simple and efficient feature modeling approach for tracking the pitch of two simultaneous speakers. We model the spectrogram features using Gaussian Mixture Models (GMMs) in combination with the Minimum Description Length (MDL) model selection criterion. This makes it possible to determine the number of Gaussian components automatically, depending on the amount of data available for a specific pitch pair. A factorial hidden Markov model (FHMM) is applied for tracking. We compare our approach to two methods based on correlogram features [1], which use either an HMM [1] or an FHMM [7] for tracking. Experimental results on the Mocha-TIMIT database [2] show that our proposed approach significantly outperforms the correlogram-based methods for speech utterances mixed at 0 dB. The superior performance also holds when white Gaussian noise is added to the mixed speech utterances during pitch tracking.
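The abstract states that tracking uses an FHMM with one Markov chain per speaker, and that the observation model is a GMM per pitch pair. The sketch below illustrates only the decoding step: joint Viterbi over the two chains, assuming the per-pair GMM log-likelihoods have already been evaluated into an array `loglik[t, i, j]`. The function name `fhmm_viterbi`, its argument layout, and the factorized maximization (first over chain 1's previous state, then over chain 2's, which lowers the per-frame cost from O(S^4) to O(S^3)) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def fhmm_viterbi(loglik, log_a1, log_a2, log_pi1, log_pi2):
    """Most likely joint state sequence of a two-chain factorial HMM (sketch).

    loglik[t, i, j] : log p(y_t | chain 1 in state i, chain 2 in state j),
                      assumed here to come from a per-pitch-pair GMM
    log_a1, log_a2  : log transition matrices indexed [previous, next]
    log_pi1, log_pi2: log initial state probabilities of each chain
    Returns one integer state sequence per chain.
    """
    T, S1, S2 = loglik.shape
    # delta[i, j]: best log score of any path ending in state pair (i, j).
    delta = log_pi1[:, None] + log_pi2[None, :] + loglik[0]
    bp1 = np.zeros((T, S1, S2), dtype=int)  # back-pointers into chain 1
    bp2 = np.zeros((T, S1, S2), dtype=int)  # back-pointers into chain 2
    for t in range(1, T):
        # Maximize over chain 1's previous state i' first; axes (i', i, j').
        s1 = delta[:, None, :] + log_a1[:, :, None]
        arg_i = s1.argmax(axis=0)   # best i' for each (i, j')
        partial = s1.max(axis=0)    # axes (i, j')
        # Then maximize over chain 2's previous state j'; axes (i, j', j).
        s2 = partial[:, :, None] + log_a2[None, :, :]
        arg_j = s2.argmax(axis=1)   # best j' for each (i, j)
        delta = s2.max(axis=1) + loglik[t]
        bp2[t] = arg_j
        # The best i' depends on which j' was chosen for (i, j).
        bp1[t] = np.take_along_axis(arg_i, arg_j, axis=1)
    # Backtrack from the best final state pair.
    i, j = np.unravel_index(delta.argmax(), delta.shape)
    path1 = np.empty(T, dtype=int)
    path2 = np.empty(T, dtype=int)
    path1[-1], path2[-1] = i, j
    for t in range(T - 1, 0, -1):
        i, j = bp1[t, i, j], bp2[t, i, j]
        path1[t - 1], path2[t - 1] = i, j
    return path1, path2
```

In a pitch tracker the states would be quantized pitch values (plus, possibly, an unvoiced state), and the transition matrices would typically favor small pitch changes between frames; those modeling choices are assumptions here and are not specified in the abstract.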


doi: 10.21437/Interspeech.2009-31

Cite as: Wohlmayr, M., Pernkopf, F. (2009) Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model. Proc. Interspeech 2009, 1079-1082, doi: 10.21437/Interspeech.2009-31

@inproceedings{wohlmayr09_interspeech,
  author={Michael Wohlmayr and Franz Pernkopf},
  title={{Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={1079--1082},
  doi={10.21437/Interspeech.2009-31}
}