In this paper, we present a simple and efficient feature modeling approach for tracking the pitch of two simultaneous speakers. We model the spectrogram features using Gaussian Mixture Models (GMMs) in combination with the Minimum Description Length (MDL) model selection criterion. This allows the number of Gaussian components to be determined automatically, depending on the data available for a specific pitch pair. A factorial hidden Markov model (FHMM) is applied for tracking. We compare our approach to two methods based on correlogram features. Those methods use either an HMM or an FHMM for tracking. Experimental results on the Mocha-TIMIT database show that our proposed approach significantly outperforms the correlogram-based methods for speech utterances mixed at 0 dB. The superior performance holds even when white Gaussian noise is added to the mixed speech utterances during pitch tracking.
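As an illustration of the model selection step, the following is a minimal sketch of choosing the number of GMM components with an MDL-style criterion. It is not the authors' implementation: the diagonal covariances, the EM initialization, the common two-part form of the criterion (negative log-likelihood plus half the number of free parameters times log N), and all function names are assumptions made for this example. In the paper's setting, one such selection would presumably be run on the spectrogram features available for each pitch pair.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """Per-sample, per-component log(w_k * N(x | mu_k, diag(var_k)))."""
    n, d = X.shape
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (
        d * np.log(2.0 * np.pi)
        + np.log(variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )
    return np.log(weights)[None, :] + log_pdf

def fit_diag_gmm(X, k, n_iter=100, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM with EM; return the total log-likelihood.

    Hypothetical helper: the initialization and the variance floor are
    assumptions of this sketch, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        lj = log_joint(X, weights, means, variances)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, var_floor)
    lj = log_joint(X, weights, means, variances)
    return np.logaddexp.reduce(lj, axis=1).sum()

def mdl_scores(X, max_k):
    """MDL score for k = 1..max_k components (lower is better)."""
    n, d = X.shape
    scores = {}
    for k in range(1, max_k + 1):
        log_lik = fit_diag_gmm(X, k)
        # Free parameters: k means, k diagonal variances, k - 1 weights.
        n_params = 2 * k * d + (k - 1)
        scores[k] = -log_lik + 0.5 * n_params * np.log(n)
    return scores

def select_num_components(X, max_k):
    """Return the component count with the smallest MDL score."""
    scores = mdl_scores(X, max_k)
    return min(scores, key=scores.get)
```

The penalty term grows with the number of components and with log N, so additional components are kept only when the data supply enough evidence for them, which matches the abstract's point that model size adapts to the amount of data per pitch pair.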
Bibliographic reference. Wohlmayr, Michael / Pernkopf, Franz (2009): "Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model", In INTERSPEECH-2009, 1079-1082.