10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Finite Mixture Spectrogram Modeling for Multipitch Tracking Using A Factorial Hidden Markov Model

Michael Wohlmayr, Franz Pernkopf

Graz University of Technology, Austria

In this paper, we present a simple and efficient feature modeling approach for tracking the pitch of two speakers speaking simultaneously. We model the spectrogram features using Gaussian Mixture Models (GMMs) in combination with the Minimum Description Length (MDL) model selection criterion. This makes it possible to determine the number of Gaussian components automatically, depending on the data available for a specific pitch pair. A factorial hidden Markov model (FHMM) is applied for tracking. We compare our approach to two methods based on correlogram features [1]. Those methods use either an HMM [1] or an FHMM [7] for tracking. Experimental results on the Mocha-TIMIT database [2] show that our proposed approach significantly outperforms the correlogram-based methods for speech utterances mixed at 0 dB. This advantage holds even when white Gaussian noise is added to the mixed speech utterances during pitch tracking.
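
The MDL-based choice of the number of Gaussian components can be illustrated with a minimal sketch. The abstract does not state the exact MDL formulation used, so the sketch assumes the common two-part form (negative log-likelihood plus half the number of free parameters times the log of the sample count) and diagonal covariances. The function names, the feature dimension, and the log-likelihood values are hypothetical and serve only as an illustration; in practice the log-likelihoods would come from GMMs fitted to the spectrogram features of one pitch pair.

```python
import math


def gmm_num_params(num_components, dim):
    """Free parameters of a diagonal-covariance GMM (assumed covariance type)."""
    # (K - 1) mixture weights + K * dim means + K * dim variances
    return (num_components - 1) + 2 * num_components * dim


def mdl_score(log_likelihood, num_params, num_samples):
    """Assumed two-part MDL: data cost plus parameter-coding cost (lower is better)."""
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)


def select_num_components(log_likelihoods, dim, num_samples):
    """Return (best K, all scores) for a dict mapping K -> total log-likelihood."""
    scores = {
        k: mdl_score(ll, gmm_num_params(k, dim), num_samples)
        for k, ll in log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical training log-likelihoods for K = 1..4 on 500 frames
# of 10-dimensional features (illustrative numbers, not from the paper).
fits = {1: -5200.0, 2: -4900.0, 3: -4880.0, 4: -4875.0}
best_k, scores = select_num_components(fits, dim=10, num_samples=500)
print("selected number of components:", best_k)
```

With these illustrative numbers, the likelihood gain beyond two components no longer offsets the parameter cost. The same penalty makes a pitch pair with few training frames favor a smaller model, which matches the data-dependent behavior described above.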


Bibliographic reference.  Wohlmayr, Michael / Pernkopf, Franz (2009): "Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model", In INTERSPEECH-2009, 1079-1082.