9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Multipitch Tracking Using a Factorial Hidden Markov Model

Michael Wohlmayr, Franz Pernkopf

Graz University of Technology, Austria

In this paper, we present an approach for tracking the pitch of two simultaneous speakers. Features are extracted with a well-known correlogram-based method, and the resulting observations are tracked with a factorial hidden Markov model (FHMM). In contrast to the recently developed multipitch determination algorithm [1], which is based on an HMM, our approach can accurately associate estimated pitch points with their corresponding source speakers. We evaluate our approach on speech utterances from the "Mocha-TIMIT" database [2] mixed at 0 dB, and compare the results against the multipitch determination algorithm [1] as a baseline. Experiments show that our FHMM tracker performs well in both pitch estimation and speaker assignment.
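To illustrate the general idea of decoding two pitch tracks with a factorial HMM, the following is a minimal sketch of exact joint Viterbi decoding over two Markov chains. It is not the method of the paper: the function name `fhmm_viterbi`, the table `log_obs` (a stand-in for the likelihood of correlogram features given both pitch states), and the toy numbers are all assumptions made for illustration. Because each chain keeps its own state sequence, every decoded pitch point stays attached to one speaker.

```python
import math
from itertools import product


def fhmm_viterbi(log_pi1, log_A1, log_pi2, log_A2, log_obs):
    """Joint Viterbi decoding for a two-chain factorial HMM (illustrative).

    log_pi1, log_pi2 : initial log-probabilities of each chain.
    log_A1, log_A2   : transition log-probabilities, indexed [prev][next].
    log_obs[t][i][j] : hypothetical log-likelihood of frame t given that
                       chain 1 is in state i and chain 2 is in state j.
    Returns one state sequence per chain.
    """
    n1, n2 = len(log_pi1), len(log_pi2)
    states = list(product(range(n1), range(n2)))

    # Initialisation: the two chains are independent a priori.
    delta = {(i, j): log_pi1[i] + log_pi2[j] + log_obs[0][i][j]
             for i, j in states}
    backptrs = []

    for t in range(1, len(log_obs)):
        new_delta, ptr = {}, {}
        for i, j in states:
            # The transition score factorises over the two chains.
            def score(prev):
                return delta[prev] + log_A1[prev[0]][i] + log_A2[prev[1]][j]

            best_prev = max(states, key=score)
            new_delta[(i, j)] = score(best_prev) + log_obs[t][i][j]
            ptr[(i, j)] = best_prev
        delta = new_delta
        backptrs.append(ptr)

    # Backtrack from the best final joint state.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s[0] for s in path], [s[1] for s in path]


# Toy usage: two pitch bins per speaker, "sticky" transitions, and
# made-up frame likelihoods that favour one joint state per frame.
log_pi = [math.log(0.5), math.log(0.5)]
log_A = [[math.log(0.9), math.log(0.1)],
         [math.log(0.1), math.log(0.9)]]
favoured = [(0, 1), (0, 1), (1, 1), (1, 1)]
log_obs = [[[math.log(0.97) if (i, j) == f else math.log(0.01)
             for j in range(2)] for i in range(2)] for f in favoured]

track1, track2 = fhmm_viterbi(log_pi, log_A, log_pi, log_A, log_obs)
```

Exact decoding over the joint state space costs on the order of (n1 * n2)^2 operations per frame, so this brute-force form is only practical for small numbers of pitch states.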


  1. Wu M., Wang D. and Brown G.J., "A Multipitch Tracking Algorithm for Noisy Speech", IEEE Transactions on Speech and Audio Processing, 11(3):229-241, 2003.
  2. Wrench A., "A Multichannel/Multispeaker Articulatory Database for Continuous Speech Recognition Research", Phonus, 5:3-17, 2000.


Bibliographic reference.  Wohlmayr, Michael / Pernkopf, Franz (2008): "Multipitch tracking using a factorial hidden Markov model", In INTERSPEECH-2008, 147-150.