Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Speaker Tracking and Detection with Multiple Speakers

Kemal Sönmez (1), Larry Heck (2), Mitchel Weintraub (2)

(1) SRI International, Menlo Park, CA, USA
(2) Nuance Communications, Menlo Park, CA, USA

We describe a speaker tracking and detection system for Switchboard conversations that uses a two-speaker and silence hidden Markov model (HMM) with a minimum state duration constraint and Gaussian mixture model (GMM) state distributions adapted from a single gender- and handset-independent imposter model distribution. Speaker tracking is used to segment speakers for detection, which is carried out by averaging the frame scores along the Viterbi path and normalizing them with a novel parameter interpolation extension of HNORM for use with files of arbitrary length. Duration statistics are also introduced to augment the acoustic scores via a nonlinear combination function. Results are reported on the NIST 1998 Multispeaker development evaluation dataset.
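The tracking and scoring steps named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes per-frame log-likelihoods are already available for each HMM state (for example speaker A, speaker B, silence), enforces the minimum state duration by expanding each state into a chain of sub-states, and uses the commonly cited form of HNORM (subtract a handset-dependent imposter mean, divide by the standard deviation). The function names, the `switch_logp` penalty, and the normalization statistics are hypothetical, and the paper's parameter interpolation extension and duration-based combination are not reproduced here.

```python
NEG_INF = float("-inf")


def viterbi_min_duration(frame_loglik, min_dur, switch_logp=-1.0):
    """Best state path in which a state can be left only after min_dur frames.

    frame_loglik: list of per-frame lists, one log-likelihood per state.
    The final segment may be shorter than min_dur (assumption of this sketch).
    """
    n_states = len(frame_loglik[0])
    # Expanded state index = state * min_dur + d, where d + 1 is the number
    # of frames spent in the state so far, capped at min_dur.
    n_exp = n_states * min_dur
    score = [NEG_INF] * n_exp
    for s in range(n_states):
        score[s * min_dur] = frame_loglik[0][s]
    backptrs = []
    for t in range(1, len(frame_loglik)):
        new = [NEG_INF] * n_exp
        bp = [-1] * n_exp
        for s in range(n_states):
            for d in range(min_dur):
                i = s * min_dur + d
                if score[i] == NEG_INF:
                    continue
                # Stay in the same state and advance the duration counter.
                j = s * min_dur + min(d + 1, min_dur - 1)
                cand = score[i] + frame_loglik[t][s]
                if cand > new[j]:
                    new[j], bp[j] = cand, i
                # Switching is allowed only once the minimum duration is met.
                if d == min_dur - 1:
                    for s2 in range(n_states):
                        if s2 == s:
                            continue
                        j = s2 * min_dur
                        cand = score[i] + switch_logp + frame_loglik[t][s2]
                        if cand > new[j]:
                            new[j], bp[j] = cand, i
        score = new
        backptrs.append(bp)
    # Backtrace from the best final expanded state.
    best = max(range(n_exp), key=lambda i: score[i])
    path = [best]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return [i // min_dur for i in path]


def mean_path_score(frame_scores, path, target_state):
    """Average the frame scores of the frames assigned to target_state."""
    picked = [x for x, s in zip(frame_scores, path) if s == target_state]
    return sum(picked) / len(picked) if picked else None


def hnorm(score, imposter_mean, imposter_std):
    """Standard HNORM-style normalization with handset-dependent statistics."""
    return (score - imposter_mean) / imposter_std
```

With a minimum duration of several frames, a single outlier frame favoring the other speaker no longer triggers a speaker change, which is the practical effect of the duration constraint on segmentation.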

Bibliographic reference.  Sönmez, Kemal / Heck, Larry / Weintraub, Mitchel (1999): "Speaker tracking and detection with multiple speakers", In EUROSPEECH'99, 2219-2222.