Speaker detection in broadcast speech databases

Aaron E. Rosenberg, Ivan Magrin-Chagnolleau, S. Parthasarathy, Qian Huang

Experiments have been carried out to assess the feasibility of detecting target speaker segments in multi-speaker broadcast databases. The experimental database consists of NBC Nightly News broadcasts. The target speaker is the news anchor, Tom Brokaw. Gaussian mixture models are constructed from labelled training data for the target speaker, as well as background models for other speakers, commercials, and music. Four labelled 30-min. broadcasts are used for testing. Mel-frequency cepstral features, augmented by delta cepstral features, are calculated over 20 msec. windows shifted every 10 msec. through a broadcast. Likelihood ratio scores are calculated for each test frame and averaged over blocks of frames with a specified duration. The block scores are input to a detection routine which returns estimates of target segment boundaries. The range of best results obtained over the test broadcasts is 82% to 100% detection of target segments, with segment frame accuracy ranging from 86% to 95%. Between 0 and 2 false alarm segments are detected in each 30-min. broadcast.
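
The block scoring and detection steps described above can be illustrated with a minimal sketch. It is not the authors' routine: the abstract does not specify how the background models are combined or how boundaries are estimated, so the sketch assumes a single background log-likelihood per frame, non-overlapping blocks, and a simple fixed threshold. The names `block_scores`, `detect_segments`, `block_len`, and `threshold` are hypothetical.

```python
from typing import List, Tuple


def block_scores(target_ll: List[float],
                 background_ll: List[float],
                 block_len: int) -> List[float]:
    """Average per-frame log-likelihood ratios over non-overlapping blocks.

    Assumption: each input holds one log-likelihood per frame, already
    computed from the target and background models.
    """
    llr = [t - b for t, b in zip(target_ll, background_ll)]
    scores = []
    for start in range(0, len(llr), block_len):
        block = llr[start:start + block_len]  # final block may be shorter
        scores.append(sum(block) / len(block))
    return scores


def detect_segments(scores: List[float],
                    block_len: int,
                    n_frames: int,
                    threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Merge consecutive above-threshold blocks into frame segments.

    Returns (start_frame, end_frame) pairs with end_frame exclusive.
    Assumption: a fixed threshold stands in for the detection routine.
    """
    segments = []
    run_start = None
    for i, score in enumerate(scores):
        if score > threshold and run_start is None:
            run_start = i  # a target run begins at this block
        elif score <= threshold and run_start is not None:
            segments.append((run_start * block_len, i * block_len))
            run_start = None
    if run_start is not None:
        # The run extends to the end of the broadcast.
        segments.append((run_start * block_len, n_frames))
    return segments


# Toy input: 10 frames, target-like at the start and at the end.
target_ll = [-1.0] * 4 + [-5.0] * 4 + [-1.0] * 2
background_ll = [-3.0] * 10
scores = block_scores(target_ll, background_ll, block_len=2)
segments = detect_segments(scores, block_len=2, n_frames=len(target_ll))
```

With the 10 msec. frame shift given in the abstract, a frame index converts to seconds by multiplying by 0.01, so a block of 100 frames would span roughly one second. Longer blocks smooth the scores at the cost of coarser boundary estimates.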

doi: 10.21437/ICSLP.1998-241

Cite as: Rosenberg, A.E., Magrin-Chagnolleau, I., Parthasarathy, S., Huang, Q. (1998) Speaker detection in broadcast speech databases. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0202, doi: 10.21437/ICSLP.1998-241

  author={Aaron E. Rosenberg and Ivan Magrin-Chagnolleau and S. Parthasarathy and Qian Huang},
  title={{Speaker detection in broadcast speech databases}},
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0202},