Prosody in Speech Recognition and Understanding

October 22-24, 2001
Molly Pitcher Inn, Red Bank, NJ, USA

Temporal Features for Broadcast News Segmentation

Michael T. Johnson (1), Leah H. Jamieson (2)

(1) Department of Electrical and Computer Engineering, Marquette University, Milwaukee, WI, USA
(2) Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, USA

Automatically segmenting an acoustic signal into categories (such as speech, speech over background music, or music) is an important step in the transcription process. We are attempting to improve the accuracy of such segmentation systems by incorporating suprasegmental and other temporal information into the frame-based classifiers typically used for this purpose. Two specific approaches are introduced here: one uses frequency contours to improve the location of segment boundaries, and the other incorporates temporal features directly into the frame-based classifier. Results indicate that temporal information can improve classification accuracy, particularly for the speech-plus-music class, where methods using traditional features often give poor results.
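
As a purely illustrative sketch of the second approach, the code below appends temporal information to per-frame feature vectors before they reach a frame-based classifier. The abstract does not say which temporal features the authors used, so the choice of regression-based delta coefficients here is an assumption, and the function name add_delta_features and the window parameter k are hypothetical.

```python
from typing import List


def add_delta_features(frames: List[List[float]], k: int = 2) -> List[List[float]]:
    """Append a delta (local slope over +/- k frames) to every feature of every frame.

    Assumption: this is the common regression-style delta, not necessarily
    the temporal feature used in the paper.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(frames)
    if n == 0:
        return []
    # Normalizing constant of the regression formula: 2 * sum(i^2) for i = 1..k.
    denom = 2 * sum(i * i for i in range(1, k + 1))
    out = []
    for t in range(n):
        deltas = []
        for d in range(len(frames[t])):
            num = 0.0
            for i in range(1, k + 1):
                # Clamp indices at the signal edges (repeat first/last frame).
                ahead = frames[min(t + i, n - 1)][d]
                behind = frames[max(t - i, 0)][d]
                num += i * (ahead - behind)
            deltas.append(num / denom)
        # Each output frame holds the static features followed by their deltas.
        out.append(list(frames[t]) + deltas)
    return out
```

Each output vector is twice as long as its input, so a classifier trained on it sees both the instantaneous value of each feature and how that feature is changing around the frame.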



Bibliographic reference.  Johnson, Michael T. / Jamieson, Leah H. (2001): "Temporal features for broadcast news segmentation", In Prosody-2001, paper 15.