Automatically segmenting an acoustic signal into categories (such as speech, speech over background music, or music) is an important step in the transcription process. We aim to improve the accuracy of such segmentation systems by incorporating suprasegmental and other temporal information into the frame-based classifiers typically used for this purpose. Two specific approaches are introduced here: one uses frequency contours to improve the location of segment boundaries, and the other adds temporal features directly to the frame-based classifier. Results indicate that temporal information can improve classification accuracy, particularly for the speech-plus-music class, where methods using traditional features often give poor results.
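To make the second approach concrete, the following is a minimal sketch of one generic way to give a frame-based classifier access to temporal context: each frame's feature vector is extended with a windowed mean and a slope (delta) computed over neighbouring frames. This is an illustration only, not the feature set used in the paper; the function name `add_temporal_context`, the `half_window` parameter, and the choice of mean and delta statistics are all assumptions made for the example.

```python
def add_temporal_context(frames, half_window=2):
    """Append a windowed mean and a delta to every frame-level feature.

    frames: list of equal-length lists of floats, one per analysis frame.
    Returns a new list in which each frame has three times the original
    dimension: [original features, windowed means, deltas].

    Hypothetical helper for illustration; not the authors' method.
    """
    n = len(frames)
    augmented = []
    for t in range(n):
        # Clamp the context window at the start and end of the signal.
        lo = max(0, t - half_window)
        hi = min(n - 1, t + half_window)
        window = frames[lo:hi + 1]
        dim = len(frames[t])

        # Mean of each feature over the context window.
        means = [sum(f[d] for f in window) / len(window) for d in range(dim)]

        # Average per-frame change of each feature across the window.
        span = max(1, hi - lo)  # avoid division by zero for a single frame
        deltas = [(frames[hi][d] - frames[lo][d]) / span for d in range(dim)]

        augmented.append(list(frames[t]) + means + deltas)
    return augmented
```

The augmented vectors can then be passed to whatever frame-level classifier is already in use, so the classifier itself is unchanged and only its input carries the extra temporal information.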
Cite as: Johnson, M.T., Jamieson, L.H. (2001) Temporal features for broadcast news segmentation. Proc. ITRW on Prosody in Speech Recognition and Understanding, paper 15
@inproceedings{johnson01_prosody,
  author={Michael T. Johnson and Leah H. Jamieson},
  title={{Temporal features for broadcast news segmentation}},
  year=2001,
  booktitle={Proc. ITRW on Prosody in Speech Recognition and Understanding},
  pages={paper 15}
}