Automatic transcription of broadcast speech has progressed to the point where the resulting transcripts can be used for information retrieval. In this paper, we first describe a corpus of automatically recognized broadcast news, then present a method for segmenting a broadcast into stories, and finally apply this method to retrieve stories relating to a specific topic. The method is based on Hidden Markov Models and is analogous to the usual implementation of HMMs in speech recognition.
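The abstract does not give the model details, so the following is only a minimal sketch of how HMM-based story segmentation might look, not the authors' implementation. It assumes that hidden states are topics, that each topic emits words from a unigram model, and that changing topic between sentences costs a fixed log-penalty. The names `viterbi_segment`, `boundaries`, `switch_penalty`, `floor`, and the toy topic models are all hypothetical.

```python
import math

def viterbi_segment(sentences, topic_models, switch_penalty=2.0, floor=1e-4):
    """Return the most likely topic label for each sentence (Viterbi decoding).

    Assumptions (not from the paper): unigram emissions per topic, a uniform
    start distribution, and a fixed log-penalty for every topic change.
    """
    topics = list(topic_models)

    def emit(topic, sentence):
        # Log-probability of the sentence under the topic's unigram model;
        # unseen words receive a small floor probability.
        model = topic_models[topic]
        return sum(math.log(model.get(word, floor)) for word in sentence)

    # scores[t] is the best log score of any path ending in topic t.
    scores = {t: emit(t, sentences[0]) for t in topics}
    backpointers = []
    for sentence in sentences[1:]:
        new_scores, pointers = {}, {}
        for t in topics:
            # Staying in the same topic is free; switching costs switch_penalty.
            def arrive(prev):
                return scores[prev] - (0.0 if prev == t else switch_penalty)
            best_prev = max(topics, key=arrive)
            new_scores[t] = arrive(best_prev) + emit(t, sentence)
            pointers[t] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path back from the highest-scoring final topic.
    path = [max(topics, key=lambda t: scores[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def boundaries(path):
    """Indices of sentences that start a new story (topic differs from previous)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]

# Toy data, invented purely for illustration.
topic_models = {
    "sports": {"game": 0.3, "team": 0.3, "score": 0.2},
    "finance": {"market": 0.3, "stock": 0.3, "bank": 0.2},
}
sentences = [
    ["team", "game"],
    ["score", "game"],
    ["market", "stock"],
    ["bank", "market"],
]
path = viterbi_segment(sentences, topic_models)
print(path, boundaries(path))
```

In this sketch a story boundary is placed wherever the decoded topic changes, and retrieving stories on a given topic amounts to keeping the segments whose decoded label matches it. The penalty plays the role of a transition probability: a larger value yields fewer, longer segments.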
Cite as: Mulbregt, P.v., Carp, I., Gillick, L., Lowe, S., Yamron, J. (1998) Text segmentation and topic tracking on broadcast news via a hidden Markov model approach. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0116, doi: 10.21437/ICSLP.1998-671
@inproceedings{mulbregt98_icslp,
  author={Paul van Mulbregt and Ira Carp and Lawrence Gillick and Steve Lowe and Jon Yamron},
  title={{Text segmentation and topic tracking on broadcast news via a hidden Markov model approach}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0116},
  doi={10.21437/ICSLP.1998-671}
}