This paper describes a topic segmentation and indexation system for broadcast news, integrated into an alert system for the selective dissemination of multimedia information. The goal of this work is to improve retrieval of, and navigation through, specific spoken audio segments that have been automatically transcribed using speech recognition. Our segmentation algorithm is based on simple heuristics related to anchor detection. The indexation is based on hierarchical concept trees (a thesaurus) containing 22 main thematic domains, for which hidden Markov models were created. Only the top three levels of this thesaurus are currently used for indexation. Ongoing work on identifying cues related to the structure of TV broadcast news programs is also described.
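As a rough illustration of what an anchor-based segmentation heuristic can look like, the sketch below places a story boundary whenever the anchor takes the floor again after another speaker. This is not the authors' algorithm, whose heuristics are not detailed in the abstract. It assumes speaker-labelled turns are already available, and it assumes the anchor is simply the speaker with the most turns; the names `Turn`, `find_anchor`, and `story_boundaries` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical input format: one (speaker_id, start_time_in_seconds) pair per speaker turn.
Turn = Tuple[str, float]


def find_anchor(turns: List[Turn]) -> str:
    """Assumption: the anchor is the speaker with the largest number of turns."""
    counts = Counter(speaker for speaker, _ in turns)
    return counts.most_common(1)[0][0]


def story_boundaries(turns: List[Turn]) -> List[float]:
    """Return start times of anchor turns that follow a non-anchor turn
    (or open the programme); each is treated as a candidate story boundary."""
    if not turns:
        return []
    anchor = find_anchor(turns)
    boundaries: List[float] = []
    previous = None
    for speaker, start in turns:
        # A new story is assumed to begin when the anchor resumes speaking.
        if speaker == anchor and previous != anchor:
            boundaries.append(start)
        previous = speaker
    return boundaries


# Made-up example turns, for illustration only.
turns = [
    ("spk1", 0.0),
    ("spk2", 35.2),
    ("spk1", 80.5),
    ("spk3", 95.0),
    ("spk1", 140.0),
    ("spk1", 150.0),
]
boundaries = story_boundaries(turns)
print(boundaries)
```

A frequency-based anchor guess is only one possible choice; a real system would likely combine it with other cues, such as the program-structure cues mentioned in the abstract.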
Cite as: Amaral, R., Trancoso, I. (2003) Segmentation and indexation of broadcast news. Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003), 31-36
@inproceedings{amaral03_msdr,
  author={Rui Amaral and Isabel Trancoso},
  title={{Segmentation and indexation of broadcast news}},
  year=2003,
  booktitle={Proc. ISCA Workshop on Multilingual Spoken Document Retrieval (MSDR 2003)},
  pages={31--36}
}