We present methods for finding identical or nearly identical news stories in hourly radio news broadcasts read by the same or different announcers. They make it possible to build a large database of repeated, professionally read speech at low cost, which is of particular interest for prosody research, but also, e.g., for concept-to-speech and sociolinguistic studies. An automatically recorded complete radio news broadcast is first segmented into individual news stories using HMM recognition. Then, the word sequence estimates of the stories are either compared directly (naive method) or realigned with the signal of other stories (realignment method) to determine which stories have been read before and which have not. Both methods can be further improved by computing "meta distances" that also take into account distances to other stories. We find that the realignment method combined with meta distances is the most reliable of the methods on real-life data.
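To make the direct comparison and the idea of meta distances more concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that the naive method can be approximated by a length-normalized word-level edit distance between recognized word sequences, and that a meta distance can be approximated by comparing how two stories relate to all remaining stories. The abstract defines neither measure, so the function names `naive_distance` and `meta_distance`, the normalization, and the mean absolute difference are all illustrative assumptions.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance counted in words (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def naive_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """ASSUMPTION: edit distance normalized by the longer sequence length."""
    longest = max(len(a), len(b))
    return word_edit_distance(a, b) / longest if longest else 0.0


def distance_matrix(stories: Sequence[Sequence[str]]) -> List[List[float]]:
    """Pairwise naive distances between all stories."""
    return [[naive_distance(s, t) for t in stories] for s in stories]


def meta_distance(dist: List[List[float]], i: int, j: int) -> float:
    """ASSUMPTION: mean absolute difference between the distances of
    stories i and j to every other story k. Two readings of the same
    story should be similarly far from all other stories."""
    others = [k for k in range(len(dist)) if k not in (i, j)]
    if not others:
        return dist[i][j]  # no third story available, fall back
    return sum(abs(dist[i][k] - dist[j][k]) for k in others) / len(others)


if __name__ == "__main__":
    # Hypothetical recognizer output; the second story contains a
    # recognition error ("to day" instead of "today").
    stories = [
        "the president visited berlin today".split(),
        "the president visited berlin to day".split(),
        "heavy rain is expected in the south".split(),
    ]
    dist = distance_matrix(stories)
    for i in range(len(stories)):
        for j in range(i + 1, len(stories)):
            print(i, j, round(dist[i][j], 3), round(meta_distance(dist, i, j), 3))
```

In this sketch a recognition error raises the direct distance between two readings of the same story, while their distances to unrelated stories remain similar, which is one plausible reason why taking other stories into account could make the decision more robust. The realignment method operates on the speech signal and is not covered here.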
Cite as: Rapp, S., Dogil, G. (1998) Same news is good news: automatically collecting reoccurring radio news stories. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0906, doi: 10.21437/ICSLP.1998-599
@inproceedings{rapp98b_icslp, author={Stefan Rapp and Grzegorz Dogil}, title={{Same news is good news: automatically collecting reoccurring radio news stories}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0906}, doi={10.21437/ICSLP.1998-599} }