Sixth International Conference on Spoken Language Processing
Processing time is an important factor in making a speech transcription system viable for automatic indexation of radio and television broadcasts. When word error rate is the only concern, it is common to design systems that run at 100 times real-time or more. This paper addresses issues in reducing the speech recognition time for automatic indexation of radio and TV broadcasts, with the aim of obtaining reasonable performance for close to real-time operation. We investigated computational resources in the range of 1 to 10xRT on commonly available platforms. Constraints on the computational resources led us to reconsider design issues, particularly those concerning the acoustic models and the decoding strategy. A new decoder was implemented which transcribes broadcast data in a few times real-time with only a slight increase in word error rate compared to our best system. Experiments with spoken document retrieval show that comparable information retrieval (IR) results are obtained with a 10xRT automatic transcription and with manual transcription, and that reasonable performance is still obtained with a 1.4xRT transcription system.
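The operating points quoted above (1.4xRT, 10xRT, 100xRT) are multiples of real time. As a minimal sketch, assuming the usual definition of the real-time factor as processing time divided by audio duration, the arithmetic looks as follows. The function names and the 30-minute broadcast are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return the real-time factor (xRT): processing time / audio duration.

    Assumed definition; a value of 1.0 means transcription keeps pace
    with the audio.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_seconds_at(audio_seconds: float, xrt: float) -> float:
    """Return the processing time implied by a given real-time factor."""
    if audio_seconds < 0 or xrt < 0:
        raise ValueError("audio_seconds and xrt must be non-negative")
    return audio_seconds * xrt


if __name__ == "__main__":
    broadcast_seconds = 30 * 60  # hypothetical 30-minute broadcast
    # The three operating points mentioned in the abstract.
    for xrt in (1.4, 10.0, 100.0):
        hours = processing_seconds_at(broadcast_seconds, xrt) / 3600
        print(f"{xrt:g}xRT: {hours:.2f} hours of processing")
```

Under these assumptions, the gap between a 100xRT research system and a 1.4xRT system is the difference between days and under an hour of computation for the same half-hour broadcast, which is why the processing budget matters for indexing continuous broadcast streams.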
Bibliographic reference. Gauvain, Jean-Luc / Lamel, Lori (2000): "Fast decoding for indexation of broadcast data", In ICSLP-2000, vol.3, 794-797.