8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

An Improved Preprocessor for the Automatic Transcription of Broadcast News Audio Stream

Jindrich Zdansky, Petr David, Jan Nouza

Technical University of Liberec, Czech Republic

This paper deals with preprocessing the broadcast news (BN) audio stream for automatic transcription. The preprocessing consists of automatic segmentation followed by broad-class identification of the resulting segments. The segmentation stage detects speaker and/or acoustic changes in the BN audio stream with a precision of 82.75%. The identification stage acts as a filter that removes non-speech parts. The performance of the proposed system was evaluated on the multilingual pan-European COST278 BN database, which contains data in 6 languages. The preprocessing and segmentation module operates in near real time, with a total delay of 12 seconds. Its practical functionality was evaluated on the Czech part of the BN database: the automatically segmented signal was sent directly to a large-vocabulary continuous speech recognition system operating with a 200K-word Czech lexicon. The difference in performance between automatically and manually segmented BN streams was minimal, only 1.12%.
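The segmentation precision quoted in the abstract can be read as the share of hypothesized change points that coincide with a reference (manually annotated) boundary. The following is a minimal sketch of such a score under assumptions of our own: boundaries are given in seconds, a detection counts as correct if it lies within a hypothetical tolerance of 1 s of a reference boundary, and matching is greedy and one-to-one. The function name `boundary_precision` and the tolerance value are illustrative only; the paper's exact scoring protocol may differ.

```python
def boundary_precision(detected, reference, tolerance=1.0):
    """Fraction of detected change points that match a reference boundary.

    Assumptions (not taken from the paper): times are in seconds, the
    tolerance window is hypothetical, and each reference boundary can be
    matched by at most one detection (greedy, in time order).
    """
    if not detected:
        return 0.0
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(detected):
        # Pick the closest still-unmatched reference boundary within tolerance.
        best = None
        for i, r in enumerate(unmatched):
            if abs(t - r) <= tolerance and (
                best is None or abs(t - r) < abs(t - unmatched[best])
            ):
                best = i
        if best is not None:
            unmatched.pop(best)  # a reference boundary is credited only once
            hits += 1
    return hits / len(detected)


if __name__ == "__main__":
    detected = [10.2, 25.0, 40.1, 55.7]   # hypothetical system output
    reference = [10.0, 40.0, 56.0, 70.0]  # hypothetical manual annotation
    print(f"precision = {boundary_precision(detected, reference):.2%}")
```

In this toy example three of the four detections fall near a reference boundary, so the script prints `precision = 75.00%`; the unmatched reference boundary at 70.0 s would lower recall, not precision.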

Bibliographic reference.  Zdansky, Jindrich / David, Petr / Nouza, Jan (2004): "An improved preprocessor for the automatic transcription of broadcast news audio stream", In INTERSPEECH-2004, 1065-1068.