INTERSPEECH 2004 - ICSLP
This paper deals with preprocessing a broadcast news (BN) audio stream for automatic transcription. The preprocessing consists of automatic segmentation followed by broad-class segment identification. The former detects speaker and/or acoustic changes in the BN audio stream with a precision of 82.75%. The latter acts as a filter that removes non-speech parts. The performance of the proposed system was evaluated on the multilingual pan-European COST278 BN database, which contains data in 6 languages. The preprocessing and segmentation module operates in near real time, with a total delay of 12 seconds. Its practical functionality was evaluated on the Czech part of the BN database: the automatically segmented signal was sent directly to a large-vocabulary continuous speech recognition system operating with a 200K-word Czech lexicon. The difference in performance between automatically and manually segmented BN streams was minimal, only 1.12%.
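The two-stage flow described above (change detection, then broad-class filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Segment` type, the label names, and both helper functions are hypothetical, and the change points and class labels are assumed to come from detectors that are not shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float  # segment start in seconds
    end: float    # segment end in seconds
    label: str    # broad class; the label set used here is an assumption


def split_at_changes(change_points: List[float], total: float) -> List[Tuple[float, float]]:
    """Turn detected change points (seconds) into consecutive (start, end) spans."""
    inner = sorted(p for p in change_points if 0.0 < p < total)
    bounds = [0.0] + inner + [total]
    return list(zip(bounds[:-1], bounds[1:]))


def keep_speech(segments: List[Segment]) -> List[Segment]:
    """Filter step: drop every segment whose broad class is not speech."""
    return [s for s in segments if s.label == "speech"]


# Illustrative values only; they are not taken from the paper.
spans = split_at_changes([12.0, 4.5], total=20.0)
labels = ["music", "speech", "speech"]  # would come from a broad-class classifier
segments = [Segment(a, b, lab) for (a, b), lab in zip(spans, labels)]
to_recognizer = keep_speech(segments)
```

In this sketch only the segments in `to_recognizer` would be passed on to the speech recognizer, which mirrors the filtering role the abstract assigns to the broad-class identification step.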
Bibliographic reference. Zdansky, Jindrich / David, Petr / Nouza, Jan (2004): "An improved preprocessor for the automatic transcription of broadcast news audio stream", In INTERSPEECH-2004, 1065-1068.