This paper investigates the issue of automatic segmentation of speech recordings for broadcast news (BN) and broadcast conversation (BC) speech recognition. Our previous segmentation algorithm often exhibited high deletion errors, where some speech segments were misclassified as non-speech and thus were never passed on to the recognizer. In contrast with our previous segmentation models, which only differentiated between speech and non-speech segments, phonetic knowledge is applied to represent speech by using multiple models for different types of speech segments. Moreover, the "pronunciation" of the speech segment has been modified to loosen the minimum duration constraint. This method makes use of language specific knowledge, while keeping the number of models low to achieve fast segmentation. Experimental results show that the new segmenter outperforms our previous segmenter significantly, particularly in reducing deletion errors.
Bibliographic reference. Peng, Gang / Hwang, Mei-Yuh / Ostendorf, Mari (2007): "Automatic acoustic segmentation for speech recognition on broadcast recordings", In INTERSPEECH-2007, 2977-2980.