Content indexing has become a necessity, not merely an option, in an era when broadcast, cable, and Internet sources produce enormous volumes of media daily. Text derived from spoken audio remains a key feature for understanding content, alongside other metadata and video features. In this paper, we introduce a new method that improves transcription quality and thereby enables more accurate content indexing. Our method finds phonetic similarities between two imperfect sources, closed captions and ASR output, and aligns them to produce high-quality transcriptions. In the process, even out-of-vocabulary words can be learned automatically. Given broadcast news audio and closed captions, our experimental results show that the proposed method improves word correct rates by 11% on average over the ASR output using the baseline language model, and by 6% over the output using the adapted language model.
Bibliographic reference. Kim, Yeon-Jun / Gibbon, David C. (2011): "Automatic learning in content indexing service using phonetic alignment", In INTERSPEECH-2011, 925-928.
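The abstract describes aligning two imperfect token streams by phonetic similarity. The paper's exact algorithm is not given here, so the following is only an illustrative sketch of the general idea: a classic dynamic-programming (Needleman-Wunsch-style) global alignment of two phoneme sequences, where matched pairs, substitutions, and gaps model agreements and disagreements between the closed-caption and ASR sources. The phoneme strings and cost values are hypothetical.

```python
# Illustrative sketch (not the authors' exact algorithm): globally align
# two imperfect token sequences -- e.g. phoneme strings derived from
# closed captions and from ASR output -- with dynamic programming.

def align(ref, hyp, sub_cost=1, gap_cost=1):
    """Global alignment of two sequences.

    Returns (cost, pairs), where pairs is a list of (ref_tok, hyp_tok)
    tuples and None marks an insertion/deletion gap.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else sub_cost)
            dp[i][j] = min(diag, dp[i - 1][j] + gap_cost, dp[i][j - 1] + gap_cost)
    # Trace back to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1]
                + (0 if ref[i - 1] == hyp[j - 1] else sub_cost)):
            pairs.append((ref[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap_cost:
            pairs.append((ref[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    pairs.reverse()
    return dp[n][m], pairs

if __name__ == "__main__":
    # Hypothetical phoneme sequences for one word from two noisy sources.
    caption = ["n", "uw", "z"]
    asr = ["n", "uw", "s"]
    cost, pairs = align(caption, asr)
    print(cost, pairs)  # cost 1: a single substitution z -> s
```

In a full system, the substitution cost would typically be a phonetic-confusion score rather than a flat penalty, so that acoustically similar phones (such as /z/ and /s/) align more readily than dissimilar ones.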