Auditory-Visual Speech Processing (AVSP) 2010

Hakone, Kanagawa, Japan
September 30-October 3, 2010

Audio-Visual Television Broadcast Programs Processing, Transcription, Indexing and Searching

Josef Chaloupka, Jan Nouza

SpeechLab, Institute of Information Technology and Electronics, Technical University of Liberec, Czech Republic

This paper describes the development of a system for automatic processing, transcription and indexing of television broadcast news. The main task of the system is the automatic transcription of television broadcast programs from the audio signal. The transcribed recordings are indexed and stored in a database, so as a second task we have created a web-based system for searching the database. The database can be searched by keywords (or whole sentences) or by speaker. The time boundaries of individual words and audio segments are also stored in the database during the indexing phase, so retrieved information can easily be compared with the original recording. The system also processes the visual information from television recordings: modules for visual signal segmentation, face detection and identification, and visual speech detection have been added to the transcription system. The indexed visual information is stored in the database together with the information from the acoustic signal and is included in the web-based search system.

Index Terms: audio-visual television broadcast transcription, visual signal segmentation, visual speaker identification
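
The time-aligned index described in the abstract can be illustrated with a minimal sketch. It is not the system's actual database schema or search code: the `WordEntry` record, the `search_phrase` and `search_speaker` functions, and the sample data are all hypothetical, and an in-memory list stands in for the database. The sketch only shows how storing word-level time boundaries and speaker labels lets a keyword or speaker query point back to a span of the original recording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordEntry:
    """One recognized word with its time boundaries (hypothetical record layout)."""
    recording: str   # identifier of the broadcast recording
    speaker: str     # speaker label assigned during indexing
    word: str        # recognized word
    start: float     # start time in seconds
    end: float       # end time in seconds


def search_phrase(index, phrase):
    """Return (recording, start, end) for each consecutive occurrence of the phrase."""
    words = phrase.lower().split()
    hits = []
    if not words:
        return hits
    n = len(words)
    for i in range(len(index) - n + 1):
        window = index[i:i + n]
        same_recording = all(w.recording == window[0].recording for w in window)
        if same_recording and [w.word.lower() for w in window] == words:
            hits.append((window[0].recording, window[0].start, window[-1].end))
    return hits


def search_speaker(index, speaker):
    """Return (recording, start, end) spans, merging consecutive words by the speaker."""
    spans = []
    extending = False  # True while the previous entry belonged to the same speaker
    for w in index:
        if w.speaker == speaker:
            if extending and spans[-1][0] == w.recording:
                spans[-1] = (w.recording, spans[-1][1], w.end)
            else:
                spans.append((w.recording, w.start, w.end))
            extending = True
        else:
            extending = False
    return spans


# Invented sample data, ordered by recording and time.
INDEX = [
    WordEntry("news_01", "anchor", "good", 0.0, 0.3),
    WordEntry("news_01", "anchor", "evening", 0.3, 0.8),
    WordEntry("news_01", "reporter", "heavy", 1.2, 1.6),
    WordEntry("news_01", "reporter", "rain", 1.6, 2.0),
    WordEntry("news_01", "anchor", "thank", 2.5, 2.8),
    WordEntry("news_01", "anchor", "you", 2.8, 3.0),
]

if __name__ == "__main__":
    print(search_phrase(INDEX, "heavy rain"))
    print(search_speaker(INDEX, "anchor"))
```

Each hit carries a recording identifier and a time span, which is the information a search front end would need to play back the matching part of the original recording.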

