Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

A Stream-Based Audio Segmentation, Classification and Clustering Pre-Processing System for Broadcast News Using ANN Models

Hugo Meinedo, Joao Neto

INESC-ID/IST, Lisbon, Portugal

This paper describes our work on the development of a low-latency, stream-based audio pre-processing system for broadcast news using model-based techniques. It performs speech/non-speech classification, speaker segmentation, speaker clustering, and gender and background-condition classification. To increase modelling accuracy, our algorithms make extensive use of Artificial Neural Networks (ANN), thus avoiding the rough assumptions normally made about the distribution of the audio signal. Experiments were conducted on the COST278 multilingual TV broadcast news database and compared with current state-of-the-art algorithms using standard evaluation tools. Additionally, we investigated the impact of the automatic audio pre-processing system on speech recognition, using a large European Portuguese broadcast news test database. These tests show a small degradation in recognition performance compared with hand-labelled audio segmentation. Our system is part of a prototype closed-captioning system that daily processes the main news shows of two Portuguese broadcasters.
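The abstract does not detail how the ANN outputs are turned into segments. Purely as an illustration, and not the authors' algorithm, the sketch below shows one common way per-frame speech/non-speech scores can be converted into speech segments in a streaming-friendly manner: smooth the scores with a short moving average, threshold them, and relabel runs that are too short to be plausible. It assumes per-frame speech probabilities are already available (for example from an ANN classifier); the function names, window size, threshold and minimum duration are hypothetical choices.

```python
from typing import List, Tuple


def smooth(probs: List[float], half_window: int) -> List[float]:
    """Moving average of per-frame speech probabilities (window clipped at the edges)."""
    n = len(probs)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(probs[lo:hi]) / (hi - lo))
    return out


def to_runs(labels: List[bool]) -> List[Tuple[int, int, bool]]:
    """Collapse frame labels into (start, end_exclusive, is_speech) runs."""
    runs = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((start, i, labels[start]))
            start = i
    return runs


def speech_segments(probs: List[float], half_window: int = 2,
                    threshold: float = 0.5, min_frames: int = 3) -> List[Tuple[int, int]]:
    """Hypothetical post-processing: smoothed scores -> (start, end) speech segments in frames."""
    if not probs:
        return []
    labels = [p >= threshold for p in smooth(probs, half_window)]
    # One-pass heuristic: runs shorter than min_frames are flipped to the
    # opposite class, so they merge with their neighbours when re-collapsed.
    for start, end, is_speech in to_runs(labels):
        if end - start < min_frames:
            for i in range(start, end):
                labels[i] = not is_speech
    return [(s, e) for s, e, is_speech in to_runs(labels) if is_speech]


if __name__ == "__main__":
    # Ten non-speech frames, ten speech frames, ten non-speech frames.
    scores = [0.1] * 10 + [0.9] * 10 + [0.1] * 10
    print(speech_segments(scores))  # [(10, 20)]
```

Smoothing before thresholding suppresses isolated spikes in the classifier output, and the minimum-duration step removes implausibly short gaps or bursts; both only look a few frames ahead, which keeps latency low in a stream-based setting.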


Bibliographic reference.  Meinedo, Hugo / Neto, Joao (2005): "A stream-based audio segmentation, classification and clustering pre-processing system for broadcast news using ANN models", In INTERSPEECH-2005, 237-240.