This paper investigates the performance of a Factor Analysis (FA) stage in audio segmentation systems. The system described here segments and classifies audio files from broadcast programs into five classes: speech, speech with noise, speech with music, music, or others. This task was recently proposed as a competitive evaluation organized by the Spanish Network on Speech Technologies as part of the FALA 2010 conference. The proposed system uses a two-step hierarchical structure with two different sets of acoustic features. First, the system decides among music, speech with music, or the rest of the classes by using HMM/GMM with a smoothed combination of MFCC and Chroma as feature vectors. Next, the system classifies speech and speech with noise by using FA with MFCC as acoustic features. The results show that, with this configuration, the error rate achieved is lower than that obtained by the best system presented in the FALA 2010 evaluation.
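The two-step decision described in the abstract can be illustrated with a minimal sketch. It covers only the control flow of the hierarchy, not the HMM/GMM or FA models themselves. The function names, the "rest" label, and the idea that each stage supplies one score per class are assumptions made for illustration and are not taken from the paper. The abstract does not say where the "others" class is resolved, so the sketch leaves it out.

```python
from typing import Dict

# Hypothetical label sets; "rest" stands for "the rest of the classes"
# passed on to the second stage. These names are assumptions.
STAGE1_CLASSES = ("music", "speech with music", "rest")
STAGE2_CLASSES = ("speech", "speech with noise")


def argmax_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score."""
    return max(scores, key=scores.get)


def classify_segment(stage1_scores: Dict[str, float],
                     stage2_scores: Dict[str, float]) -> str:
    """Two-step hierarchical decision for one audio segment.

    stage1_scores: one score per label in STAGE1_CLASSES (assumed to
        come from a first-stage model such as HMM/GMM).
    stage2_scores: one score per label in STAGE2_CLASSES (assumed to
        come from a second-stage model such as FA).
    """
    if set(stage1_scores) != set(STAGE1_CLASSES):
        raise ValueError("stage1_scores must cover exactly STAGE1_CLASSES")
    if set(stage2_scores) != set(STAGE2_CLASSES):
        raise ValueError("stage2_scores must cover exactly STAGE2_CLASSES")

    first = argmax_label(stage1_scores)
    if first != "rest":
        # Music and speech with music are decided in the first step.
        return first
    # Everything else is resolved by the second step.
    return argmax_label(stage2_scores)
```

In this sketch the second-stage scores are consulted only when the first stage selects "rest", which mirrors the order of decisions given in the abstract.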
Bibliographic reference. Castán, Diego / Vaquero, Carlos / Ortega, Alfonso / Martínez, David / Villalba, Jesús / Lleida, Eduardo (2011): "Hierarchical audio segmentation with HMM and factor analysis in broadcast news domain", In INTERSPEECH-2011, 421-424.