INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Factor Analysis for Audio-Based Video Genre Classification

Mickael Rouvier, Driss Matrouf, Georges Linarès

LIA, France

Statistical classifiers operate on features that generally mix useful and useless information, and the two are difficult to separate in the feature domain. Recently, a new paradigm based on Latent Factor Analysis (LFA) proposed decomposing the model into useful and useless components. This method has been applied successfully to speaker and language recognition tasks. In this paper, we study the use of LFA for video genre classification using only the audio channel. We propose a classification method based on short-term cepstral features and Gaussian Mixture Model (GMM) or Support Vector Machine (SVM) classifiers, combined with Factor Analysis (FA). Experiments are conducted on a corpus composed of five types of video (music, commercials, cartoons, movies and news). Relative to the baseline Gaussian Mixture Model Universal Background Model (GMM-UBM) system, the best factor analysis configuration reduces the classification error by about 56%, corresponding to a correct identification rate of about 90%.
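The two closing figures can be related with a short calculation. The sketch below assumes the usual definition of relative error reduction, (baseline error - final error) / baseline error, and takes the abstract's rounded values ("about 56%", "about 90%") at face value. The function name `baseline_error` is hypothetical, and the implied baseline is an estimate derived from those rounded values, not a figure reported in the abstract.

```python
def baseline_error(final_error: float, relative_reduction: float) -> float:
    """Invert relative_reduction = (baseline - final) / baseline."""
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return final_error / (1.0 - relative_reduction)


# Rounded values taken from the abstract (both are stated as "about").
final_error = 1.0 - 0.90   # about 90% correct identification
reduction = 0.56           # about 56% relative error reduction

# Assumption: the standard definition of relative error reduction applies.
base = baseline_error(final_error, reduction)
print(f"implied baseline error rate: {base:.1%}")
print(f"implied baseline identification rate: {1.0 - base:.1%}")
```

Under these assumptions the GMM-UBM baseline would sit at roughly 23% error, that is, a correct identification rate near 77%. Because both inputs are rounded, this should be read as an order-of-magnitude check rather than the exact baseline result.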


Bibliographic reference.  Rouvier, Mickael / Matrouf, Driss / Linarès, Georges (2009): "Factor analysis for audio-based video genre classification", In INTERSPEECH-2009, 1155-1158.