ISCA Archive Interspeech 2009

Factor analysis for audio-based video genre classification

Mickael Rouvier, Driss Matrouf, Georges Linarès

Statistical classifiers operate on features that generally contain both useful and useless information, and the two are difficult to separate in the feature domain. Recently, a new paradigm based on Latent Factor Analysis (LFA) proposed decomposing the model into useful and useless components. This method has been applied successfully to speaker and language recognition tasks. In this paper, we study the use of LFA for video genre classification using only the audio channel. We propose a classification method based on short-term cepstral features and Gaussian Mixture Model (GMM) or Support Vector Machine (SVM) classifiers, which are combined with Factor Analysis (FA). Experiments are conducted on a corpus composed of 5 types of video (music, commercials, cartoons, movies and news). Relative to the baseline system, a Gaussian Mixture Model with Universal Background Model (GMM-UBM), the best factor analysis configuration reduces the classification error by about 56%, corresponding to a correct identification rate of about 90%.
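
The relationship between the two reported figures can be illustrated with a short sketch. It assumes the usual definition of relative error reduction, (baseline error - new error) / baseline error, and uses the rounded values from the abstract (about 56% and about 90%). The function names are hypothetical, and the implied baseline error is a back-of-the-envelope estimate, not a number reported in the abstract.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


def implied_baseline_error(new_error, reduction):
    """Invert the formula above: baseline error implied by a new error
    rate and a relative error reduction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return new_error / (1.0 - reduction)


# Rounded figures taken from the abstract (assumption: both are approximate).
new_error = 1.0 - 0.90   # about 90% correct identification
reduction = 0.56         # about 56% relative error reduction

baseline_error = implied_baseline_error(new_error, reduction)
print(f"implied baseline error: {baseline_error:.3f}")
print(f"check, relative reduction: "
      f"{relative_error_reduction(baseline_error, new_error):.2f}")
```

Under these assumptions the baseline error comes out at a little under a quarter of the test items, so the figures are consistent with a baseline correct identification rate somewhere in the high 70s percent. Because both inputs are rounded, this should be read as an order-of-magnitude check only.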

doi: 10.21437/Interspeech.2009-336

Cite as: Rouvier, M., Matrouf, D., Linarès, G. (2009) Factor analysis for audio-based video genre classification. Proc. Interspeech 2009, 1155-1158, doi: 10.21437/Interspeech.2009-336

  author={Mickael Rouvier and Driss Matrouf and Georges Linarès},
  title={{Factor analysis for audio-based video genre classification}},
  booktitle={Proc. Interspeech 2009},