ISCA Archive Interspeech 2008

Language and genre detection in audio content analysis

Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson

This paper presents an audio genre detection framework that can be applied to a multi-language audio corpus. Cepstral coefficients are analyzed as the feature set for both language-dependent and language-independent genre identification (GID) tasks. Using language information increases the average detection accuracy by at least 2.6% over the language-independent counterpart. Mel-frequency cepstral coefficients have been widely used for Music Information Retrieval (MIR); however, the present study shows that linear-frequency cepstral coefficients (LFCC) with a higher number of frequency bands can improve detection accuracy. Two other GID architectures are also considered, but the results show that the log-energy amplitudes from triangular, linearly spaced filter banks and their deltas can offer an average detection accuracy as high as 98.2% when language information is taken into account.
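
The feature type named in the abstract (log-energy amplitudes from triangular, linearly spaced filter banks, plus their deltas) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the frame length, hop size, FFT size, number of filters, Hamming window, and the simple central-difference delta are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np


def linear_triangular_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are spaced linearly from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 edge points: each filter spans three consecutive edges.
    edges = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        # The triangle is the lower of the two slopes, floored at zero.
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def log_filterbank_energies(signal, sample_rate, frame_len=400, hop=160,
                            n_fft=512, n_filters=40):
    """Log energies per frame and filter (all default values are assumed)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fbank = linear_triangular_filterbank(n_filters, n_fft, sample_rate)
    # Small constant avoids log(0) on silent frames.
    return np.log(power @ fbank.T + 1e-10)


def deltas(features):
    """First-order deltas as a central difference over time (edges repeated)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


# Usage on one second of synthetic noise at an assumed 16 kHz sample rate.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
energies = log_filterbank_energies(signal, 16000)
features = np.hstack([energies, deltas(energies)])
```

Each row of `features` holds the static log energies followed by their deltas for one frame. Applying a discrete cosine transform to the static log energies would give LFCC-style coefficients; that step is omitted here.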

doi: 10.21437/Interspeech.2008-621

Cite as: Mitra, V., Garcia-Romero, D., Espy-Wilson, C.Y. (2008) Language and genre detection in audio content analysis. Proc. Interspeech 2008, 2506-2509, doi: 10.21437/Interspeech.2008-621

  author={Vikramjit Mitra and Daniel Garcia-Romero and Carol Y. Espy-Wilson},
  title={{Language and genre detection in audio content analysis}},
  booktitle={Proc. Interspeech 2008},