9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Language and Genre Detection in Audio Content Analysis

Vikramjit Mitra, Daniel Garcia-Romero, Carol Y. Espy-Wilson

University of Maryland, USA

This paper presents an audio genre detection framework that can be applied to a multi-language audio corpus. Cepstral coefficients are analyzed as the feature set for both language-dependent and language-independent genre identification (GID) tasks. Using language information increases the overall detection accuracy by at least 2.6% on average over the language-independent counterpart. Mel-frequency cepstral coefficients (MFCC) have been widely used for Music Information Retrieval (MIR); however, the present study shows that linear-frequency cepstral coefficients (LFCC) with a higher number of frequency bands can improve the detection accuracy. Two other GID architectures are also considered, but the results show that the log-energy amplitudes from triangular, linearly spaced filter banks and their deltas can offer an average detection accuracy as high as 98.2% when language information is taken into account.
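The feature highlighted in the abstract, log-energy amplitudes from triangular linearly spaced filter banks together with their deltas, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the 25 ms frame and 10 ms hop at 16 kHz, the 512-point FFT, the 40 filters, the Hamming window, and the gradient-based delta are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def linear_triangular_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are spaced linearly from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points: filter i rises from edges[i] to edges[i+1]
    # and falls back to zero at edges[i+2].
    edges_hz = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    bin_hz = np.arange(n_bins) * sample_rate / n_fft
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, centre, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - left) / (centre - left)
        falling = (right - bin_hz) / (right - centre)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank

def log_filterbank_energies(signal, sample_rate, n_filters=40,
                            frame_len=400, hop=160, n_fft=512):
    """Frame the signal, take the power spectrum, and apply the linear filter bank."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fbank = linear_triangular_filterbank(n_filters, n_fft, sample_rate)
    # Small constant avoids log(0) on silent frames (assumed value).
    return np.log(power @ fbank.T + 1e-10)

def deltas(features):
    """Frame-to-frame differences along time; a simple stand-in for regression deltas."""
    return np.gradient(features, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(16000)  # one second of synthetic noise at 16 kHz
    energies = log_filterbank_energies(audio, 16000)
    features = np.hstack([energies, deltas(energies)])
    print(features.shape)
```

Applying a discrete cosine transform to the log energies would yield LFCC; replacing the linear edge spacing with mel-spaced edges would give the MFCC front end that the abstract compares against.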

Bibliographic reference. Mitra, Vikramjit / Garcia-Romero, Daniel / Espy-Wilson, Carol Y. (2008): "Language and genre detection in audio content analysis", in INTERSPEECH-2008, 2506-2509.