SAPA-SCALE Conference 2012, Portland, OR, USA
We present a novel approach to automatic Language Identification (LID) using spectro-temporal patch features. The approach rests on the premise that speech and spoken phenomena are characterized by typical visible patterns in time-frequency representations of the signal, and that the manner in which these patterns occur is language specific. To model this, we build a library of randomly selected spectro-temporal patterns from spoken examples of a language, and compute features from the correlations of this library with spectrograms of the speech signal. Under our hypothesis, the relative frequency of correlation peaks should differ across languages. We exploit this by training a discriminative classifier on these features to detect the presence of the language in a recording. The proposed approach has been tested on two datasets: the VoxForge multilingual speech data and the CallFriend corpus available from the Linguistic Data Consortium (LDC).
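The feature-extraction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `sample_patches` and `peak_correlation_features`, the fixed patch size, the use of normalized cross-correlation, and the choice of the single highest correlation per patch as the feature (a simplification of the peak statistics mentioned above) are all assumptions made for illustration. Random arrays stand in for real spectrograms.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sample_patches(spectrograms, n_patches, patch_shape, rng):
    """Crop n_patches random (freq, time) patches from a list of spectrograms."""
    h, w = patch_shape
    patches = []
    for _ in range(n_patches):
        spec = spectrograms[rng.integers(len(spectrograms))]
        f0 = rng.integers(spec.shape[0] - h + 1)
        t0 = rng.integers(spec.shape[1] - w + 1)
        patches.append(spec[f0:f0 + h, t0:t0 + w].copy())
    return patches


def peak_correlation_features(spec, patches, eps=1e-8):
    """Return one feature per patch: the highest normalized correlation
    between that patch and any same-sized window of the spectrogram.
    (Assumed feature; the abstract does not specify the exact statistic.)"""
    feats = []
    for patch in patches:
        # Zero-mean, unit-norm template.
        p = patch - patch.mean()
        p /= np.linalg.norm(p) + eps
        # All windows of the patch's shape: (n_freq_pos, n_time_pos, h, w).
        windows = sliding_window_view(spec, patch.shape)
        centered = windows - windows.mean(axis=(2, 3), keepdims=True)
        norms = np.sqrt((centered ** 2).sum(axis=(2, 3))) + eps
        # Normalized cross-correlation at every valid position.
        corr = np.einsum("ijkl,kl->ij", centered, p) / norms
        feats.append(corr.max())
    return np.array(feats)


# Synthetic stand-ins for spectrograms (40 frequency bins x 120 frames).
rng = np.random.default_rng(0)
train_specs = [rng.random((40, 120)) for _ in range(3)]
library = sample_patches(train_specs, n_patches=5, patch_shape=(8, 10), rng=rng)
features = peak_correlation_features(train_specs[0], library)
```

The resulting vector has one entry per library patch. Under the setup described in the abstract, vectors of this kind, computed for recordings with and without the target language, would serve as input to a discriminative classifier.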
Index Terms: Language identification, Spectro-temporal patches, Discriminative classification
Bibliographic reference. Sahni, Kamal / Dighe, Pranay / Singh, Rita / Raj, Bhiksha (2012): "Language identification using spectro-temporal patch features", In SAPA-SCALE-2012, 110-113.