Sixth European Conference on Speech Communication and Technology
Most current language identification (LID) systems make little or no use of prosodic information, despite the importance of prosody in LID by humans. The greatest obstacle has been finding an appropriate feature set that captures linguistically relevant prosodic information. The only system to attempt LID entirely on the basis of prosodic variables uses a set of over 200 features, which are selected and combined in a task-specific manner. We apply a novel recurrent neural network model to the task of pairwise discrimination among languages. Network inputs are limited to delta-F0 and the first difference of the band-limited amplitude envelope. Initial results are based on all pairwise combinations of English, German, Japanese, Mandarin and Spanish, with 90 speakers per language.
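To make the input representation and the evaluation setup concrete, the following is a minimal sketch, not the authors' implementation. It assumes that F0 and the band-limited amplitude envelope are already available as per-frame sequences of equal length; the function names `first_difference` and `prosodic_inputs` are hypothetical, and the handling of unvoiced frames, which the abstract does not describe, is left out.

```python
from itertools import combinations

# The five languages named in the abstract.
LANGUAGES = ["English", "German", "Japanese", "Mandarin", "Spanish"]


def first_difference(track):
    """Return the frame-to-frame differences x[t] - x[t-1] of a sequence."""
    return [b - a for a, b in zip(track, track[1:])]


def prosodic_inputs(f0, envelope):
    """Pair delta-F0 with the first difference of the amplitude envelope.

    Assumption: both tracks are sampled at the same frame rate, so they
    have the same length. The result has one fewer frame than the inputs.
    """
    if len(f0) != len(envelope):
        raise ValueError("f0 and envelope must have the same number of frames")
    return list(zip(first_difference(f0), first_difference(envelope)))


# Every unordered pair of languages, one discrimination task per pair.
language_pairs = list(combinations(LANGUAGES, 2))

if __name__ == "__main__":
    # Made-up values for illustration only, not data from the paper.
    frames = prosodic_inputs([100.0, 102.0, 101.0], [0.5, 0.75, 0.25])
    print(frames)
    print(len(language_pairs), "language pairs")
```

Five languages give ten unordered pairs, so "all pairwise combinations" corresponds to ten two-way discrimination tasks. Each input frame carries only two values, which is the contrast with the feature set of over 200 variables mentioned above.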
Bibliographic reference. Cummins, Fred / Gers, Felix / Schmidhuber, Jürgen (1999): "Language identification from prosody without explicit features", In EUROSPEECH'99, 371-374.