As a high-level feature, prosody may be effective when it is modeled over ranges longer than the typical span of a syllable. This paper studies language recognition using high-level prosodic attributes. It addresses two important issues in long-range modeling: how to handle data scarcity, and how to model prosodic boundary events properly. Evaluated on the NIST Language Recognition Evaluation (LRE) 2009, long-range modeling brings a 7.2% relative improvement to a prosodic language detector. Score fusion of the long-range prosodic system with a phonotactic system gives an equal error rate (EER) of 3.07%. Exploiting boundary N-grams is the main factor in reducing the global EER, while different long-range prosodic modeling factors benefit the detection of different languages. Analysis reveals evidence of language-specific long-range prosodic attributes, which sheds light on robust long-range modeling methods for language recognition.
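The abstract reports detector performance as an equal error rate (EER) and mentions score fusion between the prosodic and phonotactic systems. The following is a minimal sketch that makes both quantities concrete. It assumes simple linear fusion with hand-picked weights and an EER estimated by sweeping thresholds over the observed scores. The abstract specifies neither the fusion backend nor the EER procedure, and the functions `fuse_scores` and `equal_error_rate` and the toy scores are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def fuse_scores(score_sets, weights, bias=0.0):
    """Hypothetical linear fusion: weighted sum of per-system scores plus a bias.

    score_sets: one list of trial scores per system, all the same length.
    In practice the weights would be trained (e.g. by logistic regression);
    here they are simply given.
    """
    stacked = np.vstack(score_sets)            # shape: (n_systems, n_trials)
    return np.asarray(weights, dtype=float) @ stacked + bias

def equal_error_rate(scores, labels):
    """Approximate EER by sweeping a decision threshold over observed scores.

    labels: 1 for target-language trials, 0 for non-target trials.
    Returns the mean of the miss and false-alarm rates at the threshold
    where the two rates are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_tar = labels.sum()
    n_non = len(labels) - n_tar
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        # A trial is accepted as the target language if its score >= t.
        p_miss = np.sum((scores < t) & (labels == 1)) / n_tar
        p_fa = np.sum((scores >= t) & (labels == 0)) / n_non
        gap = abs(p_miss - p_fa)
        if gap < best_gap:
            best_gap, eer = gap, (p_miss + p_fa) / 2
    return eer

if __name__ == "__main__":
    # Toy scores for six trials (hypothetical, not from the paper).
    prosodic = [2.0, 1.5, 0.2, -0.5, 0.8, -1.0]
    phonotactic = [1.0, 0.1, 1.2, -1.5, -0.2, 0.3]
    labels = [1, 1, 1, 0, 0, 0]
    fused = fuse_scores([prosodic, phonotactic], [0.5, 0.5])
    print("prosodic EER:   ", equal_error_rate(prosodic, labels))
    print("phonotactic EER:", equal_error_rate(phonotactic, labels))
    print("fused EER:      ", equal_error_rate(fused, labels))  # 0.0 on this toy data
```

On this toy data each system alone misclassifies one target and one non-target trial at its best threshold, while the fused scores separate the two classes, which illustrates why fusing complementary prosodic and phonotactic evidence can lower the EER. A threshold sweep is only a rough estimate on small trial sets; evaluation toolkits typically use finer procedures.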
Bibliographic reference. Ng, Raymond W. M. / Leung, Cheung-Chi / Hautamäki, Ville / Lee, Tan / Ma, Bin / Li, Haizhou (2010): "Towards long-range prosodic attribute modeling for language recognition", In INTERSPEECH-2010, 1792-1795.