11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Towards Long-Range Prosodic Attribute Modeling for Language Recognition

Raymond W. M. Ng (1), Cheung-Chi Leung (2), Ville Hautamäki (2), Tan Lee (1), Bin Ma (2), Haizhou Li (2)

(1) Chinese University of Hong Kong, China
(2) A*STAR, Singapore

As a high-level feature, prosody may be effective when it is modeled over ranges longer than the typical span of a syllable. This paper addresses language recognition with high-level prosodic attributes. It studies two important issues in long-range modeling: how to handle data scarcity, and how to model prosodic boundary events properly. On the NIST Language Recognition Evaluation (LRE) 2009 task, long-range modeling brings a 7.2% relative improvement to a prosodic language detector. Score fusion of the long-range prosodic system with a phonotactic system gives an equal error rate (EER) of 3.07%. Exploiting boundary N-grams is the main contributor to the global EER reduction, while different long-range prosodic modeling factors benefit the detection of different languages. Analysis reveals evidence of language-specific long-range prosodic attributes, which sheds light on robust long-range modeling methods for language recognition.

