This paper investigates lexical stress detection for L2 English speech using Deep Belief Networks (DBNs). The features of the DBN used in this work include the syllable-based prosodic features (assumed to have Gaussian distribution) and their expected lexical stress (assumed to have Bernoulli distribution). As stressed syllables are more prominent than their neighbors, the two preceding and two following syllables are taken into consideration. Experimental results show that the DBN achieves an accuracy of about 80% in syllable stress classification (primary/secondary/no stress) for words with three or more syllables. It outperforms the conventional Gaussian Mixture Model and our previous Prominence Model by an absolute accuracy of about 8% and 4%, respectively.
Bibliographic reference. Li, Kun / Qian, Xiaojun / Kang, Shiyin / Meng, Helen (2013): "Lexical stress detection for L2 English speech using deep belief networks", In INTERSPEECH-2013, 1811-1815.