11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Integration of Multilayer Regression Analysis with Structure-Based Pronunciation Assessment

Masayuki Suzuki (1), Yu Qiao (2), Nobuaki Minematsu (1), Keikichi Hirose (1)

(1) University of Tokyo, Japan
(2) Shenzhen Institute of Advanced Technology, China

Automatic pronunciation assessment faces several difficulties. How well a learner controls the vocal organs is usually estimated from the spectral envelopes of input utterances, but envelope patterns are also affected by non-linguistic factors such as speaker identity. Recently, a new speech representation, called speech structure, was proposed in which these non-linguistic variations are effectively removed by modeling only the contrastive aspects of speech features. However, the dimensionality of the speech structure is often excessively high, which can degrade the performance of structure-based pronunciation assessment. To deal with this problem, we integrate multilayer regression analysis with the structure-based assessment. The results show higher correlation between human and machine scores and much higher robustness to speaker differences than the widely used GOP-based analysis.
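
As an illustration of what "modeling only the contrastive aspects" can mean, the following minimal sketch builds a structure-like vector from pairwise distances between sound distributions. It is not the authors' implementation: the abstract does not name a distance measure, so the Bhattacharyya distance, the diagonal-covariance Gaussians, the per-dimension scale-and-shift used as a stand-in for a speaker difference, and all names and numbers (`bhattacharyya`, `structure_vector`, `affine`, `phonemes`) are assumptions made for this example.

```python
import math
from itertools import combinations


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each Gaussian is a (means, variances) pair of equal-length lists.
    Assumption: the abstract does not specify this distance measure.
    """
    (m1, v1), (m2, v2) = g1, g2
    dist = 0.0
    for a, b, va, vb in zip(m1, m2, v1, v2):
        s = (va + vb) / 2.0  # averaged variance in this dimension
        dist += 0.125 * (a - b) ** 2 / s
        dist += 0.5 * math.log(s / math.sqrt(va * vb))
    return dist


def structure_vector(gaussians):
    """All pairwise distances: n distributions give n * (n - 1) / 2 values."""
    pairs = combinations(range(len(gaussians)), 2)
    return [bhattacharyya(gaussians[i], gaussians[j]) for i, j in pairs]


def affine(g, scale, shift):
    """Apply x -> scale * x + shift per dimension (toy 'speaker' change)."""
    means, variances = g
    new_means = [a * m + b for m, a, b in zip(means, scale, shift)]
    new_vars = [a * a * v for v, a in zip(variances, scale)]
    return (new_means, new_vars)


# Hypothetical two-dimensional distributions for three sounds.
phonemes = [
    ([0.0, 1.0], [1.0, 0.5]),
    ([2.0, -1.0], [0.8, 1.2]),
    ([-1.5, 0.5], [1.5, 0.9]),
]

original = structure_vector(phonemes)
transformed = structure_vector(
    [affine(g, scale=[1.7, 0.6], shift=[3.0, -2.0]) for g in phonemes]
)
```

Under these assumptions, `original` and `transformed` agree up to floating-point error, because the transform changes every distribution in the same way and only the contrasts between them enter the vector. The sketch also shows why dimensionality becomes a concern: the vector length grows quadratically with the number of distributions.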

Bibliographic reference. Suzuki, Masayuki / Qiao, Yu / Minematsu, Nobuaki / Hirose, Keikichi (2010): "Integration of multilayer regression analysis with structure-based pronunciation assessment", in INTERSPEECH-2010, 586-589.