11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Exploring Goodness of Prosody by Diverse Matching Templates

Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu

Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China

In automatic speech grading systems, little research has addressed the issue of GOR (Goodness Of pRosody). In this paper we propose a novel method that builds on the QBH (Query By Humming) techniques we used in the 2008 MIREX evaluation task. A set of standard samples from the top-performing students is first selected as templates. A cascaded QBH structure then applies two metrics: MOMEL stylization followed by DTW distance, and the Fujisaki model followed by EMD distance. The sentence-level GOR is obtained by fusing the confidences between the target and each template, and is then weighted by a PRI factor at the passage level. Experimental results indicate that performance increases with the number of templates, and that the Fujisaki-EMD metric outperforms the MOMEL-DTW metric in terms of correlation. Their combination can be treated as a template-based GOR score; when complemented by our previous feature-based GOR score, the approach achieves a correlation of 0.432 and an EER of 17.90% on our corpus.
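
As a rough illustration of the template-matching idea, the sketch below computes a textbook DTW distance between a target pitch contour and several template contours, then averages the length-normalised distances. It is not the authors' implementation: MOMEL stylization, the Fujisaki model, EMD, the confidence fusion and the PRI weighting are all omitted, and the function names, the absolute-difference local cost and the averaging rule are assumptions made only for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (e.g. pitch contours)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # step in a only
                cost[i][j - 1],      # step in b only
                cost[i - 1][j - 1],  # step in both
            )
    return cost[n][m]


def template_distance(target, templates):
    """Hypothetical score: mean length-normalised DTW distance to all templates."""
    dists = [dtw_distance(target, t) / (len(target) + len(t)) for t in templates]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    templates = [[1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]]
    print(template_distance(target, templates))
```

A lower value means the target contour lies closer to the templates. Turning such a distance into a confidence or a grade would need a mapping that the abstract does not specify.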

Bibliographic reference. Huang, Shen / Li, Hongyan / Wang, Shijin / Liang, Jiaen / Xu, Bo (2010): "Exploring goodness of prosody by diverse matching templates", in INTERSPEECH-2010, 1145-1148.