This paper makes an intensive study on the contextual modeling methods of duration information, for the purpose of improving the performance of pronunciation assessment and mispronunciation detection. The main ideas include: 1) we extend the relations among duration sequence with different level of contextual constraints, and bring them into a unified framework. 2) A backoff mechanism is introduced to resolve the problem of data sparseness and unbalanced distribution. 3) Rather than the traditional parametric functions, we use the discrete modeling for empirical duration distributions based on look-up tables, which can improve the model precision and accelerate the computation speed. The experimental results show the effectiveness of the above methods. The proposed word-dependent duration models can yield 0.0782 in absolute CC (correlation coefficient) improvement and 4.58% in absolute EER (equal error rate) reduction for the tasks of pronunciation assessment and mispronunciation detection respectively, both compared with the baseline method with conventional context-independent case.
Bibliographic reference. Li, Hongyan / Huang, Shen / Wang, Shijin / Xu, Bo (2011): "Context-dependent duration modeling with backoff strategy and look-up tables for pronunciation assessment and mispronunciation detection", In INTERSPEECH-2011, 1133-1136.