This work presents a general development framework for automatic pronunciation assessment within computer-assisted language learning (CALL), together with several refinements of a previously described pronunciation scoring method. This method uses a likelihood-based 'Goodness of Pronunciation' (GOP) measure, which in this work has been extended to include individual thresholds for each phone based both on averaged native confidence scores and on rejection statistics provided by human judges. These statistics were obtained from a specially recorded and annotated database of non-native speech. Since pronunciation assessment is highly subjective, a set of four performance measures has been designed, each measuring a different aspect of how well computer-derived phone-level scores agree with human scores. These performance measures are used to cross-validate the reference annotations and to assess the basic GOP algorithm and its refinements. The experimental results suggest that a likelihood-based pronunciation scoring metric can achieve usable performance, especially after the various enhancements are applied.
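The scoring pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the common GOP approximation in which a phone's score is the absolute difference between the log-likelihood of the forced-aligned expected phone and that of the best-scoring phone from an unconstrained phone loop, divided by the number of frames. The two threshold rules (`native_threshold`, a mean-plus-`alpha`-standard-deviations rule over native scores, and `rejection_matched_threshold`, which matches a human rejection rate) and the simple `agreement` measure are hypothetical stand-ins for the per-phone thresholds and performance measures the abstract mentions. They are not the paper's exact definitions.

```python
from statistics import mean, stdev


def gop_score(forced_loglik: float, free_loglik: float, num_frames: int) -> float:
    """Frame-normalised GOP (assumed formulation):
    |log p(O | expected phone) - log p(O | best phone in a free phone loop)| / frames."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return abs(forced_loglik - free_loglik) / num_frames


def native_threshold(native_scores: list[float], alpha: float = 1.0) -> float:
    """Hypothetical per-phone threshold from native GOP scores: mean + alpha * std."""
    if not native_scores:
        raise ValueError("need at least one native score")
    sigma = stdev(native_scores) if len(native_scores) > 1 else 0.0
    return mean(native_scores) + alpha * sigma


def rejection_matched_threshold(scores: list[float], human_reject_rate: float) -> float:
    """Hypothetical per-phone threshold chosen so that the share of training
    scores strictly above it roughly equals the human judges' rejection rate."""
    if not scores:
        raise ValueError("need at least one score")
    ordered = sorted(scores)
    n_reject = round(human_reject_rate * len(ordered))
    if n_reject <= 0:
        return ordered[-1]          # nothing lies strictly above the maximum
    if n_reject >= len(ordered):
        return float("-inf")        # every score lies above -inf
    return ordered[len(ordered) - n_reject - 1]  # largest score still accepted


def reject(gop: float, threshold: float) -> bool:
    """A phone is flagged as mispronounced when its GOP exceeds the threshold."""
    return gop > threshold


def agreement(machine: list[bool], human: list[bool]) -> float:
    """Illustrative agreement: fraction of phones with identical accept/reject decisions."""
    if not machine or len(machine) != len(human):
        raise ValueError("decision lists must be non-empty and equal in length")
    return sum(m == h for m, h in zip(machine, human)) / len(machine)


if __name__ == "__main__":
    g = gop_score(forced_loglik=-120.0, free_loglik=-100.0, num_frames=10)
    t = native_threshold([0.5, 1.0, 1.5], alpha=1.0)
    print(g, t, reject(g, t))
    print(agreement([True, False, True, False], [True, False, False, False]))
```

With these toy numbers the GOP is 2.0 and the native threshold is 1.5, so the phone is rejected, and the machine and human decisions agree on three of four phones (0.75). A threshold derived from native speech captures how well each phone is normally recognised. A rejection-matched threshold instead calibrates each phone to how strict the human judges were.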
Cite as: Witt, S.M., Young, S.J. (1998) Performance measures for phone-level pronunciation teaching in CALL. Proc. ETRW on Speech Technology in Language Learning (STiLL), 99-102
@inproceedings{witt98_still, author={S. M. Witt and Steve J. Young}, title={{Performance measures for phone-level pronunciation teaching in CALL}}, year=1998, booktitle={Proc. ETRW on Speech Technology in Language Learning (STiLL)}, pages={99--102} }