Sixth International Conference on Spoken Language Processing (ICSLP 2000)

Beijing, China
October 16-20, 2000

A Comparison of Data-Derived and Knowledge-Based Modeling of Pronunciation Variation

Mirjam Wester (1,2), Eric Fosler-Lussier (1)

(1) International Computer Science Institute, Berkeley, CA, USA
(2) A2RT, Dept. of Language and Speech, University of Nijmegen, The Netherlands

This paper compares two approaches to modeling pronunciation variation: data-derived and knowledge-based. The knowledge-based approach uses phonological rules to generate variants. The data-derived approach performs phone recognition, followed by various pruning and smoothing methods to alleviate some of the errors in the phone recognition. Using phonological rules led to a small improvement in word error rate (WER), whereas a data-derived approach in which the phone recognition output was smoothed using simple decision trees (d-trees) prior to lexicon generation led to a significant improvement over the baseline. Furthermore, we found that 10% of the variants generated by the phonological rules were also found using phone recognition, and this increased to 23% when the phone recognition output was smoothed using d-trees. In addition, we propose a metric to measure confusability in the lexicon, and we found that employing this confusion metric to prune variants results in roughly the same improvement as using the d-tree method.
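As an illustration of the overlap figures quoted above (10% and 23%), the following minimal Python sketch generates variants with one toy phonological rule and computes the fraction of rule-generated variants that also occur in a data-derived variant set. Everything in it is an assumption made for illustration: the rule (deleting a word-final "n" after schwa), the phone symbols, the three-word lexicon, and the function names are hypothetical and are not taken from the paper, whose actual rules, lexicon, and confusability metric are not specified in this abstract.

```python
# Illustrative sketch only: toy rule, toy lexicon, hypothetical names.

def apply_final_n_deletion(pron):
    """Toy rule: drop a word-final 'n' that follows schwa ('@').

    Returns the new pronunciation as a tuple of phones,
    or None if the rule does not apply.
    """
    if len(pron) >= 2 and pron[-1] == "n" and pron[-2] == "@":
        return pron[:-1]
    return None


def rule_variants(lexicon):
    """Collect (word, variant) pairs produced by the toy rule."""
    variants = set()
    for word, pron in lexicon.items():
        new_pron = apply_final_n_deletion(pron)
        if new_pron is not None:
            variants.add((word, new_pron))
    return variants


def overlap_fraction(rule_based, data_derived):
    """Fraction of rule-based variants also present in the data-derived set."""
    if not rule_based:
        return 0.0
    return len(rule_based & data_derived) / len(rule_based)


# Hypothetical baseline lexicon: word -> canonical phone sequence.
baseline = {
    "lopen": ("l", "o", "p", "@", "n"),
    "reizen": ("r", "E", "i", "z", "@", "n"),
    "trein": ("t", "r", "E", "i", "n"),
}

# Hypothetical variants, standing in for phone recognition output.
data_derived = {
    ("lopen", ("l", "o", "p", "@")),
    ("trein", ("t", "r", "E", "n")),
}

rule_based = rule_variants(baseline)
fraction = overlap_fraction(rule_based, data_derived)
print(f"{len(rule_based)} rule-based variants, overlap = {fraction:.0%}")
```

In this toy setup the rule applies to two of the three words, and one of the two resulting variants also appears in the data-derived set. The overlap is normalized by the number of rule-based variants, which is one reading of "variants generated by the phonological rules [that] were also found using phone recognition".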

Bibliographic reference. Wester, Mirjam / Fosler-Lussier, Eric (2000): "A comparison of data-derived and knowledge-based modeling of pronunciation variation", in ICSLP-2000, vol. 1, 270-273.