Sixth International Conference on Spoken Language Processing (ICSLP 2000)
October 16-20, 2000
A Comparison of Data-Derived and Knowledge-Based Modeling of Pronunciation Variation
Mirjam Wester (1,2), Eric Fosler-Lussier (1)
(1) International Computer Science Institute, Berkeley, CA, USA
(2) A2RT, Dept. of Language and Speech, University of Nijmegen, The Netherlands
This paper focuses on modeling pronunciation variation
in two different ways: data-derived and knowledge-based.
The knowledge-based approach consists of using
phonological rules to generate variants. The data-derived
approach consists of performing phone recognition,
followed by various pruning and smoothing methods to
alleviate some of the errors in the phone recognition.
Using phonological rules led to a small improvement in
word error rate (WER), whereas a data-derived approach in
which the phone recognition output was smoothed using
simple decision trees (d-trees) prior to lexicon generation
led to a significant improvement compared to the baseline.
Furthermore, we found that 10% of the variants generated by
the phonological rules were also found using phone
recognition, and this increased to 23% when the phone
recognition output was smoothed using d-trees. In
addition, we propose a metric to measure confusability in
the lexicon, and we found that using this metric to prune
variants resulted in roughly the same improvement as the
d-tree method.
Wester, Mirjam / Fosler-Lussier, Eric (2000):
"A comparison of data-derived and knowledge-based modeling of pronunciation variation",
In ICSLP-2000, vol.1, 270-273.