In this paper, we describe a data-driven method for automatically generating a domain-dependent pronunciation lexicon. We also introduce an adaptation method that alleviates some of the errors caused by data-driven rules derived from a relatively small speech corpus. First, pronunciation variation rules are extracted from a large speech corpus and then adapted using rules derived from the target corpus. The context-dependent pronunciation variants of the target lexicon are generated automatically by applying these rules to the training and language model adaptation text corpora. The pronunciation variants are then pruned based on the likelihood of the applied rules. Compared with a lexicon created by knowledge-based rules, our approach yields an absolute reduction of 0.8% in word error rate (WER) on a Korean spontaneous speech corpus. Furthermore, the number of pronunciation variants is reduced by almost 5.6% at peak performance.
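The pipeline summarized above (adapt rules, apply them in context, prune by likelihood) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the rule format (a phone with its left and right neighbours mapped to surface phones with probabilities), the use of linear interpolation as the adaptation step, the function names `adapt_rules` and `generate_variants`, and the probabilities in the toy example.

```python
from itertools import product

# Hypothetical rule table: (left, phone, right) -> {surface_phone: probability}.
# "#" marks a word boundary.

def adapt_rules(general, target, weight=0.5):
    """Interpolate rule probabilities from a large general corpus with those
    from the target corpus (linear interpolation is an assumed scheme)."""
    adapted = {}
    for ctx in set(general) | set(target):
        g = general.get(ctx, {})
        t = target.get(ctx, {})
        if not g or not t:
            adapted[ctx] = dict(g or t)  # context seen in only one corpus
            continue
        adapted[ctx] = {
            p: (1 - weight) * g.get(p, 0.0) + weight * t.get(p, 0.0)
            for p in set(g) | set(t)
        }
    return adapted

def generate_variants(phones, rules, min_likelihood=0.1):
    """Apply context-dependent rules to a canonical phone sequence and keep
    only variants whose product of rule probabilities reaches the threshold."""
    padded = ["#"] + list(phones) + ["#"]
    options = []
    for i in range(1, len(padded) - 1):
        ctx = (padded[i - 1], padded[i], padded[i + 1])
        # With no matching rule, the canonical phone is kept with probability 1.
        options.append(list(rules.get(ctx, {padded[i]: 1.0}).items()))
    variants = {}
    for combo in product(*options):
        likelihood = 1.0
        for _, prob in combo:
            likelihood *= prob
        if likelihood >= min_likelihood:
            variants[tuple(surface for surface, _ in combo)] = likelihood
    return variants

# Toy example: nasalization of /k/ before /m/; the probabilities are made up.
general = {("u", "k", "m"): {"k": 0.2, "ng": 0.8}}
target = {("u", "k", "m"): {"k": 0.4, "ng": 0.6}}
adapted = adapt_rules(general, target, weight=0.5)
variants = generate_variants(["k", "u", "k", "m", "u", "l"], adapted, 0.5)
```

Raising `min_likelihood` shrinks the lexicon at the risk of discarding variants that do occur, which is the kind of trade-off the pruning step has to balance.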
Cite as: Jeon, J.H., Chung, M. (2005) Automatic generation of domain-dependent pronunciation lexicon with data-driven rules and rule adaptation. Proc. Interspeech 2005, 1337-1340, doi: 10.21437/Interspeech.2005-486
@inproceedings{jeon05b_interspeech,
  author={Je Hun Jeon and Minhwa Chung},
  title={{Automatic generation of domain-dependent pronunciation lexicon with data-driven rules and rule adaptation}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1337--1340},
  doi={10.21437/Interspeech.2005-486}
}