ISCA Archive PMLA 2002

Pronunciation modeling using a finite-state transducer representation

Timothy J. Hazen, I. Lee Hetherington, Han Shu, Karen Livescu

The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be probabilistically trained using a modified EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system reduces word error rates by between 4% and 8% over different test sets when compared against a system using no phonological rewrite rules.
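As a rough illustration of the idea of expanding phonemic baseforms with weighted rewrite rules, the following minimal Python sketch enumerates surface pronunciations and their probabilities. It is not the SUMMIT implementation: the dictionary entries, the rules, and their probabilities are hypothetical, the rules are context-independent for brevity (real phonological rules typically depend on neighboring phonemes), and plain enumeration stands in for FST composition.

```python
from itertools import product

# Hypothetical phonemic baseforms (not taken from the SUMMIT dictionary).
BASEFORMS = {
    "butter": ["b", "ah", "t", "er"],
    "want": ["w", "aa", "n", "t"],
}

# Hypothetical rewrite rules: phoneme -> list of (phone realization, probability).
# An empty tuple models deletion. Probabilities for each phoneme sum to 1.
RULES = {
    "t": [(("t",), 0.6), (("dx",), 0.3), ((), 0.1)],
}


def expand(baseform):
    """Return {phone_tuple: probability} over all surface pronunciations."""
    # Phonemes without a rule map to themselves with probability 1.
    options = [RULES.get(p, [((p,), 1.0)]) for p in baseform]
    result = {}
    for combo in product(*options):
        # Concatenate the chosen realization of every phoneme.
        phones = tuple(ph for realization, _ in combo for ph in realization)
        prob = 1.0
        for _, weight in combo:
            prob *= weight
        # Different rule choices may yield the same phone string; sum them.
        result[phones] = result.get(phones, 0.0) + prob
    return result


if __name__ == "__main__":
    for phones, prob in expand(BASEFORMS["butter"]).items():
        print(" ".join(phones), round(prob, 3))
```

In an FST-based system the same expansion would be obtained by composing a dictionary transducer with a rule transducer, and the rule probabilities would be estimated from data rather than set by hand as they are here.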


Cite as: Hazen, T.J., Hetherington, I.L., Shu, H., Livescu, K. (2002) Pronunciation modeling using a finite-state transducer representation. Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002), 99-104

@inproceedings{hazen02_pmla,
  author={Timothy J. Hazen and I. Lee Hetherington and Han Shu and Karen Livescu},
  title={{Pronunciation modeling using a finite-state transducer representation}},
  year=2002,
  booktitle={Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002)},
  pages={99--104}
}