Modeling Pronunciation Variation for Automatic Speech Recognition
Rolduc, The Netherlands
In this paper we report on how finite-state transducers (FSTs) can be integrated into a continuous speech recognition (CSR) system to express the regular, context-dependent relationship between a canonical phonemic language model and its possible phonetic realizations, and thus cover many pronunciation variation phenomena as well as inter- and intra-word coarticulation. By using a separate intermediate model for these phenomena, we keep them out of both the higher lexical level and the lower level of acoustic models, so that each level can be handled separately and made more accurate. We present experimental results on the use of FSTs in our experimental CSR system ARCOS-G [1]. These FSTs were compiled from combinations of so-called two-level rules, which were assembled manually from a small set of well-known linguistic rules for German. We then address the question of automatically generating FSTs from examples. Our special focus is on keeping the resulting FSTs manageably small by combining the many specific rules learned from examples with a few more general, manually collected rules. Experiments with different approaches are presented.
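As a rough illustration of the kind of context-dependent relation described above, the following minimal sketch enumerates the phonetic realizations of a canonical phoneme string under optional rewrite rules. It is not the authors' implementation and not a compiled transducer: it directly lists the output side of the relation that such a transducer would encode. The `Rule` type, the `realizations` function, and the two rules (nasal place assimilation and schwa deletion, written in SAMPA-like symbols) are assumptions made for this example and are not taken from the paper. As in two-level rules, contexts are checked on the canonical side, so all rules apply in parallel.

```python
from typing import FrozenSet, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    target: str             # canonical phoneme the rule applies to
    surface: str            # phonetic realization ("" means deletion)
    right: FrozenSet[str]   # canonical phonemes allowed as right context


# Hypothetical rules, for illustration only.
RULES = [
    Rule("n", "m", frozenset({"b", "p", "m"})),  # optional place assimilation
    Rule("@", "", frozenset({"n"})),             # optional schwa deletion before /n/
]


def realizations(phonemes: Tuple[str, ...]) -> Set[Tuple[str, ...]]:
    """Return every phonetic variant licensed by RULES for a canonical string."""
    variants: Set[Tuple[str, ...]] = {()}
    for i, ph in enumerate(phonemes):
        nxt = phonemes[i + 1] if i + 1 < len(phonemes) else None
        # The canonical phoneme itself is always a possible realization.
        options = [ph]
        for rule in RULES:
            if rule.target == ph and nxt in rule.right:
                options.append(rule.surface)
        # Extend every partial variant with every option; "" adds nothing.
        variants = {v + ((o,) if o else ()) for v in variants for o in options}
    return variants


if __name__ == "__main__":
    for variant in sorted(realizations(("h", "a:", "b", "@", "n"))):
        print(" ".join(variant))
```

Because the sequence is passed in as one flat tuple, a rule whose context spans a word boundary is handled the same way as one inside a word, which is the point of keeping such phenomena in a separate intermediate level rather than in the lexicon.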
Bibliographic reference. Safra, Schamai / Lehtinen, Gunnar / Huber, Karl (1998): "Modeling pronunciation variations and coarticulation with finite-state transducers in CSR", In MPV-1998, 125-130.