This paper proposes a new method for effectively capturing pronunciation variability in ASR tasks. Its basic principle is to structure the word-related phonetic feature space. For each word in the system dictionary, a network of its phonetic realisations, the so-called hierarchical word network (HWN), is generated from the word's "ideal" transcription by applying specially designed modification rules and constraints. Unlike "allophone networks", HWNs adequately cover all phone modifications (including articulatory laxing, contextual accommodation, accidental substitutions, etc.) and allow for various levels of precision in the phonetic representation of word pronunciation variability.
To use this representation in ASR tasks, an appropriately refined word model is proposed that introduces the so-called hierarchical matching functions (HMFs).
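The idea of deriving many realisations from one "ideal" transcription via modification rules and constraints can be illustrated with a small sketch. This is not the authors' HWN algorithm: the rule table `RULES`, the helper names `realisations` and `by_level`, and the example transcription of "water" are all hypothetical. Each phone may be kept, substituted, or deleted; a maximum number of modifications acts as a simple constraint; and grouping variants by how many modifications they need is used here as a rough stand-in for the "levels of precision" mentioned above. The variants are enumerated as a flat set rather than built as a shared network.

```python
from itertools import product

# Hypothetical modification rules (illustrative only, not from the paper):
# each phone maps to its possible realisations, tagged with the kind of change.
RULES = {
    "t": [("t", "canonical"), ("d", "laxing"), ("", "elision")],
    "ə": [("ə", "canonical"), ("", "reduction")],
}


def realisations(ideal, rules, max_mods=None):
    """Enumerate surface realisations of an ideal phone sequence.

    Returns a dict mapping each distinct phone tuple to the minimum number
    of modifications needed to produce it. `max_mods` is an optional
    constraint on how many phones may deviate from the ideal form.
    """
    # Phones without rules can only be realised canonically.
    options = [rules.get(p, [(p, "canonical")]) for p in ideal]
    variants = {}
    # Full enumeration is exponential in word length; fine for a toy example.
    for combo in product(*options):
        mods = sum(1 for _, kind in combo if kind != "canonical")
        if max_mods is not None and mods > max_mods:
            continue
        phones = tuple(ph for ph, _ in combo if ph)  # drop elided phones
        if phones not in variants or mods < variants[phones]:
            variants[phones] = mods
    return variants


def by_level(variants):
    """Group variants by modification count (level 0 = ideal form)."""
    levels = {}
    for phones, mods in variants.items():
        levels.setdefault(mods, []).append(" ".join(phones))
    return {k: sorted(v) for k, v in sorted(levels.items())}


if __name__ == "__main__":
    water = ["w", "ɔ", "t", "ə"]  # hypothetical ideal transcription
    print(by_level(realisations(water, RULES)))
    print(by_level(realisations(water, RULES, max_mods=1)))
```

For this toy input, level 0 holds only the ideal form, while deeper levels hold increasingly modified realisations; tightening `max_mods` prunes the deeper levels, loosely mirroring how a coarser or finer level of representation could be selected.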
Cite as: Koval, S., Smirnova, N., Khitrov, M. (2002) Modelling pronunciation variability for ASR tasks. Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002), 59-64
@inproceedings{koval02_pmla, author={Serguei Koval and Natalia Smirnova and Mikhail Khitrov}, title={{Modelling pronunciation variability for ASR tasks}}, year=2002, booktitle={Proc. ITRW on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (PMLA 2002)}, pages={59--64} }