ISCA Workshop on Multilingual Speech and Language Processing (MULTILING 2006), Center for Language and Speech Technology, Stellenbosch University, Stellenbosch, South Africa
Various automated techniques can be used to generalise from phonemic lexicons by extracting grapheme-to-phoneme rule sets. These techniques are particularly useful when developing pronunciation models for previously unmodelled languages, a frequent requirement when building multilingual speech processing systems. However, many of the learning algorithms (such as Dynamically Expanding Context or Default&Refine) have difficulty accommodating the alternative pronunciations that occur in the training lexicon.
In this paper we propose an approach for incorporating phonemic variants into a typical instance-based learning algorithm, Default&Refine. We investigate the use of a combined 'pseudo-phoneme', associated with a set of 'generation restriction rules', to model those phonemes that are consistently realised as two or more variants in the training lexicon.
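The abstract does not specify how pseudo-phonemes or generation restriction rules are constructed, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes two variant pronunciations that are already aligned position by position and of equal length, writes a pseudo-phoneme as a brace-delimited set such as `{ay|ih}`, and models a generation restriction rule as a simple predicate that filters the expanded variants. The names `merge_variants` and `expand`, the brace notation, and the ARPAbet-style symbols are all illustrative assumptions.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Pron = Tuple[str, ...]  # a pronunciation as a sequence of phoneme symbols


def merge_variants(a: Pron, b: Pron) -> Optional[Pron]:
    """Hypothetical: combine two aligned variants into one sequence in which
    every position where they differ becomes a pseudo-phoneme."""
    if len(a) != len(b):
        return None  # this sketch only handles equal-length, aligned variants
    merged = []
    for pa, pb in zip(a, b):
        if pa == pb:
            merged.append(pa)
        else:
            # Pseudo-phoneme: the set of phonemes realised at this position.
            merged.append("{" + "|".join(sorted((pa, pb))) + "}")
    return tuple(merged)


def expand(pron: Pron, allow: Callable[[Pron], bool] = lambda p: True) -> List[Pron]:
    """Hypothetical: generate the concrete variants encoded by pseudo-phonemes,
    keeping only those accepted by a restriction predicate `allow`."""
    options = []
    for ph in pron:
        if ph.startswith("{") and ph.endswith("}"):
            options.append(ph[1:-1].split("|"))
        else:
            options.append([ph])
    return [p for p in (tuple(c) for c in product(*options)) if allow(p)]


# Usage: two variant pronunciations of "direct".
merged = merge_variants(("d", "ih", "r", "eh", "k", "t"), ("d", "ay", "r", "eh", "k", "t"))
print(merged)          # ('d', '{ay|ih}', 'r', 'eh', 'k', 't')
print(expand(merged))  # both variants regenerated
```

In a rule-extraction setting, the learner would then predict the single pseudo-phoneme as an ordinary output symbol, and only the final generation step would need to know about variants.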
We evaluate the effectiveness of this approach using the Oxford Advanced Learner's Dictionary, a publicly available English pronunciation lexicon. We find that phonemic variation is regular enough to be modelled through extracted rules, and that acceptable variants may be underrepresented in the lexicon studied. The proposed method is applicable to many approaches besides the Default&Refine algorithm, and provides a simple but effective technique for including phonemic variants in grapheme-to-phoneme rule extraction frameworks.
Bibliographic reference. Davel, Marelie / Barnard, Etienne (2006): "Extracting pronunciation rules for phonemic variants", In MULTILING-2006, paper 006.