Interspeech'2005 - Eurospeech
In this work, we present an innovative approach to grapheme-to-phoneme conversion that achieves very low error rates for languages such as British English, American English and Dutch, and generalizes well. One of the basic steps in the text-to-speech conversion performed by speech synthesis systems is the phonetic transcription of the input text, which can be considered an intermediate symbolic representation between the graphemic text and the phone sequence that must be generated. Defining explicit transcription rules by hand, however, can be very difficult for some languages. For this reason, a tool that can automatically compress and generalize lexical knowledge into rules is very useful. As part of the multilanguage development of the Loquendo speech synthesis system, a machine-learning algorithm has been developed for phonetic transcription and the extraction of grapheme-phoneme association rules. The algorithm is trained on a lexicon whose words are stored in two forms, orthographic and phonetic, and learns to predict the phonetic form from this information. The prediction error on the training set proves to be very low and is restricted to a few words that are handled as exceptions, which ensures reliable transcription of the words in the lexicon; for words that do not occur in the lexicon, the algorithm predicts correct or acceptable transcriptions.
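The workflow described in the abstract (learn rules from a lexicon, keep mispredicted training words as exceptions, apply the rules to unseen words) can be illustrated with a toy sketch. This is not the automaton-based technique of the paper, whose details the abstract does not give. It substitutes simple context-dependent letter rules with backoff, and it assumes a lexicon that is already aligned one phoneme per letter. The lexicon entries, the phoneme symbols and all function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: orthographic form -> phonetic form, assumed to be
# aligned one phoneme symbol per letter (a real lexicon needs an alignment step).
LEXICON = {
    "cat": ["k", "a", "t"],
    "cot": ["k", "o", "t"],
    "city": ["s", "i", "t", "i"],
    "cent": ["s", "e", "n", "t"],
    "gin": ["j", "i", "n"],
    "gig": ["g", "i", "g"],
}


def rule_keys(word, i):
    """Rule keys for letter i, from most specific context to least specific."""
    padded = f"#{word}#"  # '#' marks the word boundary
    left, letter, right = padded[i], padded[i + 1], padded[i + 2]
    return [(left, letter, right), ("", letter, right), ("", letter, "")]


def apply_rules(rules, word):
    """Transcribe a word with the learned rules only, backing off by context."""
    phones = []
    for i in range(len(word)):
        for key in rule_keys(word, i):
            if key in rules:
                phones.append(rules[key])
                break
        else:
            phones.append("?")  # letter never seen in training
    return phones


def train(lexicon):
    """Learn one phoneme per context key; store mispredicted words as exceptions."""
    counts = defaultdict(Counter)
    for word, phones in lexicon.items():
        assert len(word) == len(phones), "toy sketch assumes 1:1 alignment"
        for i, phone in enumerate(phones):
            for key in rule_keys(word, i):
                counts[key][phone] += 1
    # Keep the most frequent phoneme for each key (ties: first one encountered).
    rules = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    exceptions = {
        word: phones
        for word, phones in lexicon.items()
        if apply_rules(rules, word) != phones
    }
    return rules, exceptions


def transcribe(word, rules, exceptions):
    """Exceptions guarantee exact output on lexicon words; rules cover the rest."""
    if word in exceptions:
        return exceptions[word]
    return apply_rules(rules, word)


rules, exceptions = train(LEXICON)
```

In this sketch, `transcribe` reproduces every lexicon entry exactly, because any word the rules get wrong is stored in `exceptions`, while an unseen word such as `"can"` is transcribed from the rules alone.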
Bibliographic reference. Massimino, Paolo / Pacchiotti, Alberto (2005): "An automaton-based machine learning technique for automatic phonetic transcription", In INTERSPEECH-2005, 1901-1904.