EUROSPEECH 2003 - INTERSPEECH 2003
Data-driven grapheme-to-phoneme conversion involves either (top-down) inductive learning or (bottom-up) pronunciation by analogy. As both approaches rely on local context information, they typically require some external linguistic knowledge, e.g., individual grapheme/phoneme correspondences. To avoid such supervision, this paper proposes an alternative solution, dubbed pronunciation by latent analogy, which adopts a more global definition of analogous events. For each out-of-vocabulary word, a neighborhood of globally relevant pronunciations is constructed through an appropriate data-driven mapping of its graphemic form. Phoneme transcription then proceeds via locally optimal sequence alignment and maximum likelihood position scoring. This method was successfully applied to the synthesis of proper names of widely diverse origins.
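The neighborhood step described above can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify the data-driven mapping, so plain cosine similarity over letter-trigram counts is swapped in as a stand-in, and the alignment and position-scoring steps are omitted. The function names, the toy lexicon, and its ARPAbet-like transcriptions are all hypothetical.

```python
from collections import Counter
from math import sqrt


def letter_ngrams(word, n=3):
    """Count letter n-grams of a word padded with boundary markers."""
    padded = "#" + word.lower() + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def neighborhood(oov_word, lexicon, size=2):
    """Return the `size` lexicon entries orthographically closest to oov_word.

    Assumption: raw trigram counts replace the latent mapping of the paper.
    """
    query = letter_ngrams(oov_word)
    ranked = sorted(lexicon, key=lambda w: cosine(query, letter_ngrams(w)), reverse=True)
    return [(w, lexicon[w]) for w in ranked[:size]]


# Toy lexicon with made-up transcriptions (illustrative only).
LEXICON = {
    "jackson": "JH AE K S AH N",
    "johnson": "JH AA N S AH N",
    "miller": "M IH L ER",
    "wilson": "W IH L S AH N",
}

# Pronunciations of the words nearest to an out-of-vocabulary name.
neighbors = neighborhood("jameson", LEXICON)
for word, phonemes in neighbors:
    print(word, "->", phonemes)
```

In a fuller pipeline, the pronunciations returned in `neighbors` would be the input to the alignment and scoring stage that assembles the final phoneme sequence.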
Bibliographic reference. Bellegarda, Jerome R. (2003): "A latent analogy framework for grapheme-to-phoneme conversion", In EUROSPEECH-2003, 2029-2032.