INTERSPEECH 2004 - ICSLP
Robust text-to-speech (TTS) systems require a letter-to-pronunciation module to generate pronunciations for words missing from the system lexicon. For an input orthography, both a phone sequence and the location of lexical stress must be predicted. However, letter-to-pronunciation modules that use a window of context letters around a target letter normally cannot "see" wider-context morphological information, which is highly correlated with stress location. A new component is therefore added that uses morphological information to predict which letter of a word might receive primary stress, and the resulting "stressed letters" are then used as input to a decision tree stressed-letter-to-pronunciation component. This improves both stress accuracy and phone accuracy in American English, British English, and German. Using stressed letters as the input to the decision tree also improves phone accuracy when stress is not required in the output pronunciation, as is conventionally the case for automatic speech recognition (ASR).
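The sketch below shows what a "stressed letter" input might look like. It makes several assumptions. The suffix rule in `predict_stressed_letter` is a hypothetical toy stand-in for the paper's morphological stress predictor, which the abstract does not specify. Marking the stressed letter by appending `"1"` is also an illustrative choice, as is the `#`-padded context window in `context_windows`. A decision tree learner (not shown) would then map each window of stressed letters to a phone.

```python
from typing import List, Tuple

VOWELS = set("aeiouy")

# Hypothetical toy suffix list (NOT the paper's method): these suffixes tend to
# put primary stress on the vowel just before them, e.g. "informAtion", "elEctric".
PRE_STRESSING_SUFFIXES = ("tion", "sion", "ity", "ic")


def last_vowel_index(word: str, end: int) -> int:
    """Index of the last vowel letter in word[:end], or -1 if there is none."""
    for i in range(end - 1, -1, -1):
        if word[i] in VOWELS:
            return i
    return -1


def predict_stressed_letter(word: str) -> int:
    """Toy morphology-based predictor: index of the letter assumed to carry primary stress."""
    for suffix in PRE_STRESSING_SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            idx = last_vowel_index(word, len(word) - len(suffix))
            if idx >= 0:
                return idx
    # Crude fallback assumption: stress the first vowel letter.
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return i
    return 0


def to_stressed_letters(word: str) -> List[str]:
    """Mark the predicted stressed letter with a trailing '1' (e.g. 'a' -> 'a1')."""
    s = predict_stressed_letter(word)
    return [ch + "1" if i == s else ch for i, ch in enumerate(word)]


def context_windows(symbols: List[str], width: int = 2) -> List[Tuple[str, ...]]:
    """One feature tuple per letter: `width` symbols on each side, padded with '#'."""
    padded = ["#"] * width + symbols + ["#"] * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(symbols))]


if __name__ == "__main__":
    letters = to_stressed_letters("information")
    print(letters)
    for window in context_windows(letters):
        print(window)
```

Because the stress mark travels with the letter, every context window near the stressed vowel carries a cue that a narrow letter window alone could not infer from distant morphology. This may also explain why phone accuracy can improve even when the output stress marks are later discarded, as for ASR.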
Bibliographic reference. Webster, Gabriel (2004): "Improving letter-to-pronunciation accuracy with automatic morphologically-based stress prediction", In INTERSPEECH-2004, 2573-2576.