Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
May 4-6, 1998

Effects of Speaking Rate and Word Frequency on Conversational Pronunciations

Eric Fosler-Lussier, Nelson Morgan

International Computer Science Institute and University of California, Berkeley, CA, USA

The set of possible pronunciations in continuous speech corpora changes dynamically with many factors. Two variables, speaking rate and word predictability, seemed to be promising candidates for integration into dynamic ASR pronunciation models; however, our initial efforts to incorporate these factors into phone-level decision tree models met with limited success. In this paper, we confirm the intuition that these factors have an effect on ASR systems, and we analyze the relationship between these factors and pronunciations in order to shed light on why the decision tree models failed. We present a statistical exploration of the effects of these factors at the word, syllable, and phone levels in the Switchboard corpus. We show that both increased speaking rate and word likelihood can induce a significant shift in the probabilities of the pronunciations of frequent words. Using these data, we hypothesize reasons for the difficulty in incorporating these dynamic measures into phone-level decision trees.

Bibliographic reference. Fosler-Lussier, Eric / Morgan, Nelson (1998): "Effects of speaking rate and word frequency on conversational pronunciations", in MPV-1998, 35-40.