Speech Recognition and Intrinsic Variation (SRIV2006)

Toulouse, France
May 20, 2006

Underspecified feature models for pronunciation variation in ASR

Eric Fosler-Lussier

Dept. of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA

In the 1990s, several studies showed that if we could correctly predict when to include alternate pronunciations of words in ASR lexica, we could greatly reduce error rates for conversational speech tasks such as Switchboard. It is clear, however, that the field has thus far failed to reach that potential. Many scholars model pronunciation variation as the substitution of one phonetic sequence for another, either by replacing entries in a pronunciation lexicon or by dynamically modifying phonetic sequences in response to contextual factors such as speaking rate. In 1999, Ostendorf called for the community to move beyond the "beads-on-a-string" model of pronunciation and outlined some promising directions for research. In this paper, I continue the argument that we should move away from phonetic representations, and I examine how we might model phonetic variation through phonological features. By expressing pronunciation variation in terms of partially underspecified phonological feature bundles, we might be able to better model lexical access. I also review some recent approaches that integrate some of these concepts.

Bibliographic reference. Fosler-Lussier, Eric (2006): "Underspecified feature models for pronunciation variation in ASR", in SRIV-2006, 1-6.