Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
May 4-6, 1998

Dynamic and Static Improvements to Lexical Baseforms

Richard Wiseman, Simon Downey

Speech Technology Unit, BT Laboratories, Ipswich, Suffolk, UK

One limitation of many speaker-independent recognition systems is their dependence on a single-baseform dictionary to model word pronunciations. This paper investigates two approaches to improving lexical baseforms. In the first, 'ideal' transcriptions of utterances are looked up in a pronunciation dictionary and compared with phonetic-level hand-annotated transcriptions. The differences between the two transcriptions reveal many common mispronunciations, accent-based alternatives, false starts and incorrect word substitutions. The second approach applies phonologically developed rules and transforms to the lexical representation of the utterance, generating a pronunciation network. This approach has the advantage of being able to model cross-word coarticulation effects explicitly, whereas the first approach models them only implicitly and to a certain extent. The relative merits of each technique are investigated using a set of experiments performed on a phonetically rich database and the WSJCAM0 corpus.
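As a rough illustration of the first approach, the sketch below aligns a dictionary ('ideal') phone sequence with a hand-annotated one and collects the spans where they differ. It is not the authors' method: the abstract does not say how the two transcriptions were aligned, so Python's standard `difflib.SequenceMatcher` is assumed here as a stand-in, and the function names and phone symbols are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def pronunciation_differences(canonical, observed):
    """Return (canonical_span, observed_span) pairs where the two
    phone sequences disagree. Alignment method is an assumption."""
    matcher = SequenceMatcher(a=canonical, b=observed, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete or insert
            diffs.append((tuple(canonical[i1:i2]), tuple(observed[j1:j2])))
    return diffs


def tally_differences(utterance_pairs):
    """Count how often each difference occurs across many utterances."""
    counts = Counter()
    for canonical, observed in utterance_pairs:
        counts.update(pronunciation_differences(canonical, observed))
    return counts


# Hypothetical example: the final /d/ of "and" is dropped twice.
pairs = [
    (["ae", "n", "d"], ["ae", "n"]),
    (["ae", "n", "d"], ["ae", "n"]),
]
for (expected, heard), count in tally_differences(pairs).items():
    print(expected, "->", heard, "seen", count, "time(s)")
```

Frequently occurring differences in such a tally would be the candidates for alternative baseforms; how candidates were actually selected is not stated in the abstract.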


Bibliographic reference.  Wiseman, Richard / Downey, Simon (1998): "Dynamic and static improvements to lexical baseforms", In MPV-1998, 157-162.