Modeling Pronunciation Variation for Automatic Speech Recognition

Rolduc, The Netherlands
One limitation of many speaker-independent recognition systems is their dependence on a single-baseform dictionary to model word pronunciations. This paper investigates two approaches to improving lexical baseforms. In the first, 'ideal' transcriptions of utterances are looked up in a pronunciation dictionary and compared with phonetic-level hand-annotated transcriptions. The differences between the two transcriptions reveal many common mispronunciations, accent-based alternatives, false starts and incorrect word substitutions. The second approach applies phonologically developed rules and transforms to the lexical representation of the utterance, generating a pronunciation network. This approach has the advantage of modelling cross-word coarticulation effects explicitly, whereas the first approach models them only implicitly and to a limited extent. The relative merits of each technique are investigated in a set of experiments performed on a phonetically rich database and the WSJCAM0 corpus.
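The comparison step of the first approach can be illustrated with a minimal sketch. It assumes a toy dictionary and made-up phone symbols, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the abstract does not say which alignment method the authors used. The names `BASEFORMS`, `canonical_phones` and `pronunciation_differences` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical single-baseform dictionary; the phone symbols are illustrative only.
BASEFORMS = {
    "and": ["ae", "n", "d"],
    "the": ["dh", "ax"],
}


def canonical_phones(words, lexicon):
    """Concatenate the dictionary baseform of each word into one phone sequence."""
    phones = []
    for word in words:
        phones.extend(lexicon[word])
    return phones


def pronunciation_differences(canonical, observed):
    """Return (operation, canonical_span, observed_span) for every mismatch.

    Assumption: a generic sequence matcher stands in for whatever phone
    alignment the original work used.
    """
    matcher = SequenceMatcher(a=canonical, b=observed, autojunk=False)
    return [
        (tag, canonical[i1:i2], observed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


canonical = canonical_phones(["and", "the"], BASEFORMS)
# Invented hand annotation in which the final /d/ of "and" was not produced.
observed = ["ae", "n", "dh", "ax"]
diffs = pronunciation_differences(canonical, observed)
print(diffs)
```

Counting such differences over a whole annotated corpus would give candidate alternative pronunciations per word. Because the alignment here runs over the concatenated utterance rather than word by word, some cross-word effects show up in the differences, which is one way to read the remark that this approach captures them implicitly.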
Bibliographic reference. Wiseman, Richard / Downey, Simon (1998): "Dynamic and static improvements to lexical baseforms", In MPV-1998, 157-162.