Modeling Pronunciation Variation for Automatic Speech Recognition
Rolduc, The Netherlands
This paper describes an approach that uses two iterative algorithms to automatically find multiple phonetic transcriptions of words, given sample utterances of the words and an inventory of context-dependent subword units. Based on an analysis of the N-best phonetic decoding of the available utterances of a word, the proposed approach uses a likelihood criterion to derive the optimal phonetic transcription set (cardinality and contents) for that word. To do so, it determines a partition of the set of utterances such that each subset is associated with one of the transcription variants. Because the number of transcription variants derived by the algorithms is not the same for all the words in the test corpus, we investigate how the word error rate evolves as a function of the mean number of variants per corpus (this number varies according to a threshold value). Speaker-independent recognition results on tasks consisting of the 10 digits and of 36 French isolated words collected over the telephone (Tregor corpus) are promising.
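The abstract does not spell out the two iterative algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical table `scores[u][c]` holding the log-likelihood of utterance `u` under candidate transcription `c` (for instance, candidates pooled from the N-best decodings). Variants are added greedily while the total likelihood gain exceeds a hypothetical `threshold`, and each utterance is then assigned to its best-scoring variant, which yields the partition. The function name `select_variants` and the greedy stopping rule are assumptions made for this example.

```python
def select_variants(scores, threshold):
    """Greedy sketch (assumed, not the paper's algorithm).

    scores[u][c]: log-likelihood of utterance u under candidate transcription c.
    threshold: minimum total log-likelihood gain required to add a variant.
    Returns (chosen candidate indices, partition of utterance indices).
    """
    n_cands = len(scores[0])

    def total(chosen):
        # Each utterance is scored by its best variant in the chosen set.
        return sum(max(row[c] for c in chosen) for row in scores)

    # Start from the single candidate with the highest total likelihood.
    chosen = [max(range(n_cands), key=lambda c: total([c]))]
    current = total(chosen)

    while len(chosen) < n_cands:
        remaining = [c for c in range(n_cands) if c not in chosen]
        best = max(remaining, key=lambda c: total(chosen + [c]))
        gain = total(chosen + [best]) - current
        if gain <= threshold:
            break  # adding another variant is not worth it
        chosen.append(best)
        current += gain

    # Partition the utterances: one subset per retained variant.
    partition = {c: [] for c in chosen}
    for u, row in enumerate(scores):
        partition[max(chosen, key=lambda c: row[c])].append(u)
    return chosen, partition


# Toy data: 4 utterances, 3 candidate transcriptions (made-up values).
scores = [
    [-10, -20, -15],
    [-11, -19, -16],
    [-30, -12, -20],
    [-29, -13, -21],
]
chosen, partition = select_variants(scores, threshold=5.0)
```

In this sketch, raising `threshold` retains fewer variants per word and lowering it retains more, which is one way a single threshold value could control the mean number of variants.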
Bibliographic reference. Mokbel, Houda / Jouvet, Denis (1998): "Derivation of the optimal phonetic transcription set for a word from its acoustic realisations", In MPV-1998, 73-78.