Modeling Pronunciation Variation for Automatic Speech Recognition
Rolduc, The Netherlands
The experiment aims to verify the effects of non-native pronunciations on ASR performance. The basic CSELT HMM sub-word-unit recogniser, trained on the Italian and English phonetic sets, was tested with three groups of bilingual subjects (native speakers of Italian, English and Spanish, respectively) on a vocabulary of 100 Italian and 100 English words. A noticeable drop in Word Accuracy was measured for English words produced by Italian subjects using the English recogniser and for Italian words produced by English subjects using the Italian recogniser. By contrast, only a small increase in error rate was observed for Spanish subjects using the Italian recogniser. Adopting multiple phonetic transcriptions, derived from a priori knowledge about alterations of native pronunciations caused by the influence of the speaker's own native phonological system, reduced the error rate by around 8%. Similar results were obtained with alternative transcriptions based on a posteriori information about the preferred non-native pronunciation phenomena; these were either taken from the n-best variants generated by the HMM recogniser itself or selected among the phonetic variants by a phonetic Neural Network (NN) decoder run on a development data set. Finally, a preliminary test was run using the a priori transcriptions with a multi-phonetic recogniser trained simultaneously on a multilingual speech database consisting of Italian, English, Spanish and German utterances.
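The idea of multiple a priori transcriptions can be illustrated with a minimal sketch. The phone symbols, the substitution rules and the `variants` helper below are hypothetical and chosen only for illustration; they are not the rule set or the lexicon format used in the paper. Each rule lists the phones a non-native speaker might produce in place of a canonical phone, and every combination becomes an alternative lexicon entry for the word.

```python
from itertools import product

# Hypothetical a priori substitution rules (illustrative only, not from the paper).
# Each canonical phone maps to the phones a non-native speaker might produce;
# an empty string stands for deletion of the phone.
RULES = {
    "th": ["th", "t"],
    "ih": ["ih", "iy"],
    "hh": ["hh", ""],
}

def variants(canonical):
    """Return every transcription reachable by optionally applying the rules.

    The canonical transcription always comes first, because each rule lists
    the unchanged phone as its first option.
    """
    options = [RULES.get(phone, [phone]) for phone in canonical]
    result = []
    for combo in product(*options):
        candidate = tuple(phone for phone in combo if phone)  # drop deletions
        if candidate not in result:
            result.append(candidate)
    return result

# A toy multi-pronunciation lexicon: one word, several transcriptions.
lexicon = {"thin": variants(["th", "ih", "n"])}

for transcription in lexicon["thin"]:
    print("thin", " ".join(transcription))
```

In a recogniser, each variant would be added as a parallel path for the same word, so that a non-native realisation can still match one of the listed transcriptions.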
Bibliographic reference. Bonaventura, P. / Gallocchio, F. / Mari, J. / Micca, G. (1998): "Speech recognition methods for non-native pronunciation variations", In MPV-1998, 17-22.