Sixth International Conference on Spoken Language Processing
When a user's accent differs from the one an automatic speech recognition system was trained on, the system's performance degrades. This is attributed to both acoustic and phonological differences between accents. The phonological differences arise because the two languages have different phoneme inventories. Even for the same phoneme, non-native and native speakers produce different sounds. Since accented data is rare but monolingual data is abundant, we propose using data from the accented speaker's first language directly, instead of accented data in the second language. Specifically, we adapt native English phoneme models to accented phoneme models using first-language data in MLLR adaptation. The baseline phone accuracy is 35.25% when native English phone models are used to recognize Cantonese-accented English speech. We compare accent adaptation using accented data with adaptation using source language data. On average, using accented data for adaptation improves the phone accuracy by 69.98%, while using source language data improves it by 70.13%. Both kinds of adaptation data therefore give similar improvements, so non-accented data can be used for adaptation. This allows an accent-adapted acoustic model to be obtained rapidly, without collecting an accented speech database.
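The abstract does not spell out the adaptation equations. As a rough illustration only, the sketch below shows the standard MLLR mean transform, in which each Gaussian mean mu is mapped to A mu + b with a transform W = [b A] shared across Gaussians. It assumes a single global regression class, diagonal covariances, and frame-to-Gaussian occupation probabilities that have already been computed; the function names `estimate_mllr_mean_transform` and `adapt_means` are hypothetical, and how first-language frames are aligned to English phone models is not described in the abstract and is not modeled here.

```python
import numpy as np

def estimate_mllr_mean_transform(means, variances, obs, gamma):
    """Estimate one global MLLR mean transform W = [b A].

    means:     (M, D) Gaussian means of the unadapted model
    variances: (M, D) diagonal covariances of those Gaussians
    obs:       (T, D) adaptation feature frames
    gamma:     (T, M) occupation probability of each Gaussian per frame
    Returns W with shape (D, D + 1).
    """
    n_gauss, dim = means.shape
    # Extended mean vectors xi_m = [1, mu_m]
    xi = np.hstack([np.ones((n_gauss, 1)), means])
    occ = gamma.sum(axis=0)      # (M,)   total occupancy per Gaussian
    first = gamma.T @ obs        # (M, D) occupancy-weighted frame sums
    W = np.zeros((dim, dim + 1))
    # With diagonal covariances each row of W is solved independently.
    for i in range(dim):
        inv_var = 1.0 / variances[:, i]
        G = (xi * (occ * inv_var)[:, None]).T @ xi   # (D+1, D+1)
        k = xi.T @ (inv_var * first[:, i])           # (D+1,)
        W[i] = np.linalg.solve(G, k)
    return W

def adapt_means(means, W):
    """Apply the transform: mu_hat = A @ mu + b for every Gaussian."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

In this sketch the system for each row is only solvable when at least D + 1 Gaussians with non-degenerate means receive adaptation frames; practical systems typically tie transforms across regression classes to cope with sparse data.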
Bibliographic reference. Liu, Wai Kat / Fung, Pascale (2000): "MLLR-based accent model adaptation without accented data", In ICSLP-2000, vol.3, 738-741.