ISCA Archive ICSLP 2000

MLLR-based accent model adaptation without accented data

Wai Kat Liu, Pascale Fung

When a user's accent differs from the accents an automatic speech recognition system was trained on, the system's performance degrades. This is attributed to both acoustic and phonological differences between accents. The phonological differences between two accents arise from the different phoneme inventories of the two languages. Even for the same phoneme, foreign and native speakers produce different sounds. Since accented data is rare but monolingual data is abundant, we propose using data from the accented speaker's first language directly, instead of accented data in the second language. Specifically, we adapt native English phoneme models to accented phoneme models using first-language data in MLLR adaptation. The baseline performance is 35.25% (phone accuracy) when native English phone models are used to recognize Cantonese-accented English speech data. We compare accent adaptation using accented data with adaptation using source language data. On average, using accented data for adaptation improves the phone accuracy by 69.98%, while using source language data for adaptation improves the phone accuracy by 70.13%. This shows that both kinds of adaptation data give similar improvements, so non-accented data can be used for adaptation. We can therefore rapidly obtain an accent-adapted acoustic model without needing to collect an accented speech database.
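
For readers unfamiliar with MLLR, the following is a minimal sketch of the general idea of estimating one global mean transform and applying it to Gaussian means. It is not the authors' implementation. It assumes, for simplicity, that all Gaussians share an identity covariance, in which case the maximum-likelihood estimate reduces to weighted least squares. The function names `estimate_mllr_mean_transform` and `adapt_means`, and the input layout (frame-level occupation probabilities supplied by the caller), are hypothetical choices made for illustration.

```python
import numpy as np

def estimate_mllr_mean_transform(means, frames, posteriors):
    """Estimate a single global MLLR mean transform W of shape (d, d+1).

    Simplifying assumption (not from the paper): every Gaussian has an
    identity covariance, so the ML solution is weighted least squares.

    means:      (M, d) Gaussian means of the unadapted models
    frames:     (T, d) adaptation feature vectors
    posteriors: (T, M) occupation probability of each Gaussian per frame
    """
    num_gauss = means.shape[0]
    # Extended mean vectors xi_m = [1, mu_m], shape (M, d+1)
    xi = np.hstack([np.ones((num_gauss, 1)), means])
    # Total occupancy of each Gaussian, shape (M,)
    occ = posteriors.sum(axis=0)
    # Z = sum_t sum_m gamma_m(t) * o_t * xi_m^T, shape (d, d+1)
    Z = frames.T @ posteriors @ xi
    # G = sum_m occ_m * xi_m * xi_m^T, shape (d+1, d+1)
    G = (xi * occ[:, None]).T @ xi
    # Pseudo-inverse guards against a singular G when data is sparse
    return Z @ np.linalg.pinv(G)

def adapt_means(means, W):
    """Apply mu' = W [1, mu]^T to every mean; returns shape (M, d)."""
    xi = np.hstack([np.ones((means.shape[0], 1)), means])
    return xi @ W.T
```

In the setting described above, `frames` would come from the adaptation data (accented speech or first-language speech) and `means` from the native English phone models. A practical system would also account for the actual covariances and might use several regression classes rather than one global transform.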


Cite as: Liu, W.K., Fung, P. (2000) MLLR-based accent model adaptation without accented data. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 738-741

@inproceedings{liu00h_icslp,
  author={Wai Kat Liu and Pascale Fung},
  title={{MLLR-based accent model adaptation without accented data}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 738-741}
}