This paper addresses automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker's accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the 'true' accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even when the latter uses five times more adaptation data. Combining unsupervised AID-based model selection with speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
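The model selection step described above can be illustrated with a minimal sketch. It is not the authors' AID system: it assumes each speaker is summarised by a fixed-length embedding (standing in for an i-vector) and picks the accent whose centroid is closest by cosine similarity. The names `select_accent_model`, `accent_centroids`, and `accent_models` are hypothetical, as are the error rates used in the tests, which only illustrate how a relative reduction is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_accent_model(embedding, accent_centroids, accent_models, baseline):
    """Pick the acoustic model of the accent whose centroid best matches.

    Assumption: nearest-centroid cosine scoring is a stand-in for the
    AID classifiers in the paper, which are not specified in the abstract.
    Falls back to the baseline model if no accent can be chosen.
    """
    if not accent_centroids:
        return baseline
    best_accent = max(
        accent_centroids,
        key=lambda accent: cosine(embedding, accent_centroids[accent]),
    )
    return accent_models.get(best_accent, baseline)


def relative_reduction(baseline_error, new_error):
    """Relative reduction in error rate, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

Under these assumptions, a speaker whose embedding lies closest to one accent's centroid is decoded with that accent's model, and speaker adaptation could then be applied on top of the selected model rather than the baseline.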
Bibliographic reference. Najafian, Maryam / DeMarco, Andrea / Cox, Stephen / Russell, Martin (2014): "Unsupervised model selection for recognition of regional accented speech", In INTERSPEECH-2014, 2967-2971.