INTERSPEECH 2014
15th Annual Conference of the International Speech Communication Association

Singapore
September 14-18, 2014

Unsupervised Model Selection for Recognition of Regional Accented Speech

Maryam Najafian (1), Andrea DeMarco (2), Stephen Cox (2), Martin Russell (1)

(1) University of Birmingham, UK
(2) University of East Anglia, UK

This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker's accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the 'true' accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
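
The selection step and the reported metric can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that an AID front end (i-Vector or phonotactic) has already produced one score per accent, and the function names, accent labels, model names, and error rates are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def select_accent_model(aid_scores: Dict[str, float],
                        accent_models: Dict[str, str],
                        baseline_model: str = "baseline") -> str:
    """Return the acoustic model for the highest-scoring accent.

    aid_scores: hypothetical AID scores, one per accent (higher is better).
    accent_models: hypothetical mapping from accent label to model name.
    Falls back to the baseline if no score or no matching model exists.
    """
    if not aid_scores:
        return baseline_model
    best_accent = max(aid_scores, key=aid_scores.get)
    return accent_models.get(best_accent, baseline_model)


def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Relative reduction in error rate, in percent of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical scores and model names, not taken from the paper.
    scores = {"accent_a": 0.2, "accent_b": 0.7, "accent_c": 0.1}
    models = {"accent_a": "am_accent_a", "accent_b": "am_accent_b"}
    print(select_accent_model(scores, models))
    # Illustrative error rates only: 20.0% falling to 15.0%.
    print(relative_error_reduction(20.0, 15.0))
```

A relative reduction is measured against the baseline error rate, so the "up to 47%" figure in the abstract is a proportion of the baseline error and not an absolute drop in percentage points.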


Bibliographic reference. Najafian, Maryam / DeMarco, Andrea / Cox, Stephen / Russell, Martin (2014): "Unsupervised model selection for recognition of regional accented speech", in INTERSPEECH-2014, 2967-2971.