This paper presents a novel approach to adapting the acoustic model of a recognizer for non-native spontaneous speech, in the context of recognizing candidates’ responses in a test of spoken English. Instead of collecting and then transcribing spontaneous speech data, we create a read speech corpus in which non-native speakers of English read English sentences of varying pronunciation difficulty with respect to their native language. The motivation for this approach is (1) to save the time and cost associated with transcribing spontaneous speech, and (2) to allow for targeted training of the recognizer, focusing on those phoneme environments that are difficult for non-native speakers to pronounce correctly and hence are more likely to be misrecognized. As a criterion for selecting the sentences to be read, we develop a novel score, the “phonetic challenge score”, which combines a measure of native language-specific difficulties described in the second-language acquisition literature with a statistical measure based on the cross-entropy between phoneme sequences of the native language and English.
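The statistical part of such a score can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the following is only one possible reading: a smoothed phoneme bigram model is estimated from native-language phoneme sequences, and an English sentence is scored by its cross-entropy (in bits per phoneme) under that model. The function names, the add-alpha smoothing, and the toy phoneme data are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab, alpha=1.0):
    """Estimate an add-alpha smoothed phoneme bigram model.

    Returns a function giving log2 P(cur | prev). Smoothing choice is an
    assumption for this sketch.
    """
    bigrams = Counter()
    contexts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the sequence start
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def logprob(prev, cur):
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * vocab_size
        return math.log2(num / den)

    return logprob


def cross_entropy(seq, logprob):
    """Average negative log2-probability per phoneme of seq under the model."""
    if not seq:
        raise ValueError("phoneme sequence must not be empty")
    padded = ["<s>"] + list(seq)
    total = sum(logprob(p, c) for p, c in zip(padded, padded[1:]))
    return -total / len(seq)


# Hypothetical toy data: a native language with only consonant-vowel patterns.
vocab = {"k", "a", "s", "t", "r"}
native_sequences = [["k", "a"], ["s", "a"], ["t", "a"], ["k", "a", "s", "a"]]
native_model = train_bigram(native_sequences, vocab)

familiar = ["k", "a", "s", "a"]  # resembles native phoneme patterns
cluster = ["s", "t", "r"]        # consonant cluster unseen in the native data

familiar_score = cross_entropy(familiar, native_model)
cluster_score = cross_entropy(cluster, native_model)
```

Under these assumptions, a sequence containing phoneme transitions that are rare or absent in the native language receives a higher cross-entropy, which is the sense in which such a measure could flag sentences as more challenging.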
We collected about 23,000 read sentences from 200 speakers in four language groups: Chinese, Japanese, Korean, and Spanish. We used these data to adapt the acoustic model of a spontaneous speech recognizer and compared the recognition performance of the unadapted baseline with that of the adapted system on a held-out set from the English test responses data set.
The results show that using this targeted read speech material for acoustic model adaptation significantly reduces the word error rate for two of the four language groups in the spontaneous speech test set, while the changes for the other two language groups are not significant.
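For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and example sentences are hypothetical, and the abstract does not say which scoring tool or significance test was used.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            substitution = prev[j - 1] + (ref_word != hyp_word)
            cur.append(min(deletion, insertion, substitution))
        prev = cur
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Comparing this quantity for the baseline and the adapted recognizer on the same held-out responses, per language group, corresponds to the kind of comparison the abstract reports.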
Bibliographic reference. Zechner, Klaus / Higgins, Derrick / Lawless, René / Futagi, Yoko / Ohls, Sarah / Ivanov, George (2009): "Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty", In INTERSPEECH-2009, 604-607.