INTERSPEECH 2008

Confusion matrices have been widely used to increase the accuracy of speech recognisers, but usually a mean confusion matrix, averaged over many speakers, is used. However, analysis shows that confusion matrices for individual speakers vary considerably, and so there is benefit in obtaining estimates of confusion matrices for individual speakers. Unfortunately, there is rarely enough data to make reliable estimates. We present a technique for estimating the elements of a speaker's confusion matrix given only sparse data from the speaker. It uses nonnegative matrix factorisation to find structure within confusion matrices, and this structure is exploited to make improved estimates. Results show that, under certain conditions, this technique can give estimates that are as good as those obtained with twice the number of utterances available from the speaker.
Bibliographic reference. Cox, Stephen (2008): "On estimation of a speaker's confusion matrix from sparse data", In INTERSPEECH 2008, 2618-2621.
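
The abstract does not spell out the procedure, so the following is only a minimal sketch of how nonnegative matrix factorisation might be used to smooth a sparse confusion-matrix estimate. It makes several assumptions that are not taken from the paper: each training speaker's confusion matrix is flattened into one column of a data matrix, the factorisation uses the standard Lee-Seung multiplicative updates for squared error, the rank is 3, and all data are synthetic. The names `nmf`, `fit_weights` and `row_normalise` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factorise nonnegative V (m x n) as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def fit_weights(v, W, n_iter=500, eps=1e-9):
    """Find nonnegative weights h such that W @ h approximates v, with W fixed."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ (W @ h) + eps)
    return h


def row_normalise(C, eps=1e-9):
    """Scale each row to sum to (approximately) one."""
    return C / (C.sum(axis=1, keepdims=True) + eps)


# --- Synthetic data (assumption: not the data used in the paper) ---
rng = np.random.default_rng(1)
n_classes, n_speakers, rank = 5, 20, 3

# A few "prototype" confusion matrices; every speaker is a mixture of them.
prototypes = [
    row_normalise(4.0 * np.eye(n_classes) + rng.random((n_classes, n_classes)))
    for _ in range(rank)
]


def random_speaker():
    mix = rng.dirichlet(np.ones(rank))
    return sum(w * P for w, P in zip(mix, prototypes))


# One flattened confusion matrix per column: shape (n_classes**2, n_speakers).
V = np.stack([random_speaker().ravel() for _ in range(n_speakers)], axis=1)
W, H = nmf(V, rank)

# New speaker: only a handful of observations per class (sparse data).
true_C = random_speaker()
counts = np.stack([rng.multinomial(10, true_C[i]) for i in range(n_classes)])
raw_estimate = row_normalise(counts.astype(float))

# Project the sparse estimate onto the learned basis and re-normalise the rows.
h = fit_weights(raw_estimate.ravel(), W)
estimate = row_normalise((W @ h).reshape(n_classes, n_classes))

print("mean abs error, raw estimate:     ", np.abs(raw_estimate - true_C).mean())
print("mean abs error, smoothed estimate:", np.abs(estimate - true_C).mean())
```

Whether the smoothed estimate beats the raw one depends on the rank, the amount of data and how well the low-rank structure holds; the sketch only illustrates the mechanics and makes no claim about the gains reported in the paper.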