First International Conference on Spoken Language Processing (ICSLP 90)
In this paper we describe a method of rapid speaker adaptation that uses speech from multiple reference speakers to improve performance in large-vocabulary continuous speech recognition. This method extends our previous work, in which we estimated a speaker transformation between a single reference speaker and the new (target) speaker from a small sample of speech from each speaker. The transformation is applied to the parameters of a speaker-dependent (SD) phonetic hidden Markov model (HMM) trained for the reference speaker to produce an adapted model for the target speaker. In the present work, we estimate multiple independent transformations between a set of reference speakers and a single target speaker and then combine the resulting adapted models. We tested this approach on the DARPA 1000-word Resource Management continuous speech corpus using the standard word-pair grammar of perplexity 60. We used 30 minutes of speech from each of 11 reference speakers to train the reference HMMs. Using 2 minutes of adaptation speech from the target speakers, the average word error rate is 4.1%. This error rate is nearly 40% lower than that achieved by adaptation from a single reference speaker.
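The abstract does not specify the form of the speaker transformation or the rule used to combine the adapted models, so the following is only an illustrative sketch under stated assumptions: each HMM state has a discrete output distribution, each transformation is a row-stochastic matrix giving P(target symbol | reference symbol), and the adapted models are combined by a uniform average. All function names and numbers are hypothetical and chosen for illustration.

```python
def adapt_distribution(ref_probs, mapping):
    """Map one reference-speaker output distribution into the target space.

    ASSUMPTION: mapping[i][j] = P(target symbol j | reference symbol i),
    with each row summing to 1. The abstract does not state this form.
    """
    n_target = len(mapping[0])
    return [
        sum(ref_probs[i] * mapping[i][j] for i in range(len(ref_probs)))
        for j in range(n_target)
    ]


def combine_adapted(adapted, weights=None):
    """Combine adapted distributions by a weighted average.

    ASSUMPTION: uniform weights by default; the combination rule used
    in the paper is not given in the abstract.
    """
    if weights is None:
        weights = [1.0 / len(adapted)] * len(adapted)
    n = len(adapted[0])
    return [sum(w * d[j] for w, d in zip(weights, adapted)) for j in range(n)]


# Toy data: two reference speakers, one HMM state, two output symbols.
ref_a = [0.8, 0.2]
map_a = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical transformation, speaker A
ref_b = [0.6, 0.4]
map_b = [[0.7, 0.3], [0.1, 0.9]]   # hypothetical transformation, speaker B

adapted_a = adapt_distribution(ref_a, map_a)
adapted_b = adapt_distribution(ref_b, map_b)
combined = combine_adapted([adapted_a, adapted_b])
print(combined)
```

Because each mapping row and each reference distribution sums to 1, every adapted distribution, and any convex combination of them, remains a valid probability distribution. This is one reason an averaging rule is a natural choice for a sketch like this one.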
Bibliographic reference. Kubala, Francis / Schwartz, Richard (1990): "Improved speaker adaptation using multiple reference speakers", In ICSLP-1990, 153-156.