ETRW on Speaker Characterization in Speech Technology

Edinburgh, Scotland, UK
June 26-28, 1990

Vector Quantization for Speaker Adaptation: Results on a 5000-Word Database

H. Bonneau-Maynard

LIMSI/CNRS, Orsay, France

With a view to designing a speaker-independent, large-vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation.

Only one speaker (the reference speaker) pronounces the application vocabulary. He also pronounces a small vocabulary called the adaptation vocabulary. Each new speaker then merely pronounces the adaptation vocabulary.

We have compared two adaptation methods, each establishing a correspondence between the codebooks of the reference speaker and the new speaker, on a 20-speaker database with a 104-word application vocabulary. Method I uses a transposed codebook to represent the new speaker during the recognition process, whereas Method II uses a codebook obtained by clustering analysis on the new speaker's pronunciation of the adaptation vocabulary. The adaptation vocabulary contains 136 words. A comparison of the two methods' performance shows that a codebook trained on the new speaker is not necessary to represent that speaker. Consequently, we have used the first method to perform tests with a 5000-word application vocabulary and a 4-speaker database. The adaptation is still effective (the mean improvement is about 14%), although the relative improvement drops to 30%, compared with the 56% obtained in the 104-word application experiment. Further experiments show that the recognition accuracy can be improved by increasing the adaptation vocabulary size and the codebook size.
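As an illustration only, the idea of a transposed codebook (Method I) can be sketched as follows. The abstract does not specify how the correspondence between codebooks is established, so everything here is an assumption: the function names are hypothetical, the reference codebook is trained with plain k-means under a squared Euclidean distance, and the reference and new-speaker frames of the adaptation vocabulary are assumed to be already time-aligned one to one (for example by dynamic time warping). Under those assumptions, each reference codeword is replaced by the mean of the new-speaker frames aligned with the reference frames that fall in its cell.

```python
import numpy as np


def quantize(frames, codebook):
    """Return, for each frame, the index of the nearest codeword (squared Euclidean)."""
    frames = np.asarray(frames, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    dist = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def train_codebook(frames, size, iters=20, seed=0):
    """Plain k-means (Lloyd iterations); an assumed stand-in for the clustering analysis."""
    frames = np.asarray(frames, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen distinct frames.
    codebook = frames[rng.choice(len(frames), size=size, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(frames, codebook)
        for k in range(size):
            members = frames[labels == k]
            if len(members):  # leave an empty cell's codeword unchanged
                codebook[k] = members.mean(axis=0)
    return codebook


def transpose_codebook(ref_codebook, ref_frames, new_frames):
    """Hypothetical transposition: map each reference codeword into the new speaker's space.

    Assumes ref_frames[i] and new_frames[i] are time-aligned frames of the
    same adaptation-vocabulary utterances.
    """
    ref_codebook = np.asarray(ref_codebook, dtype=float)
    new_frames = np.asarray(new_frames, dtype=float)
    labels = quantize(ref_frames, ref_codebook)
    transposed = ref_codebook.copy()
    for k in range(len(ref_codebook)):
        members = new_frames[labels == k]
        if len(members):  # fall back to the reference codeword if the cell is unused
            transposed[k] = members.mean(axis=0)
    return transposed
```

In this sketch the transposed codebook keeps the reference codeword indices, so reference templates labelled with those indices could be reused for the new speaker without retraining. Whether the paper proceeds this way is not stated in the abstract.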

Bibliographic reference. Bonneau-Maynard, H. (1990): "Vector quantization for speaker adaptation: results on a 5000-word database", in SCST-1990, 66-71.