With a view to designing a speaker-independent large vocabulary recognition system, we evaluate a vector quantization approach to speaker adaptation.
Only one speaker (the reference speaker) pronounces the application vocabulary, along with a small vocabulary called the adaptation vocabulary. Each new speaker then pronounces only the adaptation vocabulary.
We compared two adaptation methods, each establishing a correspondence between the codebooks of the reference speaker and the new speaker, on a 20-speaker database with a 104-word application vocabulary. Method I uses a transposed codebook to represent the new speaker during recognition, whereas Method II uses a codebook obtained by cluster analysis of the new speaker's pronunciation of the adaptation vocabulary. The adaptation vocabulary contains 136 words. Comparing the performance of the two methods shows that a separate new-speaker codebook is not necessary to represent the new speaker. Consequently, we used Method I for tests with a 5000-word application vocabulary and a 4-speaker database. The adaptation is still effective (the mean improvement is about 14%), although the relative improvement is 30%, compared to 56% in the 104-word application experiment. Further experiments show that recognition accuracy can be improved by increasing the adaptation vocabulary size and the codebook size.
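The abstract does not spell out how the transposed codebook of Method I is built. As a rough illustration only, the sketch below assumes one plausible construction: frames of the reference speaker and the new speaker are already time-aligned on the adaptation vocabulary (for example by dynamic time warping), and each reference codeword is replaced by the mean of the new-speaker frames aligned to it. The function names `quantize` and `transpose_codebook`, the Euclidean distance, and the fallback for unobserved codewords are all assumptions, not details taken from the paper.

```python
import numpy as np


def quantize(frames, codebook):
    """Return, for each frame, the index of its nearest codeword (Euclidean)."""
    # distances has shape (n_frames, n_codewords)
    distances = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return distances.argmin(axis=1)


def transpose_codebook(ref_codebook, ref_frames, new_frames):
    """Map a reference codebook into the new speaker's acoustic space.

    Assumption: ref_frames[i] and new_frames[i] are already time-aligned
    frame pairs taken from the adaptation vocabulary.
    """
    ref_codebook = np.asarray(ref_codebook, dtype=float)
    ref_frames = np.asarray(ref_frames, dtype=float)
    new_frames = np.asarray(new_frames, dtype=float)

    labels = quantize(ref_frames, ref_codebook)
    transposed = ref_codebook.copy()
    for k in range(len(ref_codebook)):
        mask = labels == k
        if mask.any():
            # Replace the codeword by the mean of the aligned new-speaker frames.
            transposed[k] = new_frames[mask].mean(axis=0)
        # Codewords never observed in the adaptation data keep their
        # reference value (a hypothetical fallback choice).
    return transposed


# Toy data: the new speaker's frames are the reference frames shifted by (2, 2).
ref_codebook = np.array([[0.0, 0.0], [10.0, 10.0], [100.0, 100.0]])
ref_frames = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 10.0]])
new_frames = ref_frames + 2.0
transposed = transpose_codebook(ref_codebook, ref_frames, new_frames)
```

Under this assumed scheme, the codeword indices stay those of the reference speaker, so models trained on the reference speaker's application vocabulary could be reused unchanged while only the codeword vectors move.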
Cite as: Bonneau-Maynard, H. (1990) Vector quantization for speaker adaptation: results on a 5000-word database. Proc. ESCA Workshop on Speaker Characterization in Speech Technology, 66-71
@inproceedings{bonneaumaynard90_scst, author={H. Bonneau-Maynard}, title={{Vector quantization for speaker adaptation: results on a 5000-word database}}, year=1990, booktitle={Proc. ESCA Workshop on Speaker Characterization in Speech Technology}, pages={66--71} }