In this paper, a self-learning speaker adaptation technique based on the separation of speech spectral variation sources is developed for improving speaker-independent continuous speech recognition. Statistical methods are formulated to remove spectral biases caused by speaker acoustic characteristics and channel mismatches, and to adapt the parameters of mixture Gaussian density phone models using unsupervised segmentation data from recognition feedback. Adaptation experiments demonstrate consistent performance improvements over a baseline speaker-independent continuous speech recognition system. On a TIMIT test set, where the task vocabulary size is 853 and the test set perplexity is 104, with each speaker speaking two to three sentences, the recognition word accuracy has been improved from 86.9% to 88.3% (10.7% error reduction). On a separate test set containing a recording channel mismatch, where each speaker read 98 sentences and the test set perplexity is 105, the recognition word accuracy has been improved from 69.3% to 85.2% (51.8% error reduction).
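The two operations named in the abstract — removing an additive spectral bias and adapting Gaussian phone-model parameters from unsupervised segmentation data — can be illustrated with a minimal sketch. This is not the paper's actual statistical formulation; it assumes cepstral-domain features (where a speaker/channel spectral tilt appears as an additive offset) and uses a simple MAP-style interpolation for the mean update, with the function names and the prior weight `tau` chosen here for illustration only:

```python
import numpy as np

def remove_spectral_bias(cepstra):
    """Subtract the per-utterance mean from a (frames, dims) cepstral array.

    A constant multiplicative distortion in the spectral domain (speaker
    or channel characteristic) becomes an additive bias in the cepstral
    domain, so subtracting the utterance mean cancels it.
    """
    return cepstra - cepstra.mean(axis=0)

def adapt_gaussian_mean(prior_mean, frames, tau=10.0):
    """MAP-style update of one mixture-component mean.

    `frames` are the feature vectors assigned to this component by the
    unsupervised segmentation from recognition feedback; `tau` weights
    the speaker-independent prior mean against the adaptation data.
    """
    n = len(frames)
    if n == 0:
        return prior_mean  # no adaptation data for this component
    sample_mean = np.mean(frames, axis=0)
    return (tau * prior_mean + n * sample_mean) / (tau + n)
```

With few adaptation frames the update stays close to the speaker-independent prior, and it moves toward the speaker's own statistics as more recognized speech accumulates, which matches the self-learning behavior described in the abstract.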
Bibliographic reference. Zhao, Yunxin (1993): "Self-learning speaker adaptation based on spectral variation source decomposition", in Proc. EUROSPEECH'93, pp. 359-362.