EUROSPEECH 2003 - INTERSPEECH 2003
Very rapid speaker adaptation algorithms, such as eigenvoices or speaker clustering, typically rely on learning intra-speaker correlations of model parameters from the training data. Given this a priori knowledge, many model parameters can be adapted successfully from only a few observations. However, eigenvoice training and speaker clustering are non-trivial for training databases that contain many short speaker segments, where the data available per speaker for detecting intra-speaker correlations is sparse. We have trained eigenvoices that yield a small but significant word error rate reduction in on-line adaptation (i.e., self-adaptation) on a telephony database with an average of only 5 seconds of speech per speaker in both training and test data.
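The general eigenvoice idea mentioned above can be illustrated with a minimal sketch. This is not the training procedure of the paper. It assumes that each training speaker is already represented by a fully estimated parameter "supervector" (which is exactly what is hard to obtain from short segments), it uses synthetic data, and it replaces maximum-likelihood weight estimation with a plain least-squares fit. The names `train_eigenvoices` and `adapt` are hypothetical.

```python
import numpy as np

def train_eigenvoices(supervectors, num_eigenvoices):
    """PCA over speaker supervectors: returns the mean and the top principal directions."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:num_eigenvoices]

def adapt(mean, eigenvoices, observed_idx, observed_values):
    """Estimate eigenvoice weights from a few observed parameters, then fill in all parameters."""
    basis = eigenvoices[:, observed_idx].T            # shape: (num_observed, num_eigenvoices)
    target = observed_values - mean[observed_idx]
    # Least-squares fit (an assumption of this sketch, not a maximum-likelihood estimate).
    weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return mean + weights @ eigenvoices

# Synthetic data (assumption): speaker variation lies in a 3-dimensional subspace.
rng = np.random.default_rng(0)
dim, rank, n_speakers = 200, 3, 50
true_basis = rng.normal(size=(rank, dim))
global_mean = rng.normal(size=dim)
train = (global_mean
         + rng.normal(size=(n_speakers, rank)) @ true_basis
         + 0.01 * rng.normal(size=(n_speakers, dim)))

mean, eigenvoices = train_eigenvoices(train, rank)

# A new speaker for whom only 10 of the 200 parameters are observed.
new_speaker = global_mean + rng.normal(size=rank) @ true_basis
observed_idx = rng.choice(dim, size=10, replace=False)
adapted = adapt(mean, eigenvoices, observed_idx, new_speaker[observed_idx])

err_si = np.linalg.norm(new_speaker - mean)          # error of the unadapted (mean) model
err_adapted = np.linalg.norm(new_speaker - adapted)  # error after eigenvoice adaptation
print(f"unadapted error: {err_si:.3f}, adapted error: {err_adapted:.3f}")
```

Because the learned directions capture how parameters vary together across speakers, observing a handful of parameters is enough to update all of them in this toy setting.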
Bibliographic reference. Kienappel, Anne K. (2003): "Learning intra-speaker model parameter correlations from many short speaker segments", In EUROSPEECH-2003, 1473-1476.