7th International Conference on Spoken Language Processing, September 16-20, 2002
We describe a new algorithm for estimating eigenvoices (or, equivalently, EMAP correlation matrices) for large vocabulary speech recognition tasks. The algorithm is an EM procedure based on a novel maximum likelihood formulation of the estimation problem, which is similar to the mathematical model underlying probabilistic principal components analysis. It enables us to extend eigenvoice/EMAP adaptation in a natural way to adapt variances as well as mean vectors. Unlike other approaches, it does not require that speaker-dependent or speaker-adapted models for the training speakers be given in advance; these are derived as a byproduct of the estimation procedure. Accordingly, our algorithm can be applied directly to large vocabulary tasks even if the training data is sparse, in the sense that only a small fraction of the total number of Gaussians is observed for each training speaker.
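For readers unfamiliar with the probabilistic principal components analysis model mentioned above, the following is a minimal sketch of the standard EM procedure for that model on fully observed vectors. It is not the algorithm of this paper: it assumes every data vector is completely observed, whereas the paper addresses the sparse case, and it does not adapt variances per Gaussian. The function name `ppca_em` and its parameters are hypothetical, chosen for illustration. By loose analogy, the columns of `W` play the role of eigenvoices and `sigma2` the role of a single shared residual variance.

```python
import numpy as np

def ppca_em(X, q, n_iter=200, seed=0):
    """EM for probabilistic PCA: x = mu + W z + noise, z ~ N(0, I), noise ~ N(0, sigma2 * I).

    Illustrative sketch only; assumes every row of X is fully observed.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                          # centered data, shape (N, d)
    W = rng.standard_normal((d, q))      # random initial loading matrix
    sigma2 = 1.0                         # initial residual variance
    total_sq = np.sum(Xc ** 2)

    for _ in range(n_iter):
        # E-step: posterior moments of the latent factors z_n.
        M = W.T @ W + sigma2 * np.eye(q)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                        # rows are E[z_n], shape (N, q)
        sum_Ezz = N * sigma2 * Minv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]

        # M-step: re-estimate the loadings, then the residual variance.
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (
            total_sq
            - 2.0 * np.sum((Xc @ W) * Ez)
            + np.trace(sum_Ezz @ W.T @ W)
        ) / (N * d)

    return mu, W, sigma2
```

On synthetic data drawn from a low-rank-plus-noise model, `ppca_em(X, q)` should recover a loading matrix spanning approximately the same subspace as the true one, together with an estimate of the noise variance. The EM form matters here because it extends to settings where a direct eigendecomposition of the sample covariance is not available, which is the kind of situation the abstract describes.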
Bibliographic reference. Kenny, P. / Boulianne, G. / Dumouchel, P. (2002): "Maximum likelihood estimation of eigenvoices and residual variances for large vocabulary speech recognition tasks", in ICSLP-2002, 57-60.