Speaker adaptation is an essential part of any state-of-the-art Automatic Speech Recognizer (ASR). Demand for embedded ASR has been growing, and in such settings a more compact speech model, the Subspace Distribution Clustering Hidden Markov Model (SDCHMM), is used instead of the Continuous Density Hidden Markov Model (CDHMM). Previous studies on SDCHMM adaptation adjust the subspace Gaussian pools to account for speaker variation. As an alternative, we use the link table parameters of the SDCHMM, which define the tying structure in the subspaces, to model inter-speaker mismatch while leaving the Gaussian parameters unchanged. Because the range over which these parameters can vary is highly limited, the method is potentially faster than conventional Gaussian pool adaptation. A comparative study on a Continuous Digital Dialing (CDD) task shows that when adaptation data is severely limited, link table adaptation is more effective than the conventional method: it yields a 17% relative improvement in utterance accuracy rate, compared with 14% for Gaussian pool adaptation. However, the gain from additional data is limited. When the amount of data was doubled, link table adaptation gave a 21% improvement, compared with 30% for the conventional method.
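To make the idea of link table adaptation concrete, the following is a minimal sketch and not the authors' algorithm. It assumes one-dimensional subspaces, a hard alignment of adaptation frames to a single Gaussian component, and a maximum-likelihood re-selection of prototype indices; the names `pools`, `adapt_links`, and the toy numbers are hypothetical. The subspace Gaussian pools stay fixed, and only the indices that tie a component to its prototypes change.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def component_loglik(frame, links, pools):
    """Log-likelihood of one frame under one component.

    links[k] is the index of the prototype used in subspace k, so the
    score is a sum of per-subspace prototype log densities.
    """
    return sum(log_gauss(frame[k], *pools[k][links[k]])
               for k in range(len(pools)))

def adapt_links(frames, pools):
    """Re-select, per subspace, the prototype that best fits the frames.

    Assumption: frames are already aligned to this component. The
    Gaussian parameters in `pools` are never modified.
    """
    new_links = []
    for k, pool in enumerate(pools):
        scores = [sum(log_gauss(f[k], mean, var) for f in frames)
                  for (mean, var) in pool]
        new_links.append(max(range(len(pool)), key=scores.__getitem__))
    return new_links

# Hypothetical toy model: two subspaces, three (mean, variance) prototypes each.
pools = [
    [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0)],
    [(-1.0, 0.5), (0.0, 0.5), (1.0, 0.5)],
]
speaker_independent_links = [0, 1]  # link table row before adaptation
adaptation_frames = [(1.9, 0.9), (2.2, 1.1), (1.8, 1.2)]

adapted_links = adapt_links(adaptation_frames, pools)

def total(links):
    """Total log-likelihood of the adaptation frames for a link table row."""
    return sum(component_loglik(f, links, pools) for f in adaptation_frames)

print(adapted_links, total(speaker_independent_links), total(adapted_links))
```

Because each link can only take one of a small number of values per subspace, the search space in this sketch is tiny compared with re-estimating means and variances, which is consistent with the speed argument made above.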
Cite as: Zhang, M., Xu, J. (2004) An Investigation into Subspace Rapid Speaker Adaptation. Proc. International Symposium on Chinese Spoken Language Processing, 273-276
@inproceedings{zhang04_iscslp, author={Michael Zhang and Jun Xu}, title={{An Investigation into Subspace Rapid Speaker Adaptation}}, year=2004, booktitle={Proc. International Symposium on Chinese Spoken Language Processing}, pages={273--276} }