ISCA Archive ICSLP 2000

Structural maximum a-posteriori linear regression for unsupervised speaker adaptation

Tor André Myrvoll, Olivier Siohan, Chin-Hui Lee, Wu Chou

In this paper we introduce an approach to transformation-based model adaptation. Previously published schemes such as MLLR define a set of affine transformations to be applied to clusters of model parameters. Although this approach has been shown to yield good results when adaptation data is scarce, it has an inherent problem: the number of transformations used has a significant influence on adaptation performance. Using too many transformations results in poorly estimated transformation parameters, eventually leading to a model that overfits the adaptation data. On the other hand, using too few transformations gives a restricted mapping and thus a suboptimal adapted model. We address this problem by estimating the transformation parameters in a maximum a posteriori (MAP) sense, using a set of hierarchical priors arranged in a tree structure. We show that this approach yields a significant improvement over MLLR for unsupervised model adaptation on the WSJ Spoke 3 test.
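The abstract does not give the estimation equations, so the following is only a rough illustration of the general idea of MAP-regularized linear regression over a tree of priors. It rests on simplifying assumptions that are ours, not the paper's. Features are scalar. Each observation is hard-aligned to one Gaussian with unit variance. Each transform is a scale `a` plus a bias `b`. The prior on a node's transform is an isotropic Gaussian whose mean is the parent's MAP estimate and whose precision `tau` is a hypothetical tuning constant. The names `Node`, `map_transform`, `collect`, and `adapt_tree` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a hypothetical regression-class tree.

    `pairs` holds (mu, x) tuples: a model mean mu and an adaptation
    observation x aligned to that Gaussian (unit variance assumed).
    """
    pairs: list = field(default_factory=list)
    children: list = field(default_factory=list)
    w: tuple = (1.0, 0.0)  # (scale a, bias b), filled in by adapt_tree


def map_transform(pairs, prior, tau):
    """MAP estimate of (a, b) in x ~ a*mu + b with prior N(prior, I/tau).

    Solves (Phi^T Phi + tau*I) w = Phi^T x + tau*prior, where each row
    of Phi is [mu, 1]. With tau = 0 this is an ordinary least-squares fit.
    """
    s_mm = sum(mu * mu for mu, _ in pairs) + tau
    s_m1 = sum(mu for mu, _ in pairs)
    s_11 = len(pairs) + tau
    r_m = sum(mu * x for mu, x in pairs) + tau * prior[0]
    r_1 = sum(x for _, x in pairs) + tau * prior[1]
    det = s_mm * s_11 - s_m1 * s_m1
    if det == 0:
        raise ValueError("singular system: add data or use tau > 0")
    # Closed-form inverse of the 2x2 normal-equation matrix.
    a = (s_11 * r_m - s_m1 * r_1) / det
    b = (s_mm * r_1 - s_m1 * r_m) / det
    return (a, b)


def collect(node):
    """Return all (mu, x) pairs in the subtree rooted at `node`."""
    out = list(node.pairs)
    for child in node.children:
        out.extend(collect(child))
    return out


def adapt_tree(node, prior=(1.0, 0.0), tau=10.0):
    """Top-down pass: each node's MAP transform is its children's prior mean.

    The root's prior is centered on the identity transform (a=1, b=0).
    """
    node.w = map_transform(collect(node), prior, tau)
    for child in node.children:
        adapt_tree(child, node.w, tau)


# Usage: one well-populated leaf and one leaf with a single observation.
leaf_a = Node(pairs=[(mu, 2 * mu + 1) for mu in (0.0, 1.0, 2.0, 3.0)])
leaf_b = Node(pairs=[(1.0, 1.5)])
root = Node(children=[leaf_a, leaf_b])
adapt_tree(root, tau=1.0)
```

In this toy setting, a node with ample data and a small `tau` approaches a least-squares fit, while a sparsely populated node such as `leaf_b` (which on its own would give a singular system) is pulled toward its parent's transform. This is one way a tree of priors can sidestep having to fix the number of transforms in advance, but it should not be read as the paper's exact formulation.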


Cite as: Myrvoll, T.A., Siohan, O., Lee, C.-H., Chou, W. (2000) Structural maximum a-posteriori linear regression for unsupervised speaker adaptation. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 540-543

@inproceedings{myrvoll00_icslp,
  author={Tor André Myrvoll and Olivier Siohan and Chin-Hui Lee and Wu Chou},
  title={{Structural maximum a-posteriori linear regression for unsupervised speaker adaptation}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={4},
  pages={540--543}
}