Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Structural Maximum A-Posteriori Linear Regression for Unsupervised Speaker Adaptation

Tor André Myrvoll (1), Olivier Siohan, Chin-Hui Lee, Wu Chou

Multimedia Communications Research Lab, Bell Laboratories - Lucent Technologies, Murray Hill, NJ, USA
(1) This work was done while T. A. Myrvoll was on leave from the Department of Telecommunications, Norwegian University of Science and Technology, Norway.

In this paper we introduce an approach to transformation-based model adaptation. Previously published schemes such as maximum likelihood linear regression (MLLR) define a set of affine transformations that are applied to clusters of model parameters. Although this approach has been shown to yield good results when adaptation data is scarce, it has an inherent problem: the number of transformations used has a significant influence on adaptation performance. Using too many transformations results in poorly estimated transformation parameters and, eventually, a model that overfits the adaptation data. Using too few yields an overly restricted mapping and hence a suboptimal adapted model. We address this problem by estimating the transformation parameters in a maximum a posteriori (MAP) sense, using a set of hierarchical priors arranged in a tree structure. We show that this approach yields a significant improvement over MLLR for unsupervised model adaptation on the WSJ spoke 3 test.
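The idea of tree-structured priors can be illustrated with a deliberately simplified sketch. It is not the estimator from the paper: it assumes a bias-only transform (a special case of an affine transform), scalar features, unit variances, and a Gaussian prior centred on the parent node's estimate. The names `Node`, `map_bias`, `adapt`, the prior weight `tau`, and the toy residuals are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One tree node: a cluster of model parameters sharing one transform."""
    residuals: List[float]                 # toy (observation - model mean) values pooled at this node
    children: List["Node"] = field(default_factory=list)
    bias: float = 0.0                      # estimated bias-only transform for this cluster


def map_bias(residuals: List[float], prior_mean: float, tau: float) -> float:
    """MAP estimate of a bias under a Gaussian prior (assumed unit variances).

    With no data the estimate equals the prior mean; with plenty of data it
    approaches the maximum likelihood estimate (the sample mean).
    """
    n = len(residuals)
    return (sum(residuals) + tau * prior_mean) / (n + tau)


def adapt(node: Node, prior_mean: float = 0.0, tau: float = 10.0) -> None:
    """Estimate transforms top-down; each child uses its parent's estimate as prior mean."""
    node.bias = map_bias(node.residuals, prior_mean, tau)
    for child in node.children:
        adapt(child, prior_mean=node.bias, tau=tau)


# Toy data: leaf_a has many adaptation frames, leaf_b has a single one.
leaf_a = Node(residuals=[1.0] * 50)
leaf_b = Node(residuals=[3.0])
root = Node(residuals=leaf_a.residuals + leaf_b.residuals, children=[leaf_a, leaf_b])

# The root prior is "no change" (bias 0), an assumption of this sketch.
adapt(root, prior_mean=0.0, tau=10.0)
```

In this toy setting the data-rich leaf stays close to its own sample mean, while the data-poor leaf is pulled towards its parent's estimate instead of overfitting its single observation. This mirrors, in a much reduced form, the trade-off between too many and too few transformations described in the abstract.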


Bibliographic reference.  Myrvoll, Tor André / Siohan, Olivier / Lee, Chin-Hui / Chou, Wu (2000): "Structural maximum a-posteriori linear regression for unsupervised speaker adaptation", In ICSLP-2000, vol.4, 540-543.