The problem of joint compensation of environment and speaker variabilities is addressed. A factored feature-space transform, named factored front-end CMLLR (F-FE-CMLLR), is investigated, which comprises a cascade of two transforms: front-end CMLLR for environment normalization and CMLLR for speaker normalization. In this paper, we propose an iterative estimation algorithm for F-FE-CMLLR. We believe that iterative estimation helps to decouple the effects of the two acoustic factors, allowing each transform to learn the effect of only one factor and thereby yielding an improvement in speech recognition performance compared to sequential estimation. However, the estimation of the environment transform yields full-covariance Gaussians in the GMM-HMM, which makes direct estimation computationally expensive. An efficient training algorithm is presented that reduces the computational cost considerably. Further, it is shown that a row-by-row optimization procedure can be employed, which makes the algorithm more efficient and attractive. On the multi-condition Aurora 4 task with a discriminatively trained GMM-HMM, it is shown that F-FE-CMLLR yields 11.6% and 8.7% relative improvements on two evaluation sets over baseline features that are processed only by CMLLR for speaker normalization.
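To make the cascade structure concrete, the following minimal sketch applies two affine feature-space transforms in sequence. It is an illustration only, not the estimation algorithm of the paper: the order of application (environment first, then speaker), the function names, and the numeric values are all assumptions made for the example, and the transforms are hand-picked rather than estimated from data.

```python
def affine(A, b, x):
    """Return A x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * v for a, v in zip(row, x)) + bi for row, bi in zip(A, b)]


def cascade(env, spk, x):
    """Apply the environment transform, then the speaker transform.

    The order is an assumption of this sketch.
    """
    A_e, b_e = env
    A_s, b_s = spk
    return affine(A_s, b_s, affine(A_e, b_e, x))


def compose(env, spk):
    """Collapse the cascade into one equivalent affine transform (A, b)."""
    A_e, b_e = env
    A_s, b_s = spk
    n = len(b_e)
    # Matrix product A_s A_e.
    A = [[sum(A_s[i][k] * A_e[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    # Combined bias A_s b_e + b_s.
    b = affine(A_s, b_s, b_e)
    return A, b


# Hypothetical 2-dimensional transforms and feature vector.
env = ([[2.0, 0.0], [0.0, 0.5]], [1.0, -1.0])  # environment transform (A_e, b_e)
spk = ([[1.0, 1.0], [0.0, 1.0]], [0.0, 2.0])   # speaker transform (A_s, b_s)
x = [3.0, 4.0]

y = cascade(env, spk, x)
A, b = compose(env, spk)
print(y)                # [8.0, 3.0]
print(affine(A, b, x))  # [8.0, 3.0]
```

For any single environment and speaker pair the cascade collapses to one affine transform. The factorization matters because, in a factored scheme, one environment transform can be shared across speakers and one speaker transform across environments, which is why each transform should capture the effect of only one factor.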
Bibliographic reference. Rath, Shakti / Sivadas, Sunil / Ma, Bin (2015): "Joint environment and speaker normalization using factored front-end CMLLR", In INTERSPEECH-2015, 2844-2848.