In this paper, we present an in-depth analysis of a recently proposed method for speaker adaptation. The method involves a region-specific feature-space transformation, which we refer to as soft R-FMLLR. We argue that the method has certain difficulties, the most significant being that it is non-invertible. We present an analysis of the singularity of the Jacobian matrix, from which we note that the matrix becomes near-singular at certain points in the feature space, indicating that the transformation is non-invertible. We observe that in this case maximum likelihood estimation adversely affects speech recognition performance. Moreover, sufficient statistics do not exist, which makes the estimation procedure computationally very expensive. These concerns render the method unattractive. We propose a simple yet important modification, hard R-FMLLR, and show that the associated Jacobian matrix is guaranteed to be full-rank and that the method is computationally efficient. On a large vocabulary continuous speech recognition task, the proposed method is shown to perform better than soft R-FMLLR. Further, its performance is comparable to that of the widely used CMLLR with regression classes, especially when a higher number of transforms is used.
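To make the invertibility argument concrete, the following is a minimal 1-D toy sketch, not the authors' implementation. It assumes that soft R-FMLLR maps a feature x to a posterior-weighted sum of region-specific affine transforms, y = Σ_k γ_k(x)(a_k x + b_k), and that hard R-FMLLR applies only the transform of the most probable region. The Gaussian region model, the values in `MEANS` and `TRANSFORMS`, and all function names are hypothetical choices for illustration. The Jacobian (here a scalar derivative) is estimated by central differences.

```python
import math

# Hypothetical 1-D toy setup (an assumption, not taken from the paper):
# two regions modelled by unit-variance Gaussians with equal priors.
MEANS = (-1.0, 1.0)
# Hypothetical per-region affine transforms y = a * x + b, given as (a, b).
TRANSFORMS = ((1.0, 0.0), (1.0, -3.0))


def posteriors(x):
    """Region posteriors gamma_k(x) for equal-prior, unit-variance Gaussians."""
    logs = [-0.5 * (x - m) ** 2 for m in MEANS]
    top = max(logs)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


def soft_transform(x):
    """Soft assignment: posterior-weighted sum of all region transforms."""
    return sum(g * (a * x + b) for g, (a, b) in zip(posteriors(x), TRANSFORMS))


def hard_transform(x):
    """Hard assignment: apply only the most probable region's transform."""
    gammas = posteriors(x)
    k = max(range(len(gammas)), key=gammas.__getitem__)
    a, b = TRANSFORMS[k]
    return a * x + b


def jacobian(f, x, h=1e-5):
    """Central-difference estimate of df/dx, the 1-D Jacobian of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    # Points chosen away from x = 0, where the hard transform switches region.
    for x in (-3.0, -1.0, -0.2, 0.2, 1.0, 3.0):
        print(f"x={x:+.1f}  soft J={jacobian(soft_transform, x):+.3f}  "
              f"hard J={jacobian(hard_transform, x):+.3f}")
```

With these toy values the soft mapping reduces to y = x - 3σ(2x), whose derivative is about -0.44 at x = ±0.2 but close to 1 far from the boundary, so it passes through zero and the mapping cannot be inverted there. The hard mapping's derivative equals the selected a_k (here 1) on each side of the region boundary, at the cost of a discontinuity at the boundary itself. Under the same assumed form, the multivariate soft Jacobian picks up extra terms from ∂γ_k/∂x, whereas the hard Jacobian is simply A_k.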
Bibliographic reference. Rath, Shakti P. / Burget, Lukáš / Karafiát, Martin / Glembek, Ondřej / Černocký, Jan (2013): "A region-specific feature-space transformation for speaker adaptation and singularity analysis of jacobian matrix", In INTERSPEECH-2013, 1228-1232.