9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Decomposition of Rotational Distortion Caused by VTL Difference Using Eigenvalues of Its Transformation Matrix

Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose

University of Tokyo, Japan

In speech recognition studies, vocal tract length normalization (VTLN) techniques are widely used to cancel age and gender differences. In VTLN, the distortion is often modeled as a linear transform in cepstrum space: ĉ = Ac. In our previous study, the geometrical properties of A were discussed, and it was shown that the matrix can be approximated as a rotation matrix. In this study, a new method for better approximating A is proposed. Using the eigenvalues of A, its quasi-rotational distortion is factorized into multiple rotation operations and multiple magnification operations. This method resolves the intrinsic ambiguity of the rotation angle used in our previous study. In place of a single angle, multiple rotation angles are introduced to better understand what kind of geometrical distortions A induces on cepstrum vectors. Experiments show the validity of the new method, and a new speech feature is also derived from it.
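As a rough illustration of the idea of reading rotations and magnifications off the eigenvalues, the sketch below uses a standard linear-algebra fact: for a real matrix, each complex-conjugate eigenvalue pair r·exp(±iθ) corresponds to a rotation by θ combined with a magnification by r in some two-dimensional plane, and each real eigenvalue corresponds to a pure scaling along one direction. This is not the authors' procedure, which the abstract does not spell out. The function name `rotation_and_magnification`, the tolerance `tol`, and the toy 3×3 matrix are assumptions made purely for illustration; a real VTLN matrix A would be estimated from a warping function instead.

```python
import numpy as np


def rotation_and_magnification(A, tol=1e-9):
    """Split the eigenvalues of a real square matrix into
    (magnification, angle) pairs and real scaling factors.

    Hypothetical helper for illustration only.
    """
    eigvals = np.linalg.eigvals(A)
    # Keep one member (positive imaginary part) of each conjugate pair.
    pairs = [(abs(lam), float(np.angle(lam)))
             for lam in eigvals if lam.imag > tol]
    # Real eigenvalues: scaling without rotation.
    scalings = [float(lam.real) for lam in eigvals if abs(lam.imag) <= tol]
    return pairs, scalings


# Toy matrix (assumed, not taken from the paper): a 30-degree rotation
# magnified by 1.2 in the first two dimensions, and a scaling of 0.9
# along the third dimension.
theta, r = np.pi / 6, 1.2
A = np.zeros((3, 3))
A[:2, :2] = r * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
A[2, 2] = 0.9

pairs, scalings = rotation_and_magnification(A)
for magnitude, angle in pairs:
    print(f"magnification {magnitude:.3f}, rotation {np.degrees(angle):.1f} deg")
print("real scaling factors:", scalings)
```

For a higher-dimensional matrix the same call would return one (magnification, angle) pair per conjugate eigenvalue pair, which is one way to obtain several rotation angles instead of a single ambiguous one.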


Bibliographic reference. Saito, Daisuke / Minematsu, Nobuaki / Hirose, Keikichi (2008): "Decomposition of rotational distortion caused by VTL difference using eigenvalues of its transformation matrix", in INTERSPEECH-2008, 1361-1364.