ISCA Archive Interspeech 2009

On invariant structural representation for speech recognition: theoretical validation and experimental improvement

Yu Qiao, Nobuaki Minematsu, Keikichi Hirose

One of the most challenging problems in speech recognition is dealing with the inevitable acoustic variations caused by non-linguistic factors. Recently, an invariant structural representation of speech was proposed [1], in which non-linguistic variations are effectively removed by modeling the dynamic and contrastive aspects of speech signals. This paper describes our recent progress on this problem. Theoretically, we prove that maximum likelihood based decomposition leads to the same structural representation for a sequence and its transformed version. Practically, we introduce a method of discriminant analysis of eigen-structure to address two limitations of structural representations, namely high dimensionality and overly strong invariance. In the first experiment, we evaluate the proposed method by recognizing connected Japanese vowels. The proposed method achieves a recognition rate of 99.0%, which is higher than those of the previous structure-based recognition methods [2, 3, 4] and of word HMMs. In the second experiment, we examine the robustness of structural representations to vocal tract length (VTL) differences. The experimental results indicate that structural representations are much more robust to VTL changes than HMMs. Moreover, the proposed method is about 60 times faster than the previous ones.
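The invariance claim can be illustrated numerically. The abstract does not say which contrast measure the structure is built from, so the sketch below makes assumptions: each speech event is modeled as a Gaussian, the contrast between two events is measured with the Bhattacharyya distance, and the non-linguistic variation is approximated by one invertible affine map applied to the whole feature space. The function name and all numeric values are hypothetical. Under these assumptions, the distance between two events is the same before and after the transformation.

```python
import numpy as np

def bhattacharyya_gaussian(m1, S1, m2, S2):
    """Bhattacharyya distance between N(m1, S1) and N(m2, S2)."""
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    # Mean term: (1/8) * diff^T S^{-1} diff
    term_mean = 0.125 * diff @ np.linalg.solve(S, diff)
    # Covariance term: (1/2) * ln(det S / sqrt(det S1 * det S2))
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_S1 = np.linalg.slogdet(S1)
    _, logdet_S2 = np.linalg.slogdet(S2)
    term_cov = 0.5 * (logdet_S - 0.5 * (logdet_S1 + logdet_S2))
    return term_mean + term_cov

def random_spd(rng, d):
    """Random symmetric positive-definite matrix (illustrative covariance)."""
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

rng = np.random.default_rng(0)
d = 3

# Two hypothetical speech events modeled as Gaussians.
m1, m2 = rng.standard_normal(d), rng.standard_normal(d)
S1, S2 = random_spd(rng, d), random_spd(rng, d)

# Assumed non-linguistic variation: an invertible affine map x -> A x + b.
A = np.array([[2.0, 0.5, 0.0],
              [-0.3, 1.5, 0.2],
              [0.1, -0.4, 1.2]])
b = np.array([0.7, -1.1, 0.4])

before = bhattacharyya_gaussian(m1, S1, m2, S2)
# A Gaussian N(m, S) maps to N(A m + b, A S A^T) under the affine map.
after = bhattacharyya_gaussian(A @ m1 + b, A @ S1 @ A.T,
                               A @ m2 + b, A @ S2 @ A.T)

print(f"distance before transform: {before:.6f}")
print(f"distance after transform:  {after:.6f}")
```

The two printed values should agree up to floating-point error. A matrix of such pairwise distances over all events in an utterance would then be unchanged by the transformation as well, which is the sense of "invariant structure" assumed in this sketch.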


doi: 10.21437/Interspeech.2009-567

Cite as: Qiao, Y., Minematsu, N., Hirose, K. (2009) On invariant structural representation for speech recognition: theoretical validation and experimental improvement. Proc. Interspeech 2009, 3055-3058, doi: 10.21437/Interspeech.2009-567

@inproceedings{qiao09_interspeech,
  author={Yu Qiao and Nobuaki Minematsu and Keikichi Hirose},
  title={{On invariant structural representation for speech recognition: theoretical validation and experimental improvement}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={3055--3058},
  doi={10.21437/Interspeech.2009-567}
}