ISCA Archive Interspeech 2007

An extension 2DPCA based visual feature extraction method for audio-visual speech recognition

Guanyong Wu, Jie Zhu

Two-dimensional principal component analysis (2DPCA) has been proposed for face recognition as an alternative to the traditional PCA transform [1]. In this paper, we extend this approach to visual feature extraction for audio-visual speech recognition (AVSR). First, a two-stage 2DPCA transform is applied to extract the visual features. Then, visemic linear discriminant analysis (LDA) is applied as a post-extraction processing step. We evaluate the presented method by comparing it with traditional PCA and 2DPCA. Experimental results show that the extension 2DPCA reduces the feature dimension of 2DPCA and represents the test mouth images better than PCA does. Moreover, 2DPCA+LDA requires less computation and performs better than PCA+LDA in visual-only speech recognition. Finally, further experimental results demonstrate that our AVSR system using the extension 2DPCA method is significantly more robust in noisy environments than audio-only speech recognition.
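As a rough illustration of the two-stage idea, the sketch below applies standard 2DPCA to compress the column direction of each mouth image and then applies 2DPCA again to the transposed stage-one features to compress the row direction. The abstract does not specify how the second stage is built, so this bilateral arrangement is an assumption, as are the function names `fit_2dpca` and `two_stage_2dpca`, the parameters `k_cols` and `k_rows`, and the use of NumPy. The visemic LDA step is omitted.

```python
import numpy as np


def fit_2dpca(images, k):
    """Return the top-k eigenvectors (n x k) of the image covariance matrix.

    images: array of shape (M, m, n), i.e. M images of size m x n.
    """
    centered = images - images.mean(axis=0)
    # G = (1/M) * sum_i (A_i - mean)^T (A_i - mean), an n x n matrix.
    G = np.einsum("imn,imk->nk", centered, centered) / len(images)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(G)
    return eigvecs[:, ::-1][:, :k]


def two_stage_2dpca(images, k_cols, k_rows):
    """Assumed two-stage scheme: reduce columns first, then rows."""
    # Stage 1: project each m x n image onto k_cols column-direction axes.
    X = fit_2dpca(images, k_cols)                      # (n, k_cols)
    stage1 = images @ X                                # (M, m, k_cols)
    # Stage 2: run 2DPCA on the transposed stage-one features so that the
    # row direction is compressed as well (an assumption, see lead-in).
    Z = fit_2dpca(stage1.transpose(0, 2, 1), k_rows)   # (m, k_rows)
    features = Z.T @ stage1                            # (M, k_rows, k_cols)
    return features, X, Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mouth_images = rng.normal(size=(20, 8, 10))  # synthetic stand-in data
    feats, X, Z = two_stage_2dpca(mouth_images, k_cols=4, k_rows=3)
    print(feats.shape)
```

Under these assumptions each 8 x 10 image is reduced to a 3 x 4 feature matrix, whereas single-stage 2DPCA with the same `k_cols` would keep an 8 x 4 matrix. This is the kind of dimension reduction relative to 2DPCA that the abstract describes.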


doi: 10.21437/Interspeech.2007-297

Cite as: Wu, G., Zhu, J. (2007) An extension 2DPCA based visual feature extraction method for audio-visual speech recognition. Proc. Interspeech 2007, 714-717, doi: 10.21437/Interspeech.2007-297

@inproceedings{wu07d_interspeech,
  author={Guanyong Wu and Jie Zhu},
  title={{An extension 2DPCA based visual feature extraction method for audio-visual speech recognition}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={714--717},
  doi={10.21437/Interspeech.2007-297}
}