8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

An Extension 2DPCA Based Visual Feature Extraction Method for Audio-Visual Speech Recognition

Guanyong Wu, Jie Zhu

Shanghai Jiaotong University, China

Two-dimensional principal component analysis (2DPCA) has been proposed for face recognition as an alternative to the traditional PCA transform [1]. In this paper, we extend this approach to visual feature extraction for audio-visual speech recognition (AVSR). First, a two-stage 2DPCA transform is applied to extract the visual features. Then, visemic linear discriminant analysis (LDA) is applied as a post-extraction processing step. We compare the presented method with traditional PCA and 2DPCA. Experimental results show that the extension 2DPCA reduces the feature dimension of 2DPCA and represents the test mouth images better than PCA does. Moreover, 2DPCA+LDA requires less computation and performs better than PCA+LDA in visual-only speech recognition. Finally, further experimental results demonstrate that our AVSR system using the extension 2DPCA method is significantly more robust in noisy environments than audio-only speech recognition.
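
The abstract does not spell out the two-stage transform, so the following is only a minimal sketch under a stated assumption: the first stage is standard 2DPCA along the image columns, and the second stage applies 2DPCA again to the transposed first-stage output to compress the rows. The function names (`projection_2dpca`, `two_stage_2dpca`) and the parameters `k_cols` and `k_rows` are hypothetical and are not taken from the paper; the visemic LDA step is not shown.

```python
import numpy as np


def projection_2dpca(images, k):
    """Return the top-k eigenvectors of the 2DPCA image covariance matrix.

    images: array of shape (n, h, w); the result has shape (w, k).
    """
    centered = images - images.mean(axis=0)
    # Image covariance G = (1/n) * sum_i (A_i - mean)^T (A_i - mean), shape (w, w).
    cov = np.einsum("nhw,nhv->wv", centered, centered) / len(images)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, ::-1][:, :k]


def two_stage_2dpca(images, k_cols, k_rows):
    """Assumed two-stage scheme: compress columns first, then rows."""
    # Stage 1: project each h x w image onto k_cols column-direction axes.
    x = projection_2dpca(images, k_cols)            # (w, k_cols)
    stage1 = images @ x                             # (n, h, k_cols)
    # Stage 2: run 2DPCA on the transposed stage-1 output to compress rows.
    z = projection_2dpca(stage1.transpose(0, 2, 1), k_rows)  # (h, k_rows)
    features = z.T @ stage1                         # (n, k_rows, k_cols)
    return x, z, features
```

Under this assumption, a 16 x 24 mouth image reduced with `k_cols=6` and `k_rows=4` yields a 4 x 6 feature matrix, compared with the 16 x 6 matrix produced by the first stage alone, which illustrates how a second stage can shrink the 2DPCA feature dimension.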

Bibliographic reference.  Wu, Guanyong / Zhu, Jie (2007): "An extension 2DPCA based visual feature extraction method for audio-visual speech recognition", In INTERSPEECH-2007, 714-717.