5th International Conference on Spoken Language Processing
This paper describes intensity and location normalization techniques for improving the performance of visual speech recognizers used in audio-visual speech recognition. For auditory speech recognition, there are many methods for dealing with channel characteristics and speaker individualities, such as cepstral mean normalization (CMN) and speaker adaptive training (SAT). We present two techniques, similar to CMN and SAT respectively, for intensity and location normalization in visual speech recognition. For intensity normalization, the mean value over the image sequence is subtracted from each pixel of the image sequence. For location normalization, both training and testing are carried out by finding, for each utterance, the lip position that yields the highest likelihood under the HMMs. HMM-based word recognition experiments show that a significant improvement in recognition performance is achieved by combining the two techniques.
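The two normalization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the "mean value over the image sequence" is a per-pixel mean over time (by analogy with CMN), and that location normalization is an exhaustive search over crop positions of a fixed window size. The names `normalize_intensity`, `crop`, `best_lip_position`, and the `log_likelihood` callable (standing in for HMM scoring of an utterance) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]  # one grayscale image: rows of pixel values


def normalize_intensity(frames: Sequence[Frame]) -> List[Frame]:
    """Subtract the per-pixel temporal mean from every frame (CMN-like).

    Assumption: the mean is taken per pixel over the sequence; the abstract
    could also be read as a single scalar mean over all pixels and frames.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    mean = [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]
    return [[[f[r][c] - mean[r][c] for c in range(cols)] for r in range(rows)]
            for f in frames]


def crop(frames: Sequence[Frame], top: int, left: int,
         height: int, width: int) -> List[Frame]:
    """Cut the same height x width window out of every frame."""
    return [[row[left:left + width] for row in f[top:top + height]]
            for f in frames]


def best_lip_position(
    frames: Sequence[Frame],
    window: Tuple[int, int],
    log_likelihood: Callable[[List[Frame]], float],
) -> Tuple[int, int]:
    """Return the (top, left) crop offset with the highest score.

    `log_likelihood` is a hypothetical stand-in for scoring the cropped
    utterance with trained HMMs; any callable returning a float works.
    """
    height, width = window
    rows, cols = len(frames[0]), len(frames[0][0])
    best_score, best_pos = None, (0, 0)
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            score = log_likelihood(crop(frames, top, left, height, width))
            if best_score is None or score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos
```

In this sketch the two steps are kept separate so they can be combined in either order; whether the paper normalizes intensity before or after locating the lips is not stated in the abstract.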
Bibliographic reference. Vanegas, Oscar / Tanaka, Akiji / Tokuda, Keiichi / Kitamura, Tadashi (1998): "HMM-based visual speech recognition using intensity and location normalization", In ICSLP-1998, paper 0789.