This paper proposes a novel method of visual feature extraction for automatic speechreading. Current methods of extracting delta or difference features compute the difference between adjacent frames; the proposed method instead captures how the visual features evolve over a time period longer than the inter-frame interval, with the period set relative to the length of the utterance. These new features provide a visual memory capability for improved system performance. Good visual discrimination is achieved by maintaining a base level of detail in the features, and a frame rate of 30 frames per second provides rapid visual recognition of speech. The combination of the novel visual memory features, good visual discrimination, and rapid recognition of speech movements is shown to improve visual speech recognition. Using this method, an isolated word accuracy of 28.1% was achieved on a vocabulary of 78 words over a database of 10 speakers.
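The long-span difference features described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `long_span_deltas` and the `span_fraction` parameterisation (span as a fixed fraction of utterance length) are assumptions made for the example; the paper only states that the differencing span is relative to the utterance length.

```python
import numpy as np

def long_span_deltas(features, span_fraction=0.1):
    """Difference features over a span proportional to utterance length.

    features: (n_frames, n_dims) array of per-frame visual features.
    span_fraction: hypothetical parameter giving the differencing span
    as a fraction of the utterance length (exact parameterisation is
    an assumption for this sketch).
    """
    n_frames = features.shape[0]
    span = max(1, int(round(span_fraction * n_frames)))
    # Edge-pad so every frame has a delta, then take the wide difference:
    # delta[t] = x[t + span] - x[t - span]
    padded = np.pad(features, ((span, span), (0, 0)), mode="edge")
    return padded[2 * span:] - padded[:-2 * span]
```

With a span of one frame this reduces to a conventional central difference between neighbouring frames; larger spans give the "visual memory" over a longer stretch of the utterance.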
Cite as: Scanlon, P., Reilly, R., de Chazal, P. (2003) Visual feature analysis for automatic speechreading. Proc. Auditory-Visual Speech Processing, 127-132
@inproceedings{scanlon03_avsp,
  author={Patricia Scanlon and Richard Reilly and Philip de Chazal},
  title={{Visual feature analysis for automatic speechreading}},
  year={2003},
  booktitle={Proc. Auditory-Visual Speech Processing},
  pages={127--132}
}