International Conference on Auditory-Visual Speech Processing 2008

Tangalooma Wild Dolphin Resort, Moreton Island, Queensland, Australia
September 26-29, 2008

CENSREC-AV: Evaluation frameworks for Audio-Visual Speech Recognition

Satoshi Tamura (1), Chiyomi Miyajima (2), Norihide Kitaoka (2), Satoru Hayamizu (1), Kazuya Takeda (2)

(1) Department of Information Science, Gifu University, Japan
(2) Graduate School of Information Science, Nagoya University, Japan

This paper introduces forthcoming evaluation frameworks for bimodal speech recognition in noisy conditions and real environments. To achieve robust speech recognition in noisy environments, bimodal speech recognition, which uses both acoustic and visual information, has received particular attention over the past decade. Because many methods and techniques for bimodal speech recognition have been proposed, a common evaluation framework, including audio-visual speech data and a baseline system, is needed to assess and compare these techniques and bimodal speech recognition schemes. The audio-visual evaluation frameworks CENSREC-1-AV and CENSREC-2-AV are being built by the CENSREC project in Japan: CENSREC-1-AV includes artificially noise-added waveforms and image sequences, whereas CENSREC-2-AV consists of audio-visual data recorded in in-car environments. A baseline method and its recognition results will also be provided with these corpora.

Index Terms: evaluation framework, audio-visual speech corpus, bimodal speech recognition, noisy environments.

