INTERSPEECH 2004 - ICSLP
In this paper, we describe audio-visual Korean vowel recognition experiments using a limited set of lip features. We propose lip features extracted from a single snapshot image of each vowel utterance, taken at the moment when the speaker's mouth deviates most from its closed state. Because only one snapshot image is needed, the proposed lip features can be obtained simply and at low computational cost. Simple, low-cost visual feature extraction is especially important for devices with limited computing power, such as PDAs and smartphones. We also present an N-best rescoring method to correct Korean vowel speech recognition errors. The experimental results show that the proposed N-best rescoring method and the selected lip features are very effective for audio-visual Korean vowel speech recognition.
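The abstract does not specify which lip features are measured or how acoustic and visual evidence are combined during rescoring. The Python sketch below illustrates the two ideas only under assumed choices, and it should not be read as the authors' method. The snapshot is taken as the frame whose lip feature vector lies farthest (Euclidean distance) from a closed-mouth reference. Each N-best hypothesis is then rescored with a weighted sum of its acoustic log score and a visual score, defined here as the negative squared distance to a per-vowel lip template. The (width, height) features, the templates, the `weight` parameter and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


def select_snapshot(frames, closed):
    """Return the index of the frame whose lip features deviate most from
    the closed-mouth reference (assumed criterion: Euclidean distance)."""
    return max(range(len(frames)), key=lambda i: math.dist(frames[i], closed))


@dataclass
class Hypothesis:
    vowel: str             # recognized vowel label (hypothetical romanization)
    acoustic_score: float  # log score from the audio-only recognizer


def visual_score(features, template):
    """Hypothetical visual score: negative squared distance to a vowel template."""
    return -sum((a - b) ** 2 for a, b in zip(features, template))


def rescore_nbest(nbest, lip_features, templates, weight=0.5):
    """Reorder an N-best list by an assumed linear mix of acoustic and visual scores."""
    def combined(h):
        return ((1 - weight) * h.acoustic_score
                + weight * visual_score(lip_features, templates[h.vowel]))
    return sorted(nbest, key=combined, reverse=True)


# Usage with made-up (mouth width, mouth height) measurements per frame.
closed = (4.0, 0.5)
frames = [(4.0, 0.5), (4.2, 1.5), (4.5, 3.0), (4.3, 2.0)]
idx = select_snapshot(frames, closed)
print(idx)  # → 2

templates = {"a": (4.5, 3.0), "i": (5.0, 0.8), "u": (2.5, 1.5)}
nbest = [Hypothesis("i", -10.0), Hypothesis("a", -10.5), Hypothesis("u", -12.0)]
ranked = rescore_nbest(nbest, frames[idx], templates)
print([h.vowel for h in ranked])  # → ['a', 'i', 'u']
```

In this toy run the audio-only recognizer ranks "i" first. The wide-open snapshot matches the "a" template, so rescoring promotes "a". This is the kind of error correction the abstract describes, although the paper's actual scoring scheme may differ.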
Bibliographic reference. Hong, Ki-Hyung / Lee, Yong-Ju / Suh, Jae-Young / Lee, Kyong-Nim (2004): "Correcting Korean vowel speech recognition errors with limited lip features", In INTERSPEECH-2004, 2529-2532.