15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Lipreading Approach for Isolated Digits Recognition Under Whisper and Neutral Speech

Fei Tao, Carlos Busso

University of Texas at Dallas, USA

Whisper is a speech production mode normally used to protect confidential information. Because whisper differs from neutral speech in the acoustic domain, the performance of automatic speech recognition (ASR) systems decreases with whisper speech. An appealing approach to improve the performance is the use of lipreading. This study explores the use of visual features characterizing the lips' geometry and appearance to recognize digits under neutral and whisper speech conditions using hidden Markov models (HMMs). We evaluate the proposed features on the digit part of the audiovisual whisper (AVW) corpus. While the proposed system achieves high accuracy in speaker-dependent conditions (80.8%), the performance decreases when we evaluate speaker-independent models (52.9%). We propose supervised adaptation schemes to reduce the mismatch between speakers. Across all conditions, the performance of the classifiers remains competitive even in the presence of whisper speech, highlighting the benefits of using visual features.

