Whisper is a speech production mode normally used to protect confidential information. Because whispered speech differs from neutral speech in the acoustic domain, the performance of automatic speech recognition (ASR) systems decreases on whispered speech. An appealing approach to improving performance is lipreading. This study explores the use of visual features characterizing the geometry and appearance of the lips to recognize digits under normal and whispered speech conditions using hidden Markov models (HMMs). We evaluate the proposed features on the digit part of the audiovisual whisper (AVW) corpus. While the proposed system achieves high accuracy in speaker-dependent conditions (80.8%), the performance decreases when we evaluate speaker-independent models (52.9%). We propose supervised adaptation schemes to reduce the mismatch between speakers. Across all conditions, the performance of the classifiers remains competitive even in the presence of whispered speech, highlighting the benefits of using visual features.
Bibliographic reference. Tao, Fei / Busso, Carlos (2014): "Lipreading approach for isolated digits recognition under whisper and neutral speech", In INTERSPEECH-2014, 1154-1158.