ISCA Archive Interspeech 2013

Using twin-HMM-based audio-visual speech enhancement as a front-end for robust audio-visual speech recognition

Ahmed Hussen Abdelaziz, Steffen Zeiler, Dorothea Kolossa

In this paper we propose using the recently introduced twin-HMM-based audio-visual speech enhancement algorithm as a front-end for audio-visual speech recognition systems. This algorithm estimates the clean speech statistics in the recognition domain from the audio-visual observations and transforms these statistics to the synthesis domain via so-called twin HMMs. The adopted front-end is combined with back-end methods such as conventional maximum likelihood decoding or the recently introduced significance decoding. Applied to acoustically corrupted signals of the Grid audio-visual corpus, the proposed front-end/back-end combination yields statistically significant improvements in audio-visual recognition accuracy compared to the ETSI advanced front-end.


doi: 10.21437/Interspeech.2013-257

Cite as: Abdelaziz, A.H., Zeiler, S., Kolossa, D. (2013) Using twin-HMM-based audio-visual speech enhancement as a front-end for robust audio-visual speech recognition. Proc. Interspeech 2013, 867-871, doi: 10.21437/Interspeech.2013-257

@inproceedings{abdelaziz13_interspeech,
  author={Ahmed Hussen Abdelaziz and Steffen Zeiler and Dorothea Kolossa},
  title={{Using twin-HMM-based audio-visual speech enhancement as a front-end for robust audio-visual speech recognition}},
  year={2013},
  booktitle={Proc. Interspeech 2013},
  pages={867--871},
  doi={10.21437/Interspeech.2013-257}
}