ISCA Archive IberSPEECH 2022

Spanish Lipreading in Realistic Scenarios: the LLEER project

Carlos David Martinez Hinarejos, David Gimeno-Gomez, Francisco Casacuberta, Emilio Granell, Roberto Paredes, Moisés Pastor, Enrique Vidal

Automatic speech recognition has usually been performed using only audio data, but speech communication is also affected by non-audio sources, mainly visual cues. Visual information includes body expression, facial expression, and lip movements, among others. Lip reading, also known as Visual Speech Recognition, aims at decoding speech using only images of the lip movements. Current approaches to automatic lip reading follow the same lines as those for speech processing: massive data is used to train deep learning models that perform speech recognition. However, most datasets and models are devoted to languages such as English or Chinese, while other languages, particularly Spanish, are underrepresented. The LLEER (Lectura de Labios en Español en Escenarios Realistas) project aims at the acquisition of large-scale visual corpora for Spanish lip reading, the development of visual processing techniques that extract important information for the task, the implementation of models for automatic lip reading, and the integration with speech recognition models for audiovisual speech recognition.

doi: 10.21437/IberSPEECH.2022-49

Cite as: Martinez Hinarejos, C.D., Gimeno-Gomez, D., Casacuberta, F., Granell, E., Paredes, R., Pastor, M., Vidal, E. (2022) Spanish Lipreading in Realistic Scenarios: the LLEER project. Proc. IberSPEECH 2022, 241-245, doi: 10.21437/IberSPEECH.2022-49

  author={Carlos David {Martinez Hinarejos} and David Gimeno-Gomez and Francisco Casacuberta and Emilio Granell and Roberto Paredes and Moisés Pastor and Enrique Vidal},
  title={{Spanish Lipreading in Realistic Scenarios: the LLEER project}},
  booktitle={Proc. IberSPEECH 2022},