15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Robust Articulatory Speech Synthesis Using Deep Neural Networks for BCI Applications

Florent Bocquelet (1), Thomas Hueber (2), Laurent Girin (2), Pierre Badin (2), Blaise Yvert (1)

(1) Clinatec, France
(2) GIPSA, France

Brain-Computer Interfaces (BCIs) usually rely on typing strategies to restore communication for paralyzed and aphasic people. A more natural way would be to use a speech BCI directly controlling a speech synthesizer. Toward this goal, a prerequisite is the development of a synthesizer that should i) produce intelligible speech, ii) run in real time, iii) depend on as few parameters as possible, and iv) be robust to error fluctuations on the control parameters. In this context, we describe here an articulatory-to-acoustic mapping approach based on a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded synchronously with the produced speech sounds. On this corpus, the DNN-based model provided a speech synthesis quality (as assessed by automatic speech recognition and behavioral testing) comparable to a state-of-the-art Gaussian mixture model (GMM), yet showed higher robustness when noise was added to the EMA coordinates. Moreover, to envision BCI applications, this robustness was also assessed when the space covered by the 12 original articulatory parameters was reduced to 7 parameters using deep auto-encoders (DAE). Given that this method can be implemented in real time, DNN-based articulatory speech synthesis seems a good candidate for speech BCI applications.
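The pipeline sketched in the abstract (a DAE bottleneck compressing the 12 EMA coordinates to 7 parameters, feeding a DNN that maps articulatory frames to acoustic features) can be illustrated schematically. The layer sizes, the 25-dimensional acoustic output, and the network initialisation below are illustrative assumptions, not the paper's actual architecture; the weights are untrained, so only the data flow and tensor shapes are meaningful:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random (untrained) weights for a small feed-forward network."""
    return [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    """tanh hidden layers, linear output layer."""
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

# Hypothetical sizes: 12 EMA coordinates, 7-unit DAE bottleneck,
# 25 acoustic parameters per frame (all assumed for illustration).
encoder = init_mlp([12, 20, 7])    # DAE encoder: 12 -> 7 control parameters
decoder = init_mlp([7, 20, 12])    # DAE decoder: 7 -> 12 reconstructed coords
mapper  = init_mlp([12, 100, 25])  # DNN articulatory-to-acoustic mapping

ema_frame = rng.standard_normal(12)            # one EMA frame
code      = forward(ema_frame, encoder)        # 7 low-dim BCI control params
acoustic  = forward(forward(code, decoder), mapper)
print(code.shape, acoustic.shape)              # (7,) (25,)
```

In a real system the encoder/decoder would be trained to reconstruct EMA data and the mapper trained on the synchronized EMA/audio corpus; frame-by-frame forward passes like this are what make real-time use feasible.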

Full Paper

Bibliographic reference: Bocquelet, Florent / Hueber, Thomas / Girin, Laurent / Badin, Pierre / Yvert, Blaise (2014): "Robust articulatory speech synthesis using deep neural networks for BCI applications", in INTERSPEECH-2014, 2288-2292.