16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Speech Reconstruction from Human Auditory Cortex with Deep Neural Networks

Minda Yang (1), Sameer A. Sheth (2), Catherine A. Schevon (2), Guy M. McKhann II (2), Nima Mesgarani (1)

(1) Columbia University, USA
(2) Columbia University Medical Center, USA

We examined the accuracy of speech spectrograms reconstructed from neural responses recorded intracranially in human auditory cortex. Electrodes were implanted over the cortex of epilepsy patients for the localization of seizures, and neural responses were recorded as the subjects passively listened to continuous speech. We compared the reconstructed spectrograms estimated with two different models: a linear regression model and a deep neural network. Compared with the linear regression model, the reconstructed spectrograms from the deep neural network achieved a higher average correlation with the original spectrograms. In addition, the reconstructed spectrograms from the neural network better preserved the average acoustic features of phones. We further investigated how changing the number of hidden layers in the network affects the reconstruction accuracy and found better performance with deeper networks, particularly in the reconstruction of the spectrotemporal modulation content of speech. These findings demonstrate the efficacy of deep neural network models in decoding speech signals from neural responses and provide a method for improving the performance of brain-computer interfaces with prosthetic applications.

Full Paper

Bibliographic reference.  Yang, Minda / Sheth, Sameer A. / Schevon, Catherine A. / McKhann, Guy M., II / Mesgarani, Nima (2015): "Speech reconstruction from human auditory cortex with deep neural networks", In INTERSPEECH-2015, 1121-1125.