Speech Processing in the Human Brain Meets Deep Learning

Nima Mesgarani


Speech processing technologies have seen tremendous progress since the advent of deep learning, and the most challenging problems no longer seem out of reach. In parallel, deep learning has advanced the state of the art in processing neural responses to speech in the human brain. This talk reports progress in three important areas of research: I) Decoding (reconstructing) speech from the human auditory cortex to establish a direct interface with the brain. Such an interface could not only restore communication for paralyzed patients, but also has the potential to transform human-computer interaction technologies. II) Auditory Attention Decoding, which aims to create a mind-controlled hearing aid that tracks the brain waves of a listener to identify and amplify the voice of the attended speaker in a crowd. Such a device could help hearing-impaired listeners communicate more effortlessly with others in noisy environments. III) More accurate models of the transformations that the brain applies to speech at different stages of the human auditory pathway. This is achieved by training deep neural networks to learn the mapping from sound to the neural responses. Using a novel method to study the exact function learned by these neural networks has led to new insights into how the human brain processes speech. In turn, these insights motivate distinct computational properties that can be incorporated into neural network models to better capture speech processing in the human auditory cortex.


Cite as: Mesgarani, N. (2018) Speech Processing in the Human Brain Meets Deep Learning. Proc. Interspeech 2018, 2206.


@inproceedings{Mesgarani2018,
  author={Nima Mesgarani},
  title={Speech Processing in the Human Brain Meets Deep Learning},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2206}
}