Speaker and Language Recognition -- From Laboratory Technologies to the Wild

Sriram Ganapathy


Detecting paralinguistic components of speech, such as speaker identity and language, is of substantial interest for many commercial, surveillance and security applications. The problem is at least three decades old, and some of the early techniques were based on simple Gaussian mixture models. A significant advance in this area came about a decade ago with the advent of joint factor analysis and i-vector models. The last couple of years have seen further breakthroughs with deep embeddings and end-to-end models based on deep learning. With these improvements in modeling speaker and language, the application of the technology has also moved from clean, controlled speech data to telephone-channel recordings, far-field microphone recordings and, more recently, multi-speaker conversations in the wild. In this talk, I will provide a prospective view of the broad research directions in the field of speaker and language recognition. I will also highlight some recent advances from our work on hierarchical end-to-end approaches with relevance modeling.


Cite as: Ganapathy, S. (2018) Speaker and Language Recognition -- From Laboratory Technologies to the Wild. Proc. Interspeech 2018, 3443.


@inproceedings{Ganapathy2018,
  author={Sriram Ganapathy},
  title={Speaker and Language Recognition -- From Laboratory Technologies to the Wild},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3443}
}