Extracting Speaker’s Gender, Accent, Age and Emotional State from Speech

Nagendra Goel, Mousmita Sarma, Tejendra Kushwah, Dharmesh Agarwal, Zikra Iqbal, Surbhi Chauhan


We demonstrate a speaker characteristics assessment solution that extracts information about a speaker, such as gender, age, emotion, language and accent, from telephone-quality speech. The solution is built with machine learning algorithms ranging from Gaussian mixture models to deep neural networks. It uses WebSocket technology to provide a real-time bidirectional interface that delivers live updates in a scalable manner. The service powers our demonstration web page, where a user can upload or record an audio file and obtain the speaker’s characteristics. Such speaker characteristics can serve as metadata in many real-life applications designed for emotionally sensitive human-to-machine and human-to-human interaction.
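The abstract does not describe the models in detail. A common Gaussian-mixture recipe for a closed-set attribute such as gender is to train one GMM per class on frame-level acoustic features, then pick the class whose model gives the highest average log-likelihood over the utterance's frames. The following is a minimal sketch of that scoring step only. It assumes already-trained diagonal-covariance GMMs with strictly positive weights. The names `DiagGMM` and `classify` and the toy parameters are hypothetical, and both feature extraction (e.g. MFCCs) and GMM training are omitted.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiagGMM:
    """Hypothetical pre-trained GMM with diagonal covariances."""
    weights: List[float]            # mixture weights (positive, summing to 1)
    means: List[List[float]]        # one mean vector per component
    variances: List[List[float]]    # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """Return log p(x) for one feature vector, combining components via log-sum-exp."""
        comp_scores = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            # Log density of a diagonal Gaussian: sum of per-dimension terms.
            log_gauss = -0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, mu, var)
            )
            comp_scores.append(math.log(w) + log_gauss)
        top = max(comp_scores)
        # Subtracting the max before exponentiating avoids underflow.
        return top + math.log(sum(math.exp(s - top) for s in comp_scores))


def classify(frames: Sequence[Sequence[float]], models: Dict[str, DiagGMM]) -> str:
    """Return the label whose GMM gives the highest average per-frame log-likelihood."""
    def avg_ll(gmm: DiagGMM) -> float:
        return sum(gmm.log_likelihood(f) for f in frames) / len(frames)

    return max(models, key=lambda label: avg_ll(models[label]))


# Toy usage with made-up 2-D "features" (not real acoustic data).
models = {
    "female": DiagGMM([1.0], [[1.0, 1.0]], [[0.5, 0.5]]),
    "male": DiagGMM([0.5, 0.5], [[-1.0, -1.0], [-2.0, -1.5]], [[0.5, 0.5], [0.5, 0.5]]),
}
print(classify([[0.9, 1.2], [1.1, 0.8]], models))
```

Averaging rather than summing the frame scores keeps utterances of different lengths comparable. A real system would typically use many more mixture components and higher-dimensional features.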


Cite as: Goel, N., Sarma, M., Kushwah, T., Agarwal, D., Iqbal, Z., Chauhan, S. (2018) Extracting Speaker’s Gender, Accent, Age and Emotional State from Speech. Proc. Interspeech 2018, 2384-2385.


@inproceedings{Goel2018,
  author={Nagendra Goel and Mousmita Sarma and Tejendra Kushwah and Dharmesh Agarwal and Zikra Iqbal and Surbhi Chauhan},
  title={Extracting Speaker’s Gender, Accent, Age and Emotional State from Speech},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2384--2385}
}