CaptionAI: A Real-Time Multilingual Captioning Application

Nagendra Kumar Goel, Mousmita Sarma, Saikiran Valluri, Dharmeshkumar Agrawal, Steve Braich, Tejendra Singh Kuswah, Zikra Iqbal, Surbhi Chauhan, Raj Karbar


We demonstrate CaptionAI, a system for speech-to-text transcription, multilingual translation, and real-time closed captioning. It can also broadcast the audio and translated text to personal devices. The application has three components: speech-to-text conversion, machine translation, and real-time broadcast of audio together with its multilingual text transcription. CaptionAI makes meetings, conferences, and events accessible to global audiences through real-time multilingual captioning and broadcast, improving comprehension and retention. The application supports real-time speech transcription in English and Spanish, and supports seventeen popular languages for real-time machine translation of the transcribed speech. The front end is written in C#; the back end uses a combination of Python- and C++-based software and packages such as Janus, GStreamer, and libwebsockets.


Cite as: Goel, N.K., Sarma, M., Valluri, S., Agrawal, D., Braich, S., Kuswah, T.S., Iqbal, Z., Chauhan, S., Karbar, R. (2019) CaptionAI: A Real-Time Multilingual Captioning Application. Proc. Interspeech 2019, 4632-4633.


@inproceedings{Goel2019,
  author={Nagendra Kumar Goel and Mousmita Sarma and Saikiran Valluri and Dharmeshkumar Agrawal and Steve Braich and Tejendra Singh Kuswah and Zikra Iqbal and Surbhi Chauhan and Raj Karbar},
  title={{CaptionAI: A Real-Time Multilingual Captioning Application}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={4632--4633}
}