Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya

Debadatta Dash, Myungjong Kim, Kristin Teplansky, Jun Wang


Despite the continuous progress of Automatic Speech Recognition (ASR) technologies, systems for Indian languages are still in their infancy due to a multitude of challenges, including resource deficiency. This paper addressed this challenge for four Indian languages (Hindi, Marathi, Bengali, and Oriya) by integrating articulatory information into acoustic features, thereby compensating for the low-resource nature of these languages and improving performance. Articulatory movements were recorded during speech production using an electromagnetic articulograph and trained together with acoustic features to build automatic speech recognizers for these languages. Both speaker-dependent and speaker-independent recognition experiments were conducted with three ASR models: Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM), Deep Neural Network (DNN)-HMM, and Long Short-Term Memory recurrent neural network (LSTM)-HMM. Cross-language similarities were discerned in both the acoustic and articulatory domains for the pairs Oriya-Bengali and Hindi-Marathi. Based on these observations, a multi-lingual, multi-modal speech recognizer was built using a unified dictionary consisting of the common and unique phonemes of all four languages, which reduced the phoneme error rates.
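As a minimal, hypothetical sketch (not the authors' implementation), the feature-fusion idea described above can be illustrated by concatenating frame-aligned acoustic features (e.g., MFCCs) with articulatory trajectories (e.g., x/y coordinates of electromagnetic articulograph sensors) into a single joint feature vector per frame. The dimensions and function name below are illustrative assumptions.

```python
import numpy as np

def fuse_features(acoustic, articulatory):
    """Concatenate frame-aligned acoustic and articulatory features.

    acoustic:     (T, Da) array, e.g. 13-dim MFCCs per 10 ms frame
    articulatory: (T, Dm) array, e.g. x/y coordinates of EMA sensors
    Returns a (T, Da + Dm) fused feature matrix, one vector per frame,
    which would then be fed to the acoustic model (GMM/DNN/LSTM-HMM).
    """
    if acoustic.shape[0] != articulatory.shape[0]:
        raise ValueError("acoustic and articulatory streams must be frame-aligned")
    return np.hstack([acoustic, articulatory])

# Toy example: 100 frames, 13 MFCC dims + 12 EMA dims (6 sensors x 2 coords)
mfcc = np.random.randn(100, 13)
ema = np.random.randn(100, 12)
fused = fuse_features(mfcc, ema)
print(fused.shape)  # (100, 25)
```

In practice the two streams must first be resampled and time-aligned to a common frame rate before concatenation; that step is omitted here for brevity.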


DOI: 10.21437/Interspeech.2018-2122

Cite as: Dash, D., Kim, M., Teplansky, K., Wang, J. (2018) Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya. Proc. Interspeech 2018, 1046-1050, DOI: 10.21437/Interspeech.2018-2122.


@inproceedings{Dash2018,
  author={Debadatta Dash and Myungjong Kim and Kristin Teplansky and Jun Wang},
  title={Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1046--1050},
  doi={10.21437/Interspeech.2018-2122},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2122}
}