Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition

Vishwas M. Shetty, Rini A Sharon, Basil Abraham, Tejaswi Seeram, Anusha Prakash, Nithya Ravi, S. Umesh


In this paper, we discuss the benefits of using articulatory and stacked bottleneck features (SBF) for low resource speech recognition. Articulatory features (AF), which capture the underlying attributes of speech production, are found to be robust to channel and speaker variations. However, building an efficient articulatory classifier to extract AF requires an enormous amount of data. For low resource acoustic modeling, we propose to train the bidirectional long short-term memory (BLSTM) articulatory classifier by pooling data from the available low resource Indian languages, namely Gujarati, Tamil and Telugu. This is done in the context of the Microsoft Indian Language challenge. Similarly, we train a multilingual bottleneck feature extractor and an SBF extractor using the pooled data. To bias the SBF network towards the target language, a second network in the stacked architecture is trained using the target language alone. The performance of the ASR system trained with stand-alone AF is observed to be on par with that of the system trained with multilingual bottleneck features. When the AF and the biased SBF are appended, they are found to outperform the conventional filterbank features in the multilingual deep neural network (DNN) framework and the high-resolution Mel frequency cepstral coefficient (MFCC) features in the time-delay neural network (TDNN) framework.
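As a rough illustration of what appending AF and SBF can mean at the feature level, the sketch below concatenates the two per-frame vectors into one input vector. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `append_features` and the toy dimensions (3 AF values and 2 SBF values per frame) are hypothetical, and the abstract does not specify the actual feature sizes or any normalization applied before appending.

```python
from typing import List, Sequence


def append_features(
    af: Sequence[Sequence[float]],
    sbf: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame AF and SBF vectors, frame by frame.

    Both inputs are sequences of frames; frame t of the result is
    the AF vector for frame t followed by the SBF vector for frame t.
    """
    if len(af) != len(sbf):
        raise ValueError("AF and SBF must cover the same number of frames")
    return [list(a) + list(s) for a, s in zip(af, sbf)]


# Toy data with hypothetical dimensions: 2 frames, 3 AF values, 2 SBF values.
af_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
sbf_frames = [[1.0, 2.0], [3.0, 4.0]]

combined = append_features(af_frames, sbf_frames)
print(len(combined), len(combined[0]))  # 2 5
```

The combined vectors would then serve as input to the acoustic model in place of filterbank or MFCC features; the frame-count check reflects the assumption that both extractors run at the same frame rate.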


DOI: 10.21437/Interspeech.2018-2226

Cite as: Shetty, V.M., Sharon, R.A., Abraham, B., Seeram, T., Prakash, A., Ravi, N., Umesh, S. (2018) Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition. Proc. Interspeech 2018, 3202-3206, DOI: 10.21437/Interspeech.2018-2226.


@inproceedings{Shetty2018,
  author={Vishwas M. Shetty and Rini A Sharon and Basil Abraham and Tejaswi Seeram and Anusha Prakash and Nithya Ravi and S. Umesh},
  title={Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3202--3206},
  doi={10.21437/Interspeech.2018-2226},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2226}
}