Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network

Renuka Mannem, Jhansi Mallela, Aravind Illa, Prasanta Kumar Ghosh


In this paper, we propose a speech rate estimation approach using a convolutional dense neural network (CDNN). The CDNN-based approach uses acoustic and articulatory features for speech rate estimation. Mel Frequency Cepstral Coefficients (MFCCs) are used as acoustic features, and articulograms representing the time-varying vocal tract profile are used as articulatory features. The articulogram is computed from a real-time magnetic resonance imaging (rtMRI) video in the midsagittal plane of a subject while speaking. However, in practice, articulogram features are not directly available, unlike acoustic features from speech recording. Thus, we use an acoustic-to-articulatory inversion method based on a bidirectional long short-term memory network, which estimates the articulogram features from the acoustics. The proposed CDNN-based approach with estimated articulatory features requires both acoustic and articulatory features during training, but only acoustic data during testing. Experiments are conducted using rtMRI videos from four subjects, each speaking 460 sentences. The Pearson correlation coefficient is used to evaluate speech rate estimation. It is found that the CDNN-based approach yields a better correlation coefficient than the temporal and selected sub-band correlation (TCSSBC) based baseline scheme by 81.58% and 73.68% (relative) in seen and unseen subject conditions, respectively.
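As a minimal sketch of the evaluation metric mentioned above, the Pearson correlation coefficient between ground-truth and estimated speech rates can be computed as follows. The speech rate values here are illustrative placeholders, not data from the paper.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Illustrative per-utterance speech rates (e.g., syllables/second);
# these numbers are made up for demonstration only.
true_rates = [3.2, 4.1, 5.0, 2.8, 4.6]
est_rates = [3.0, 4.3, 4.8, 3.1, 4.5]
r = pearson_corr(true_rates, est_rates)
```

A value of `r` close to 1 indicates that the estimated speech rates track the ground truth closely; in practice, a library routine such as `scipy.stats.pearsonr` computes the same quantity.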


DOI: 10.21437/Interspeech.2019-2295

Cite as: Mannem, R., Mallela, J., Illa, A., Ghosh, P.K. (2019) Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network. Proc. Interspeech 2019, 929-933, DOI: 10.21437/Interspeech.2019-2295.


@inproceedings{Mannem2019,
  author={Renuka Mannem and Jhansi Mallela and Aravind Illa and Prasanta Kumar Ghosh},
  title={{Acoustic and Articulatory Feature Based Speech Rate Estimation Using a Convolutional Dense Neural Network}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={929--933},
  doi={10.21437/Interspeech.2019-2295},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2295}
}