An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement

Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, Hirokazu Kameoka, Tomoki Toda


Despite an abundance of research, natural voice restoration after total laryngectomy (i.e., removal of the vocal folds of the larynx) has remained a challenge. A typical way for these patients to produce relatively intelligible speech is to use an electrolarynx. However, the resulting voice sounds artificial and has a “robotic” quality owing to the constant fundamental frequency (F0) patterns generated by the electrolarynx. In existing frameworks for natural F0 pattern prediction, a model is trained on a massive amount of parallel training data to learn a mapping from spectral features of the source speech to F0 contours of the target speech. However, creating large datasets of electrolaryngeal (EL) speech is cumbersome and expensive. Moreover, the spectral features of EL speech differ significantly from those of normal speech, so it is not straightforward to use readily available normal speech datasets effectively when training a model for EL speech. Consequently, the quality of the models may remain low owing to the lack of sufficient training data. To address this problem, we investigate F0 pattern prediction based on other features that could be shared between normal speech and EL speech. By using shared input features, we would be able to train the prediction model on a large amount of training data. As such features, in this work we examine F0 prediction accuracy based on phoneme-related features. The findings show that by considering phoneme labels for both vowels and consonants, together with one-hot encoding of these labels, we are able to predict F0 contours with high correlation coefficients.
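
To make the two technical ingredients named above concrete, the following is a minimal sketch of one-hot encoding frame-level phoneme labels and of scoring a predicted F0 contour against a target contour with a correlation coefficient. It is an illustration only and not the authors' implementation: the phoneme inventory, the frame labels, the F0 values, and the function names `one_hot_encode` and `pearson_correlation` are all hypothetical, and the use of the Pearson coefficient is an assumption, since the abstract does not say which correlation measure was used.

```python
import math


def one_hot_encode(labels, inventory):
    """Map each phoneme label to a one-hot vector over the given inventory.

    `inventory` is a hypothetical, ordered list of phoneme symbols.
    """
    index = {phoneme: i for i, phoneme in enumerate(inventory)}
    vectors = []
    for label in labels:
        vec = [0] * len(inventory)
        vec[index[label]] = 1  # raises KeyError for labels outside the inventory
        vectors.append(vec)
    return vectors


def pearson_correlation(predicted, target):
    """Pearson correlation coefficient between two equal-length contours."""
    if len(predicted) != len(target) or len(predicted) < 2:
        raise ValueError("contours must have the same length (at least 2)")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predicted, target))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # A perfectly constant contour has no variance, so the coefficient
        # is undefined rather than zero.
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_p * var_t)


if __name__ == "__main__":
    # Hypothetical inventory and frame-level labels (vowels, a consonant, silence).
    inventory = ["a", "i", "k", "sil"]
    frame_labels = ["sil", "k", "a", "a", "i"]
    for label, vec in zip(frame_labels, one_hot_encode(frame_labels, inventory)):
        print(label, vec)

    # Made-up F0 values in Hz, used only to exercise the function.
    target_f0 = [110.0, 118.0, 125.0, 121.0, 112.0]
    predicted_f0 = [108.0, 115.0, 127.0, 119.0, 114.0]
    print(pearson_correlation(predicted_f0, target_f0))
```

In this sketch the one-hot vectors would serve as model inputs that do not depend on the spectral characteristics of the speaker's voice, which is what would allow normal speech and EL speech to share them.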


DOI: 10.21437/SSW.2019-45

Cite as: Eshghi, M., Tanaka, K., Kobayashi, K., Kameoka, H., Toda, T. (2019) An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement. Proc. 10th ISCA Speech Synthesis Workshop, 251-256, DOI: 10.21437/SSW.2019-45.


@inproceedings{Eshghi2019,
  author={Mohammad Eshghi and Kou Tanaka and Kazuhiro Kobayashi and Hirokazu Kameoka and Tomoki Toda},
  title={{An Investigation of Features for Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement}},
  year=2019,
  booktitle={Proc. 10th ISCA Speech Synthesis Workshop},
  pages={251--256},
  doi={10.21437/SSW.2019-45},
  url={http://dx.doi.org/10.21437/SSW.2019-45}
}