ISCA Archive Interspeech 2005

NAM-to-speech conversion with Gaussian mixture models

Tomoki Toda, Kiyohiro Shikano

To realize a new style of human communication based on Non-Audible Murmur (NAM), which cannot be heard by people around the speaker, we convert NAM to ordinary speech (NAM-to-Speech). NAM-to-Speech could enable a "non-speech telephone", a technique in which users communicate with each other by talking in NAM and listening to speech. In this paper, we apply a statistical conversion method based on Gaussian Mixture Models (GMMs) to NAM-to-Speech. In advance, we train GMMs that represent correlations between the acoustic features of NAM and those of speech, using 50 utterance pairs of NAM and speech. In conversion, we estimate the spectral and F0 features of speech based on a maximum likelihood criterion, and then synthesize the converted speech with a vocoder. Results of subjective evaluations of intelligibility and naturalness demonstrate that NAM-to-Speech with GMMs can convert NAM to a more consistently natural voice.
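
As a rough illustration of how a GMM over joint source and target features can drive a conversion, the sketch below maps a single source frame x to the conditional mean E[y | x]. This is a simplified per-frame mapping chosen for brevity. It is not the maximum likelihood estimation described in the abstract, and it omits feature extraction, GMM training, F0 handling, and vocoder synthesis. The function names, the argument layout, and the toy parameters are all assumptions made for this example.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at one vector x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, means, covs, dx):
    """Map one source frame x to E[y | x] under a joint GMM over [x; y].

    weights: (M,) mixture weights
    means:   (M, dx + dy) joint mean vectors
    covs:    (M, dx + dy, dx + dy) joint covariance matrices
    dx:      dimensionality of the source part of the joint vector
    """
    n_mix = len(weights)
    log_post = np.empty(n_mix)
    cond_means = []
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        # Unnormalized log posterior of component m given the source frame.
        log_post[m] = np.log(weights[m]) + gaussian_logpdf(x, mu_x, s_xx)
        # Conditional mean of the target features for component m.
        cond_means.append(mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the per-component conditional means.
    return post @ np.array(cond_means)


# Toy two-component model with 1-D source and 1-D target (hypothetical values).
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 1.0], [4.0, -2.0]])
covs = np.array([[[1.0, 0.5], [0.5, 1.0]],
                 [[1.0, -0.5], [-0.5, 1.0]]])
y_hat = convert_frame(np.array([0.0]), weights, means, covs, dx=1)
```

In practice the joint GMM parameters would be estimated from time-aligned pairs of NAM and speech feature vectors, such as the 50 utterance pairs mentioned above, rather than set by hand.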


doi: 10.21437/Interspeech.2005-611

Cite as: Toda, T., Shikano, K. (2005) NAM-to-speech conversion with Gaussian mixture models. Proc. Interspeech 2005, 1957-1960, doi: 10.21437/Interspeech.2005-611

@inproceedings{toda05_interspeech,
  author={Tomoki Toda and Kiyohiro Shikano},
  title={{NAM-to-speech conversion with Gaussian mixture models}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1957--1960},
  doi={10.21437/Interspeech.2005-611}
}