Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

NAM-to-Speech Conversion with Gaussian Mixture Models

Tomoki Toda, Kiyohiro Shikano

Nara Institute of Science and Technology, Japan

To realize a new style of human communication using Non-Audible Murmur (NAM), which cannot be heard by people around the speaker, we perform conversion from NAM to ordinary speech (NAM-to-Speech). NAM-to-Speech could enable a "non-speech telephone", a technique for communicating with each other by talking in NAM and hearing in speech. In this paper, we apply a statistical conversion method based on Gaussian Mixture Models (GMMs) to NAM-to-Speech. In advance, we train GMMs that represent the correlations between the acoustic features of NAM and those of speech, using 50 utterance pairs of NAM and speech. In the conversion, we estimate the spectral and F0 features of speech based on a maximum likelihood criterion, and then synthesize the converted speech with a vocoder. Results of subjective evaluations of intelligibility and naturalness demonstrate that NAM-to-Speech with GMMs can convert NAM into a more consistently natural voice.
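
The mapping step described above can be illustrated with a minimal sketch. It assumes a joint GMM over stacked [NAM; speech] feature vectors has already been trained, and it uses the simpler frame-wise conditional expectation E[y | x] (a minimum mean-square-error mapping) rather than the maximum likelihood estimation used in the paper. The function names, the argument layout, and the feature dimensions are hypothetical and chosen only for illustration; F0 estimation and vocoder synthesis are not shown.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    sol = np.linalg.solve(chol, diff.T)          # shape (d, n)
    maha = np.sum(sol ** 2, axis=0)              # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def convert_frames(x, weights, means, covs, dx):
    """Map source frames x (n, dx) to target frames via E[y | x].

    weights: (m,) mixture weights of the joint GMM (assumed pre-trained)
    means:   (m, dx + dy) joint mean vectors, source part first
    covs:    (m, dx + dy, dx + dy) full joint covariance matrices
    """
    n = x.shape[0]
    m = len(weights)
    dy = means.shape[1] - dx
    log_post = np.empty((n, m))
    cond_means = np.empty((m, n, dy))
    for k in range(m):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx = covs[k, :dx, :dx]
        s_yx = covs[k, dx:, :dx]
        # Unnormalized log posterior of component k given the source frame only.
        log_post[:, k] = np.log(weights[k]) + gaussian_logpdf(x, mu_x, s_xx)
        # Conditional mean: mu_y + S_yx S_xx^{-1} (x - mu_x)
        cond_means[k] = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
    # Normalize the posteriors in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Posterior-weighted sum of the per-component conditional means.
    return np.einsum("nk,knd->nd", post, cond_means)


if __name__ == "__main__":
    # Toy parameters (hypothetical): one mixture component, 1-D source and target.
    w = np.array([1.0])
    mu = np.array([[0.0, 1.0]])
    sigma = np.array([[[1.0, 0.5], [0.5, 1.0]]])
    print(convert_frames(np.array([[2.0]]), w, mu, sigma, dx=1))
```

With a single component the mapping reduces to a linear regression from the source to the target features; with several components, each frame is converted by a posterior-weighted blend of such regressions.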

Bibliographic reference. Toda, Tomoki / Shikano, Kiyohiro (2005): "NAM-to-speech conversion with Gaussian mixture models", in INTERSPEECH-2005, 1957-1960.