INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Speech Generation from Hand Gestures Based on Space Mapping

Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose

University of Tokyo, Japan

Individuals with speaking disabilities, particularly people suffering from dysarthria, often use a text-to-speech (TTS) synthesizer for speech communication. Because users must type sound symbols that the synthesizer then reads out in a monotonous style, current synthesizers make real-time operation and lively communication difficult. As a result, dysarthric users often fail to control the flow of conversation. In this paper, we propose a novel speech generation framework which takes hand gestures as input. While speech is normally produced through transitions of tongue gestures, we develop a special glove with which speech sounds are generated from transitions of hand gestures. To build the system, GMM-based voice conversion techniques (mapping techniques) are applied to estimate a mapping function between a space of hand gestures and a space of speech sounds. As an initial trial, a mapping between hand gestures and Japanese vowel sounds is estimated so that the topological features of the selected gestures in a feature space match those of the five Japanese vowels in a cepstrum space. Experiments show that the glove can generate good Japanese vowel transitions with voluntary control of duration and articulation.
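The core of such a framework is GMM-based regression between the gesture space and the cepstrum space: a joint GMM is trained on concatenated gesture/cepstrum vectors, and each incoming gesture vector x is mapped to a spectral vector via the conditional expectation E[y | x]. The following is a minimal sketch of that standard mapping, not the authors' actual implementation; the feature dimensions, component count, and function names are hypothetical illustrations.

import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def fit_joint_gmm(X, Y, n_components=8, seed=0):
    """Fit a GMM on joint [x; y] vectors (X: gesture features, Y: cepstra)."""
    Z = np.hstack([X, Y])  # joint feature vectors, one row per frame
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(Z)
    return gmm

def map_gesture_to_cepstrum(gmm, x, dx):
    """Conditional-expectation mapping E[y | x] under the joint GMM.

    dx is the dimensionality of the gesture part of the joint vector.
    """
    w, mu, S = gmm.weights_, gmm.means_, gmm.covariances_
    # Responsibilities P(m | x) from the marginal GMM over x.
    px = np.array([w[m] * multivariate_normal.pdf(x, mu[m, :dx], S[m, :dx, :dx])
                   for m in range(len(w))])
    gamma = px / px.sum()
    # Responsibility-weighted sum of per-component conditional means:
    # mu_y + S_yx S_xx^{-1} (x - mu_x).
    y_hat = np.zeros(mu.shape[1] - dx)
    for m in range(len(w)):
        mu_x, mu_y = mu[m, :dx], mu[m, dx:]
        S_xx, S_yx = S[m, :dx, :dx], S[m, dx:, :dx]
        y_hat += gamma[m] * (mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    return y_hat

# Example with hypothetical data: 200 frames of 5-D gesture features
# paired with 12-D cepstra.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
Y = rng.standard_normal((200, 12))
gmm = fit_joint_gmm(X, Y)
y = map_gesture_to_cepstrum(gmm, X[0], dx=5)  # one mapped cepstral vector

Feeding a stream of glove readings frame by frame through such a mapping yields a cepstral trajectory, which is what allows duration and articulation to be controlled voluntarily by how the hand moves.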

Bibliographic reference. Kunikoshi, Aki / Qiao, Yu / Minematsu, Nobuaki / Hirose, Keikichi (2009): "Speech generation from hand gestures based on space mapping", In INTERSPEECH-2009, 308-311.