16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Articulatory Controllable Speech Modification Based on Gaussian Mixture Models with Direct Waveform Modification Using Spectrum Differential

Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

NAIST, Japan

In our previous work, we developed a speech modification system that manipulates unobserved articulatory movements by sequentially performing speech-to-articulatory inversion mapping and articulatory-to-speech production mapping, both based on a Gaussian mixture model (GMM)-based statistical feature mapping technique. One of the biggest issues in this system is the quality degradation of the synthetic speech caused by modeling and conversion errors in the vocoder-based waveform generation framework. To address this issue, we propose several implementation methods of direct waveform modification. The proposed methods directly filter an input speech waveform with a time sequence of spectral differential parameters, calculated between the unmodified and modified spectral envelope parameters, and thereby avoid vocoder-based excitation signal generation. The experimental results show that the proposed direct waveform modification methods yield significant quality improvements in the synthetic speech while retaining the capability of intuitively modifying phoneme sounds by manipulating the unobserved articulatory movements.
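The idea of filtering the input waveform with a spectral differential can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the filter structure or the spectral parameterization, so this example assumes per-frame log-magnitude spectral envelopes and swaps in a plain STFT overlap-add filter. The function names `spectral_differential` and `filter_with_differential`, and the frame and hop sizes, are hypothetical choices made for illustration only.

```python
import numpy as np


def spectral_differential(log_env_src, log_env_mod):
    """Per-frame differential between the modified and unmodified
    log spectral envelopes (assumed shape: n_frames x n_bins)."""
    return log_env_mod - log_env_src


def filter_with_differential(x, log_diff, frame_len=512, hop=128):
    """Filter waveform x frame by frame with the gain exp(log_diff[t]).

    Assumption: an STFT overlap-add filter stands in for whatever
    filter the paper actually uses. log_diff must have shape
    (n_frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = log_diff.shape[0]
    needed = (n_frames - 1) * hop + frame_len

    # Zero-pad (or truncate) the input so every frame is complete.
    x_pad = np.zeros(needed)
    n_copy = min(len(x), needed)
    x_pad[:n_copy] = x[:n_copy]

    y = np.zeros(needed)
    norm = np.zeros(needed)
    for t in range(n_frames):
        start = t * hop
        frame = x_pad[start:start + frame_len] * window
        # Apply the differential as a real, positive gain per frequency bin.
        # The excitation (phase and fine structure) of the input is kept.
        spec = np.fft.rfft(frame) * np.exp(log_diff[t])
        y[start:start + frame_len] += np.fft.irfft(spec, n=frame_len) * window
        norm[start:start + frame_len] += window ** 2

    # Normalize by the summed squared window wherever it is non-negligible.
    valid = norm > 1e-8
    y[valid] /= norm[valid]
    return y[:len(x)]
```

Because only the differential is applied, a zero differential leaves the interior of the waveform unchanged, which is the property that lets this kind of approach bypass vocoder-based excitation generation. The few samples at the signal edges, where the window weight is negligible, are not reconstructed in this sketch.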


Bibliographic reference. Tobing, Patrick Lumban / Kobayashi, Kazuhiro / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi (2015): "Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential", In INTERSPEECH-2015, 3350-3354.