15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Articulatory Controllable Speech Modification Based on Statistical Feature Mapping with Gaussian Mixture Models

Patrick Lumban Tobing (1), Tomoki Toda (1), Graham Neubig (1), Sakriani Sakti (1), Satoshi Nakamura (1), Ayu Purwarianti (2)

(1) NAIST, Japan
(2) Institut Teknologi Bandung, Indonesia

This paper presents a novel speech modification method that controls unobservable articulatory parameters through statistical feature mapping with Gaussian mixture models (GMMs). In previous work [1], GMM-based statistical feature mapping was successfully applied to acoustic-to-articulatory inversion mapping and to articulatory-to-acoustic production mapping separately. In this paper, these two mapping frameworks are integrated into a unified framework to develop a novel speech modification system. The proposed system performs the inversion mapping and then the production mapping in sequence, making it possible to modify the phonemic sounds of an input speech signal by intuitively manipulating articulatory parameters estimated from that signal. We also propose a manipulation method that automatically compensates for unmodified articulatory movements by considering the inter-dimensional correlation of the articulatory parameters. The proposed system is implemented for a single English speaker, and its effectiveness is evaluated experimentally. The experimental results demonstrate that the proposed system can modify phonemic sounds by manipulating the estimated articulatory movements, and that considering the inter-dimensional correlation in the manipulation yields higher speech quality.
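The inversion, manipulation, and production steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. It assumes a single joint GMM over stacked acoustic and articulatory features used in both directions, a frame-wise conditional-mean (MMSE) mapping, and a single Gaussian covariance for the compensation step; the abstract specifies none of these details. The names `gmm_map`, `compensate`, and `modify_frame` are hypothetical.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at one vector x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def gmm_map(src, weights, means, covs, src_idx, tgt_idx):
    """Conditional mean E[target | source] under a joint GMM (assumed MMSE mapping)."""
    log_post = np.empty(len(weights))
    cond_means = []
    for k, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mu_s, mu_t = mu[src_idx], mu[tgt_idx]
        S_ss = S[np.ix_(src_idx, src_idx)]
        S_ts = S[np.ix_(tgt_idx, src_idx)]
        # Unnormalised log posterior of component k given the source vector.
        log_post[k] = np.log(w) + log_gauss(src, mu_s, S_ss)
        # Component-wise conditional mean of the target given the source.
        cond_means.append(mu_t + S_ts @ np.linalg.solve(S_ss, src - mu_s))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post @ np.array(cond_means)

def compensate(art, new_values, moved_idx, cov):
    """Set the manipulated dimensions, then shift the remaining ones by the
    offset that the covariance predicts (assumed single-Gaussian model)."""
    art = np.asarray(art, dtype=float)
    free_idx = [i for i in range(len(art)) if i not in moved_idx]
    delta_m = np.asarray(new_values, dtype=float) - art[moved_idx]
    S_mm = cov[np.ix_(moved_idx, moved_idx)]
    S_um = cov[np.ix_(free_idx, moved_idx)]
    out = art.copy()
    out[moved_idx] = new_values
    out[free_idx] += S_um @ np.linalg.solve(S_mm, delta_m)
    return out

def modify_frame(acoustic, moved_idx, new_values, gmm, ac_idx, art_idx, art_cov):
    """Inversion mapping, articulatory manipulation, then production mapping."""
    weights, means, covs = gmm
    art = gmm_map(acoustic, weights, means, covs, ac_idx, art_idx)
    art = compensate(art, new_values, moved_idx, art_cov)
    return gmm_map(art, weights, means, covs, art_idx, ac_idx)

# Toy one-component model: 1 acoustic dimension followed by 2 articulatory ones.
toy_cov = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
toy_gmm = ([1.0], [np.zeros(3)], [toy_cov])
modified = modify_frame(np.array([2.0]), [1], [1.0], toy_gmm,
                        ac_idx=[0], art_idx=[1, 2], art_cov=toy_cov[1:, 1:])
```

In this sketch, `moved_idx` indexes into the articulatory vector, and the dimensions left untouched by the user are shifted in proportion to their covariance with the manipulated ones, which is one simple way to respect inter-dimensional correlation. A trained system would use many mixture components and parameters estimated from parallel acoustic and articulatory data.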


  1. Toda, T., Black, A. W., and Tokuda, K., “Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model,” Speech Communication, Vol. 50, No. 3, pp. 215–227, Mar. 2008.


Bibliographic reference.  Tobing, Patrick Lumban / Toda, Tomoki / Neubig, Graham / Sakti, Sakriani / Nakamura, Satoshi / Purwarianti, Ayu (2014): "Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models", In INTERSPEECH-2014, 2298-2302.