12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Estimation of Window Coefficients for Dynamic Feature Extraction for HMM-Based Speech Synthesis

Ling-Hui Chen (1), Yoshihiko Nankaku (2), Heiga Zen (2), Keiichi Tokuda (2), Zhen-Hua Ling (1), Li-Rong Dai (1)

(1) USTC, China
(2) Nagoya Institute of Technology, Japan

In standard approaches to hidden Markov model (HMM)-based speech synthesis, the window coefficients used to calculate dynamic features are pre-determined and fixed. Fixed coefficients may not be optimal for capturing the various context-dependent dynamic characteristics of speech signals. This paper proposes a data-driven technique for estimating the window coefficients, which are optimized to maximize the likelihood of trajectory HMMs given the data. Experimental results show that the proposed technique achieves naturalness of synthesized speech comparable to that of mean- and variance-updated trajectory HMMs, at a significantly lower computational cost.
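
For readers unfamiliar with the fixed windows the abstract refers to, the following is a minimal sketch of how dynamic features are usually appended to static features. The coefficient values (-0.5, 0, 0.5) and (1, -2, 1) are widely used defaults and are an assumption here, since the abstract does not list them. The function names and the edge-padding choice are hypothetical and are not taken from the paper. The paper's contribution would correspond to replacing the two constant arrays with values estimated from data, which this sketch does not attempt.

```python
import numpy as np

# Commonly used fixed windows (assumed; not stated in the abstract).
DELTA_WINDOW = np.array([-0.5, 0.0, 0.5])
DELTA_DELTA_WINDOW = np.array([1.0, -2.0, 1.0])


def apply_window(static, window):
    """Apply a window centred on each frame of a (T, D) static sequence."""
    half = len(window) // 2
    # Repeat the edge frames so that every frame has a full context
    # (one possible boundary convention, chosen here for simplicity).
    padded = np.pad(static, ((half, half), (0, 0)), mode="edge")
    num_frames = static.shape[0]
    out = np.zeros_like(static, dtype=float)
    for k, w in enumerate(window):
        out += w * padded[k:k + num_frames]
    return out


def append_dynamic_features(static):
    """Return [static, delta, delta-delta] stacked along the feature axis."""
    static = np.asarray(static, dtype=float)
    delta = apply_window(static, DELTA_WINDOW)
    delta2 = apply_window(static, DELTA_DELTA_WINDOW)
    return np.hstack([static, delta, delta2])


# Four frames of a one-dimensional static feature.
static = np.array([[0.0], [1.0], [4.0], [9.0]])
obs = append_dynamic_features(static)
print(obs.shape)  # (4, 3)
```

Each row of `obs` holds the static value followed by its delta and delta-delta. Because the windows are linear, the whole operation can equivalently be written as a single matrix applied to the static sequence, which is the form in which window coefficients enter the trajectory HMM likelihood.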

Bibliographic reference.  Chen, Ling-Hui / Nankaku, Yoshihiko / Zen, Heiga / Tokuda, Keiichi / Ling, Zhen-Hua / Dai, Li-Rong (2011): "Estimation of window coefficients for dynamic feature extraction for HMM-based speech synthesis", In INTERSPEECH-2011, 1801-1804.