In standard approaches to hidden Markov model (HMM)-based speech synthesis, the window coefficients used to calculate dynamic features are pre-determined and fixed. Fixed windows may not be optimal for capturing the various context-dependent dynamic characteristics of speech signals. This paper proposes a data-driven technique for estimating the window coefficients, which are optimized to maximize the likelihood of trajectory HMMs given the data. Experimental results show that the proposed technique achieves naturalness of synthesized speech comparable to that of mean- and variance-updated trajectory HMMs, while offering significantly lower computational cost.
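To make the role of the window coefficients concrete, the following minimal sketch computes dynamic features from a one-dimensional static trajectory using fixed windows. The window values are the commonly used first- and second-order defaults and are an assumption here, as are the function names and the edge-frame padding; none of this is taken from the paper, and the proposed technique would estimate such coefficients from data rather than fix them.

```python
from typing import List, Sequence

# Fixed windows often used in HMM-based synthesis (assumed for illustration,
# not taken from the paper). Each window is centered on the current frame.
STATIC_WIN = [0.0, 1.0, 0.0]
DELTA_WIN = [-0.5, 0.0, 0.5]
DELTA2_WIN = [1.0, -2.0, 1.0]


def apply_window(static: Sequence[float], window: Sequence[float]) -> List[float]:
    """Apply a centered, odd-length window to a static feature trajectory.

    Frames outside the sequence are replaced by the nearest edge frame
    (a hypothetical boundary choice made for this sketch).
    """
    half = len(window) // 2
    last = len(static) - 1
    out = []
    for t in range(len(static)):
        acc = 0.0
        for k, w in enumerate(window):
            idx = min(max(t + k - half, 0), last)  # clamp to valid frames
            acc += w * static[idx]
        out.append(acc)
    return out


def dynamic_features(
    static: Sequence[float],
    windows: Sequence[Sequence[float]] = (STATIC_WIN, DELTA_WIN, DELTA2_WIN),
) -> List[List[float]]:
    """Return one trajectory per window: static, delta, delta-delta."""
    return [apply_window(static, w) for w in windows]


if __name__ == "__main__":
    for name, traj in zip(("static", "delta", "delta2"),
                          dynamic_features([0.0, 1.0, 2.0, 3.0])):
        print(name, traj)
```

Because the windows are passed in as plain coefficient lists, swapping the fixed defaults for estimated, possibly context-dependent coefficients would only change the `windows` argument in this sketch.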
Bibliographic reference. Chen, Ling-Hui / Nankaku, Yoshihiko / Zen, Heiga / Tokuda, Keiichi / Ling, Zhen-Hua / Dai, Li-Rong (2011): "Estimation of window coefficients for dynamic feature extraction for HMM-based speech synthesis", In INTERSPEECH-2011, 1801-1804.