13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Improved Automatic Extraction of Generation Process Model Commands and Its Use for Generating Fundamental Frequency Contours for Training HMM-based Speech Synthesis

Hiroya Hashimoto (1), Keikichi Hirose (1), Nobuaki Minematsu (2)

(1) Department of Information and Communication Engineering; (2) Department of Electrical Engineering and Information Systems;
the University of Tokyo, Tokyo, Japan

The generation process model of fundamental frequency (F0) contours represents the F0 movements of speech well while keeping a clear relation to the underlying linguistic information of utterances. Using the model is therefore expected to improve HMM-based speech synthesis. One of the major problems preventing its use is that the performance of automatic extraction of the model parameters from observed F0 contours is still rather limited. A new method of automatic extraction was developed. Its algorithm is inspired by how humans perform the extraction: it extracts phrase components first, whereas conventional methods extract accent components first. The method also uses linguistic information from the text, the same information that is used in HMM-based speech synthesis. A significant improvement in extraction is achieved. Using the method, the model parameters are extracted for the speech corpus used for HMM training, and F0 contours generated by the model are used for the HMM training instead of the original F0 contours. A listening experiment on the synthetic speech indicates improvements in speech quality.
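
For readers unfamiliar with the generation process model, the following is a minimal sketch of its commonly cited form, in which the logarithm of F0 is a baseline value plus the responses to phrase commands (impulses) and accent commands (steps). The abstract itself gives no equations, so everything here is an assumption for illustration: the function names, the command tuples, and the constants alpha = 3.0/s, beta = 20.0/s and gamma = 0.9, which are typical textbook values and not necessarily those used by the authors. It shows only contour generation from given commands, not the extraction method the paper proposes.

```python
import math

# Assumed typical constants (not taken from the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism [1/s]
BETA = 20.0   # natural angular frequency of the accent control mechanism [1/s]
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism: alpha^2 * t * exp(-alpha * t)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0(t) = ln Fb + sum of phrase components + sum of accent components.

    phrase_cmds: list of (onset_time, magnitude) tuples (hypothetical layout).
    accent_cmds: list of (onset_time, offset_time, amplitude) tuples (hypothetical layout).
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at each time point in `times` (seconds)."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds)) for t in times]


if __name__ == "__main__":
    times = [i * 0.01 for i in range(200)]   # 2 s sampled at 10 ms
    phrases = [(0.0, 0.5)]                   # one phrase command at t = 0 s
    accents = [(0.3, 0.7, 0.4)]              # one accent command from 0.3 s to 0.7 s
    contour = f0_contour(times, 120.0, phrases, accents)
    print(f"F0 at 0.5 s: {contour[50]:.1f} Hz")
```

A contour generated this way is smooth and fully determined by a small set of commands, which is the property the abstract relies on when it replaces the original F0 contours with model-generated ones for HMM training.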

Index Terms: F0 contour, generation process model, speech synthesis, parameter extraction

Bibliographic reference.  Hashimoto, Hiroya / Hirose, Keikichi / Minematsu, Nobuaki (2012): "Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based speech synthesis", In INTERSPEECH-2012, 458-461.