The generation process model of fundamental frequency (F0) contours represents the F0 movements of speech well while keeping a clear relation to the underlying linguistic information of utterances. Using the model is therefore expected to improve HMM-based speech synthesis. One major problem preventing its use is that the performance of automatic extraction of the model parameters from observed F0 contours is still rather limited. A new method of automatic extraction was developed. Its algorithm is inspired by how humans perform the extraction: it extracts phrase components first, whereas conventional methods extract accent components first. The method also uses linguistic information from the text, the same information that is used in HMM-based speech synthesis. A significant improvement in extraction is achieved. Using the method, the model parameters are extracted for the speech corpus used for HMM training, and F0 contours generated by the model are used for the HMM training instead of the original F0 contours. A listening experiment on the synthetic speech indicates improvements in speech quality.
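For readers unfamiliar with the model, the following is a minimal sketch of how an F0 contour can be generated from phrase and accent commands. It assumes the generation process model is the command-response formulation commonly known as the Fujisaki model, in which log F0 is a baseline plus the summed responses to phrase impulses and accent steps. The function names, the example commands, and the constants alpha = 3.0, beta = 20.0 and gamma = 0.9 are illustrative assumptions and are not taken from the paper; the extraction method itself is not shown.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control response: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Log F0 at time t (seconds).

    fb          -- baseline frequency in Hz (assumed value, chosen per speaker)
    phrase_cmds -- list of (onset_time, magnitude) impulses
    accent_cmds -- list of (onset_time, offset_time, amplitude) steps
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1, beta, gamma)
                       - accent_response(t - t2, beta, gamma))
    return value


def f0_contour(times, fb, phrase_cmds, accent_cmds):
    """F0 values in Hz at each time in `times`."""
    return [math.exp(log_f0(t, fb, phrase_cmds, accent_cmds)) for t in times]


if __name__ == "__main__":
    # Hypothetical commands: one phrase impulse and one accent step.
    phrases = [(0.0, 0.5)]
    accents = [(0.3, 0.6, 0.4)]
    times = [i * 0.1 for i in range(11)]
    for t, f in zip(times, f0_contour(times, 100.0, phrases, accents)):
        print(f"{t:.1f} s  {f:.1f} Hz")
```

In this view, parameter extraction is the inverse problem: given an observed contour, find the command timings and magnitudes whose generated contour matches it.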
Index Terms: F0 contour, generation process model, speech synthesis, parameter extraction
Bibliographic reference. Hashimoto, Hiroya / Hirose, Keikichi / Minematsu, Nobuaki (2012): "Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based speech synthesis", In INTERSPEECH-2012, 458-461.