Sixth International Conference on Spoken Language Processing
We propose a data-driven approach to intonation modeling and generation based on discrete hidden Markov models (HMMs), in which state transitions are synchronized with Japanese rhythmic units called morae. Mora-unit F0 contours are encoded as symbols consisting of two codes: the first is an index into a table of stylized mora F0 contours, and the second is an index into a table of quantized differences in the contour's average F0 with respect to the previous mora. Both codebooks contain 32 codes. The HMM is used in generation mode, i.e., it generates a sequence of symbols for an intonational phrase with no input other than the length of the sequence, using a variation of Viterbi search with a modified distance function. In the training phase, the speech database is subdivided into classes according to the attributes of the target accentual phrase, and each class is associated with an HMM. After the output symbol sequence is generated, the F0 contour is constructed from the codebooks and a mora duration pattern. Evaluation experiments show that the HMMs are able to correctly produce F0 contours that reflect their training conditions.
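The two-code symbol and the final reconstruction step can be illustrated with a minimal sketch. It is not the authors' implementation. Only the two codebook sizes (32 each) come from the abstract. The names `pack`, `unpack`, and `reconstruct_f0`, the log-F0 domain, the zero-mean shape representation, the 10 ms frame, the starting mean `start_mean`, and the linear time-stretching of each shape to the mora duration are all assumptions made for illustration.

```python
import numpy as np

N_SHAPE = 32  # stylized mora-contour codebook size (stated in the abstract)
N_DELTA = 32  # quantized average-F0-difference codebook size (stated in the abstract)


def pack(shape_idx, delta_idx):
    """Combine the two codes into one discrete symbol (hypothetical layout)."""
    return shape_idx * N_DELTA + delta_idx


def unpack(symbol):
    """Recover (shape_idx, delta_idx) from a packed symbol."""
    return divmod(symbol, N_DELTA)


def reconstruct_f0(symbols, durations, shape_cb, delta_cb,
                   start_mean, frame_s=0.01):
    """Build a frame-level contour from symbols and mora durations.

    Assumptions (not from the source):
      shape_cb   -- array (N_SHAPE, K): zero-mean stylized shapes in log-F0
      delta_cb   -- array (N_DELTA,): change of mean log-F0 vs. previous mora
      start_mean -- mean log-F0 that the first mora's difference is applied to
      durations  -- mora durations in seconds
    """
    mean = start_mean
    pieces = []
    for symbol, dur in zip(symbols, durations):
        shape_idx, delta_idx = unpack(symbol)
        # Shift the running average by the decoded difference.
        mean = mean + delta_cb[delta_idx]
        # Stretch the stored shape linearly to the mora's duration.
        n_frames = max(1, int(round(dur / frame_s)))
        src = np.linspace(0.0, 1.0, shape_cb.shape[1])
        dst = np.linspace(0.0, 1.0, n_frames)
        pieces.append(mean + np.interp(dst, src, shape_cb[shape_idx]))
    return np.concatenate(pieces)
```

Under these assumptions, a symbol sequence generated by the HMM for a phrase would be passed to `reconstruct_f0` together with one duration per mora; the codebooks themselves would have to be trained on the speech database, which this sketch does not cover.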
Bibliographic reference. Sakurai, Atsuhiro / Iwano, Koji / Hirose, Keikichi (2000): "Modeling and generation of accentual phrase F0 contours based on discrete HMMs synchronized at mora-unit transitions", In ICSLP-2000, vol.3, 259-262.