An HMM-based method for detecting prosodic word boundaries was developed for Japanese continuous speech and was successfully integrated into a mora-based continuous speech recognition system with two stages, the first operating without prosodic information and the second with it. The method models the fundamental frequency (F0) contour of input speech as transitions of mora-unit F0 contours and operates after receiving mora boundary information from the 1st stage of the recognition system. The 1st and 2nd stages use different mora bi-gram models as their language models: one trained without taking prosodic word boundary locations into account and the other trained taking them into account. Because the perplexity of the model decreases from the 1st to the 2nd stage, an improved recognition result can be obtained from the 2nd stage. This paper explains the method together with experimental results. The grammar scale factor for boundary detection and an N-best scheme for speech recognition are also discussed. Improvements in mora recognition rates from the 1st to the 2nd stage were observed in both speaker-closed and speaker-open experiments.
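The role of the two mora bi-gram models can be illustrated with a minimal sketch. It is not the authors' implementation: the toy mora sequences, the `|` boundary symbol, the add-one smoothing, and the function names `train_bigram` and `perplexity` are all assumptions made for illustration. It trains one bi-gram model on mora sequences without boundary markers and one on the same sequences with markers, then computes the perplexity of each on held-out data.

```python
import math
from collections import Counter

def train_bigram(sequences):
    """Count bi-grams and their left contexts over padded sequences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab

def perplexity(model, sequences):
    """Per-token perplexity under add-one smoothing (an assumed choice)."""
    contexts, bigrams, vocab = model
    v = len(vocab)
    log_prob, n = 0.0, 0
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

# Hypothetical toy data: "|" marks a prosodic word boundary.
train_with = [
    ["a", "me", "ga", "|", "fu", "ru"],
    ["a", "me", "ga", "|", "ya", "mu"],
    ["ka", "ze", "ga", "|", "fu", "ku"],
]
test_with = [["ka", "ze", "ga", "|", "ya", "mu"]]

def strip_boundaries(seqs):
    """Remove boundary markers, as in the model that ignores prosody."""
    return [[m for m in seq if m != "|"] for seq in seqs]

# Model that ignores boundaries vs. model that sees them.
ppl_plain = perplexity(train_bigram(strip_boundaries(train_with)),
                       strip_boundaries(test_with))
ppl_boundary = perplexity(train_bigram(train_with), test_with)
print(ppl_plain, ppl_boundary)
```

The two values are computed over different token streams (one includes boundary symbols), so this toy comparison only shows how the two training conditions differ; it does not reproduce the perplexity reduction reported for the actual system.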
Cite as: Hirose, K., Minematsu, N., Hashimoto, Y., Iwano, K. (2001) Continuous speech recognition of Japanese using prosodic word boundaries detected by mora transition modeling of fundamental frequency contours. Proc. ITRW on Prosody in Speech Recognition and Understanding, paper 11
@inproceedings{hirose01_prosody,
  author={Keikichi Hirose and Nobuaki Minematsu and Yohei Hashimoto and Koji Iwano},
  title={{Continuous speech recognition of Japanese using prosodic word boundaries detected by mora transition modeling of fundamental frequency contours}},
  year=2001,
  booktitle={Proc. ITRW on Prosody in Speech Recognition and Understanding},
  pages={paper 11}
}