Third International Conference on Spoken Language Processing (ICSLP 94)
This paper describes a prosodic method for segmenting continuous speech into accent phrases. Optimum sequences are obtained under a least-squared-error criterion by dynamic time warping between the F0 contours of the input speech and reference accent patterns called 'pitch pattern templates'. However, the optimum sequence does not always agree well with hand-labeled phrase boundaries, whereas the second or third best candidate sequence does. We therefore extend our system to find multiple candidates using an N-best algorithm. Evaluation tests were carried out using the ATR continuous speech database of 10 speakers. The results showed that about 97% of phrase boundaries were correctly detected when the 30 best candidates were taken, an accuracy 7.5% higher than that of the conventional method without the N-best search algorithm.
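The idea of keeping several candidate template sequences instead of only the best one can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy templates, and the `min_len` parameter are hypothetical, and linear time normalization of each template stands in for the dynamic time warping used in the paper. Each hypothesis is a sequence of templates laid end to end over the F0 contour, scored by summed squared error, and the `n_best` lowest-cost hypotheses are retained at every frame position.

```python
import heapq
import numpy as np

def segment_cost(segment, template):
    """Squared error between a segment and a template stretched to its length.

    Assumption: linear stretching replaces DTW to keep the sketch short.
    """
    x_new = np.linspace(0.0, 1.0, len(segment))
    x_old = np.linspace(0.0, 1.0, len(template))
    stretched = np.interp(x_new, x_old, template)
    return float(np.sum((segment - stretched) ** 2))

def n_best_segmentations(f0, templates, n_best=3, min_len=2):
    """Return up to n_best (cost, boundaries, template_ids) tuples, best first."""
    f0 = np.asarray(f0, dtype=float)
    templates = [np.asarray(t, dtype=float) for t in templates]
    num_frames = len(f0)
    # hyps[t] holds the n_best lowest-cost hypotheses covering frames [0, t).
    hyps = [[] for _ in range(num_frames + 1)]
    hyps[0] = [(0.0, (), ())]
    for end in range(min_len, num_frames + 1):
        candidates = []
        for start in range(0, end - min_len + 1):
            if not hyps[start]:
                continue  # no hypothesis ends exactly at this frame
            for k, tpl in enumerate(templates):
                c = segment_cost(f0[start:end], tpl)
                # Extend every retained hypothesis with this segment.
                for cost, bounds, ids in hyps[start]:
                    candidates.append((cost + c, bounds + (end,), ids + (k,)))
        hyps[end] = heapq.nsmallest(n_best, candidates)
    return hyps[num_frames]
```

Calling `n_best_segmentations(f0, templates, n_best=30)` on a contour would return up to 30 ranked boundary hypotheses, so a later stage can check whether a hand-labeled boundary appears in any of them rather than only in the top one.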
Bibliographic reference. Nakai, Mitsuru / Shimodaira, Hiroshi (1994): "Accent phrase segmentation by finding n-best sequences of pitch pattern templates", In ICSLP-1994, 347-350.