Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Performance Comparison among HMM, DTW, and Human Abilities in Terms of Identifying Stress Patterns of Word Utterances

Nobuaki Minematsu (1), Yukiko Fujisawa (2), Seiichi Nakagawa (2)

(1) Graduate School of Engineering, University of Tokyo, Japan
(2) Department of Information and Computer Sciences, Toyohashi University of Technology, Japan

We have been focusing on applying speech technologies to pronunciation learning. In our previous study [1], a stressed syllable detector was implemented using separate HMMs for stressed and unstressed syllables, and several systems were then built with this detector as an internal component [2]. However, developing such systems does not necessarily require HMMs as the acoustic modeling method. In this paper, an HMM-based method, a DTW-based method, and a human strategy relying only on visual inspection were compared in terms of their performance in judging whether two utterances of a word have the same stress pattern, e.g. récord and recórd. Here, one utterance was produced by a Japanese learner and the other by a native speaker. Experiments showed that HMMs gave higher performance than both DTW and the human strategy. This result strongly supports the use of HMMs as the acoustic modeling method in developing the stressed syllable detector.
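The abstract does not specify the DTW formulation or the acoustic features used. As a rough illustration of the general idea only, the following is a minimal sketch of textbook DTW applied to two one-dimensional contours. The function name `dtw_distance`, the absolute-difference local cost, and the toy contours are all assumptions made for illustration, not the authors' method.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two 1-D sequences (illustrative sketch only).

    Assumptions, not taken from the paper: absolute difference as the
    local cost, and the standard step pattern (insertion, deletion, match).
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # acc[i][j] = minimum accumulated cost of aligning a[:i] with b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance in a only
                acc[i][j - 1],      # advance in b only
                acc[i - 1][j - 1],  # advance in both
            )
    return acc[n][m]


# Hypothetical toy contours (e.g. a per-frame prominence value).
native = [5, 5, 1, 1]                # prominence on the first syllable
learner_same = [5, 5, 5, 1, 1, 1]    # same pattern, spoken more slowly
learner_diff = [1, 1, 1, 5, 5, 5]    # prominence on the second syllable

d_same = dtw_distance(native, learner_same)
d_diff = dtw_distance(native, learner_diff)
```

In this toy case the time-warped alignment absorbs the difference in speaking rate, so the same-pattern pair yields a smaller distance than the different-pattern pair. A decision rule on top of this, such as a distance threshold, would be a further assumption.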


  1. N. Minematsu et al., "Automatic detection of accent in English words spoken by Japanese students," Proc. EUROSPEECH’97, pp.701-704 (1997).
  2. N. Minematsu et al., "Prosodic evaluation of English words spoken by Japanese based upon estimating their pronunciation habits," Proc. ICSP, pp.439-444 (1999).

