EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Syllable-Based Acoustic Modeling for Japanese Spontaneous Speech Recognition

Jun Ogata (1), Yasuo Ariki (2)

(1) AIST, Japan
(2) Ryukoku University, Japan

We study a syllable-based acoustic modeling method for Japanese spontaneous speech recognition. Traditionally, mora-based acoustic models have been adopted in Japanese read speech recognition systems. In this paper, syllable-based and mora-based units are clearly distinguished by definition, and syllables are shown to be the more suitable acoustic modeling unit for Japanese spontaneous speech recognition. In spontaneous speech, vowel lengthening occurs frequently, and recognition accuracy is greatly affected by this phenomenon. From this viewpoint, we propose an acoustic modeling technique that explicitly incorporates vowel lengthening into syllable-based HMMs. Experimental results showed that the proposed model outperformed the conventionally used cross-word triphone model and the mora-based model on a Japanese spontaneous speech recognition task.

Bibliographic reference. Ogata, Jun / Ariki, Yasuo (2003): "Syllable-based acoustic modeling for Japanese spontaneous speech recognition", in EUROSPEECH-2003, 2513-2516.