7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Evaluation of a Speech Recognition / Generation Method Based on HMM and STRAIGHT

Toshio Irino (1), Yasuhiro Minami (1), Tomohiro Nakatani (1), Minoru Tsuzaki (2), H. Tagawa (3)

(1) NTT Corporation, Japan; (2) ATR Spoken Language Translation Research Laboratories, Japan; (3) ATR Technology Liaison and Incubation Center, Japan

We propose a method for integrating speech recognition and generation within a unified framework. The method consists of STRAIGHT, a warped-frequency DCT, and an HMM engine. The warped-frequency DCT is used to derive a kind of mel-cepstral coefficient from the smoothed spectrum produced by STRAIGHT, which is known as a high-quality vocoder. This analysis/synthesis method has the potential to outperform the conventional method based on MFCCs derived from the STFT. We evaluated the method through speaker-dependent speech recognition and through perceptual evaluation of sounds generated by HMM text-to-speech. The recognition rate obtained with coefficients from the warped DCT of the STRAIGHT spectrum was almost the same as that obtained with conventional MFCCs. The sound quality was sufficiently good for a fundamental system.
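The feature-extraction step described above (smoothed spectrum, then frequency warping, then DCT) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a smoothed power-spectrum envelope is already available as a list sampled uniformly from 0 Hz to the Nyquist frequency (STRAIGHT itself is not implemented here), and it assumes a common mel formula for the warping, linear interpolation for resampling, and a plain DCT-II. The function names and the parameters `n_coeffs` and `n_warped` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # A common mel formula; an assumption, the paper's warping may differ.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_dct_cepstrum(envelope, sample_rate, n_coeffs=13, n_warped=64):
    """Cepstrum-like coefficients from a smoothed power-spectrum envelope.

    envelope: power values sampled uniformly from 0 Hz to Nyquist (>= 2 bins).
    """
    n_bins = len(envelope)
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)

    # 1) Resample the envelope at points spaced uniformly on the mel axis.
    log_warped = []
    for k in range(n_warped):
        f_hz = mel_to_hz((k + 0.5) / n_warped * mel_max)
        pos = f_hz / nyquist * (n_bins - 1)
        lo = min(int(pos), n_bins - 2)
        frac = pos - lo
        value = (1.0 - frac) * envelope[lo] + frac * envelope[lo + 1]
        # 2) Take the log, with a small floor to avoid log(0).
        log_warped.append(math.log(max(value, 1e-12)))

    # 3) DCT-II of the warped log spectrum, scaled by 1 / n_warped.
    coeffs = []
    for q in range(n_coeffs):
        total = 0.0
        for k, x in enumerate(log_warped):
            total += x * math.cos(math.pi * q * (k + 0.5) / n_warped)
        coeffs.append(total / n_warped)
    return coeffs


# Usage: a flat envelope carries no spectral shape, so only the
# zeroth coefficient (the overall log level) is non-zero.
flat = [math.e] * 257
cepstrum = warped_dct_cepstrum(flat, sample_rate=16000)
```

Because the DCT is applied to a spectrum that is already smooth, no mel filterbank is used in this sketch. That is one plausible reading of how such coefficients differ from STFT-based MFCCs, not a detail confirmed by the abstract.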


Bibliographic reference. Irino, Toshio / Minami, Yasuhiro / Nakatani, Tomohiro / Tsuzaki, Minoru / Tagawa, H. (2002): "Evaluation of a speech recognition / generation method based on HMM and STRAIGHT", In ICSLP-2002, 2545-2548.