7th International Conference on Spoken Language Processing
September 16-20, 2002
We propose a method for integrating speech recognition and generation within a unified framework. The method combines STRAIGHT, a warped-frequency DCT, and an HMM engine. STRAIGHT is known as a high-quality vocoder, and the warped-frequency DCT is used to derive a kind of mel-cepstral coefficient from its smoothed spectrum. This analysis/synthesis method has the potential to outperform the conventional method, which uses MFCCs derived from the STFT. We evaluated the method in two ways: by speaker-dependent speech recognition, and by perceptual evaluation of sounds generated by HMM text-to-speech. The recognition rate obtained with the coefficients from the warped DCT of the STRAIGHT spectrum was almost the same as that obtained with conventional MFCCs. The sound quality was sufficient for a basic system.
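The feature-extraction step described above (a DCT taken along a warped frequency axis of a smoothed spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `warped_dct_cepstrum`, the common 2595/700 mel formula, the linear interpolation, the plain DCT-II, and the default sizes are all assumptions chosen for illustration, and the input is any smoothed power-spectral envelope rather than actual STRAIGHT output.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale approximation (an assumption; the paper's warping may differ).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def warped_dct_cepstrum(envelope, sample_rate, n_coeffs=13, n_warped=64):
    """Cepstrum-like coefficients from a smoothed power envelope.

    envelope: positive power values evenly spaced from 0 Hz to Nyquist
              (inclusive); must contain at least two points.
    Hypothetical helper: resamples the log envelope at points evenly
    spaced on the mel axis, then applies a DCT-II scaled by 1/n_warped.
    """
    n_bins = len(envelope)
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)

    # Sample the envelope at mel-spaced midpoints using linear interpolation.
    log_warped = []
    for k in range(n_warped):
        mel = (k + 0.5) * mel_max / n_warped
        pos = mel_to_hz(mel) / nyquist * (n_bins - 1)
        lo = min(int(pos), n_bins - 2)
        frac = pos - lo
        value = (1.0 - frac) * envelope[lo] + frac * envelope[lo + 1]
        log_warped.append(math.log(max(value, 1e-12)))  # floor avoids log(0)

    # DCT-II over the warped log spectrum; keep the first n_coeffs terms.
    coeffs = []
    for n in range(n_coeffs):
        acc = 0.0
        for k in range(n_warped):
            acc += log_warped[k] * math.cos(math.pi * n * (k + 0.5) / n_warped)
        coeffs.append(acc / n_warped)
    return coeffs


if __name__ == "__main__":
    # Toy envelope sloping downward with frequency (illustrative data only).
    toy_envelope = [1.0 / (1.0 + 0.05 * i) for i in range(257)]
    print(warped_dct_cepstrum(toy_envelope, sample_rate=16000)[:4])
```

In this sketch the zeroth coefficient is the mean log power along the warped axis, and higher coefficients describe progressively finer envelope shape. A conventional MFCC front end would instead apply a mel filterbank to an STFT magnitude spectrum before the DCT.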
Bibliographic reference. Irino, Toshio / Minami, Yasuhiro / Nakatani, Tomohiro / Tsuzaki, Minoru / Tagawa, H. (2002): "Evaluation of a speech recognition / generation method based on HMM and STRAIGHT", In ICSLP-2002, 2545-2548.