5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

A Very Low Bit Rate Speech Coder Using HMM With Speaker Adaptation

Takashi Masuko (1), Keiichi Tokuda (2), Takao Kobayashi (3)

(1) Precision and Intelligence Laboratory, Tokyo Inst. of Tech., Japan
(2) Department of Computer Science, Nagoya Inst. of Tech., Japan
(3) Interdisciplinary Graduate School of Science and Engineering, Tokyo Inst. of Tech., Japan

This paper describes a speaker adaptation technique for an HMM-based phonetic vocoder. In this vocoder, the encoder performs phoneme recognition and transmits phoneme indexes and state durations to the decoder, and the decoder synthesizes speech using an HMM-based speech synthesis technique. One of the main problems of this vocoder is that the voice characteristics of the synthetic speech depend on the HMMs used in the decoder, and are therefore fixed regardless of the input speaker. To overcome this problem, we adapt the HMMs to the input speech by transmitting transfer vectors, which carry information on the mismatch between the input speech and the HMMs. The results of subjective tests show that the performance of the proposed vocoder without quantization of the transfer vectors is comparable to that of a speaker-dependent vocoder.
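
The data flow described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the transfer vector is treated as a single additive bias applied to every state mean, and the decoder simply repeats each adapted mean for the transmitted state duration, which is a deliberate simplification of HMM-based speech synthesis. The names `Segment`, `adapt_means`, and `decode_segment` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """What the encoder transmits for one recognized phoneme."""
    phoneme_index: int          # index into the phoneme inventory shared with the decoder
    state_durations: List[int]  # number of frames spent in each HMM state


def adapt_means(means: List[List[float]], transfer_vector: List[float]) -> List[List[float]]:
    """Shift every state mean by the transfer vector.

    ASSUMPTION: adaptation is a plain additive bias; the abstract only says
    that transfer vectors describe the mismatch between input speech and HMMs.
    """
    return [[m + t for m, t in zip(mean, transfer_vector)] for mean in means]


def decode_segment(
    segment: Segment,
    model_means: Dict[int, List[List[float]]],
    transfer_vector: List[float],
) -> List[List[float]]:
    """Expand adapted state means into a frame-by-frame parameter sequence.

    ASSUMPTION: each state's mean is held constant for its duration; a real
    HMM-based synthesizer would generate a smoother trajectory.
    """
    adapted = adapt_means(model_means[segment.phoneme_index], transfer_vector)
    frames: List[List[float]] = []
    for mean, duration in zip(adapted, segment.state_durations):
        frames.extend(list(mean) for _ in range(duration))
    return frames
```

In this sketch the per-phoneme payload is only an index and a few durations, while the transfer vector is the extra information that moves the decoder's models toward the input speaker.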

Sound Examples
#1 - synthesized from original spectral parameters
#2 - coded speech using speaker dependent models
#3 - coded speech using speaker independent models without adaptation
#4 - coded speech using adapted models without quantization of transfer vectors
#5 - coded speech using adapted models with quantization of transfer vectors

Bibliographic reference.  Masuko, Takashi / Tokuda, Keiichi / Kobayashi, Takao (1998): "A very low bit rate speech coder using HMM with speaker adaptation", In ICSLP-1998, paper 0777.