15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

A Mel-Cepstral Analysis Technique Restoring High Frequency Components from Low-Sampling-Rate Speech

Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

In statistical speech synthesis, the quality of the synthesized speech depends on the quality of the training data. Because the sampling rate is one of the factors that affect this quality, speech data have recently been recorded at high sampling rates. However, speech data recorded in the past or collected from the Internet often have low sampling rates. To use such data effectively for model training, we propose a mel-cepstral analysis technique that restores the missing high-frequency components of low-sampling-rate speech with a statistical approach. In this technique, high-sampling-rate speech waveforms are modeled directly by integrating the feature extraction and modeling processes. This framework makes it possible to optimize the whole process on the basis of an integrated objective function. Mel-cepstral coefficients are then estimated from the low-sampling-rate speech by using the model as a prior distribution. Experimental results show that the proposed method improved the quality of the synthesized speech.
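The final step of the abstract, estimating coefficients from low-sampling-rate speech with a trained model acting as a prior, can be loosely illustrated with a much simpler stand-in. The sketch below assumes a feature vector that is jointly Gaussian under the prior, observes only its "low-band" components, and fills in the missing "high-band" components with the posterior (conditional) mean. This is not the authors' method, which models high-sampling-rate waveforms directly and estimates mel-cepstral coefficients; the Gaussian prior, the function names `solve` and `restore_missing`, and the numbers are illustrative assumptions only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def restore_missing(mean: Vector, cov: Matrix,
                    observed: Vector, obs_idx: List[int]) -> Vector:
    """Hypothetical helper: posterior mean of a Gaussian-prior vector when
    only the components listed in obs_idx are observed (noise-free)."""
    miss_idx = [i for i in range(len(mean)) if i not in obs_idx]
    # Covariance block of the observed components, S_oo.
    s_oo = [[cov[i][j] for j in obs_idx] for i in obs_idx]
    diff = [observed[k] - mean[i] for k, i in enumerate(obs_idx)]
    w = solve(s_oo, diff)  # w = S_oo^{-1} (x_o - mu_o)
    full = list(mean)
    for k, i in enumerate(obs_idx):
        full[i] = observed[k]  # keep the observed (low-band) values
    for h in miss_idx:
        # mu_h + S_ho S_oo^{-1} (x_o - mu_o)
        full[h] = mean[h] + sum(cov[h][o] * w[k] for k, o in enumerate(obs_idx))
    return full


if __name__ == "__main__":
    prior_mean = [1.0, 2.0, 3.0]
    prior_cov = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.5], [1.0, 0.5, 2.0]]
    # Observe components 0 and 1; restore component 2 from the prior.
    print(restore_missing(prior_mean, prior_cov, [3.0, 4.0], [0, 1]))
    # -> [3.0, 4.0, 5.0]
```

The missing component moves away from its prior mean only through its covariance with the observed components; with zero cross-covariance it would stay at the prior mean, which is why a well-trained prior matters for this kind of restoration.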


Bibliographic reference.  Nakamura, Kazuhiro / Hashimoto, Kei / Oura, Keiichiro / Nankaku, Yoshihiko / Tokuda, Keiichi (2014): "A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech", In INTERSPEECH-2014, 2494-2498.