Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System

Masanori Morise, Genta Miyashita, Kenji Ozawa


A speech coding method for a full-band speech analysis/synthesis system is described. In this work, full-band speech is defined as speech with a sampling frequency above 40 kHz, whose Nyquist frequency covers the audible frequency range. Prior work on speech coding has generally focused on narrow-band speech with a sampling frequency below 16 kHz. Statistical parametric speech synthesis, on the other hand, currently uses full-band speech together with a low-dimensional representation of the speech parameters. The purpose of this study is to achieve speech coding without deterioration for full-band speech. We focus on a high-quality speech analysis/synthesis system and on mel-cepstral analysis using frequency warping. For the frequency warping function, we directly use three auditory scales. We carried out a subjective evaluation using the WORLD vocoder and found that the optimum number of dimensions was around 50. The choice of frequency warping did not significantly affect the sound quality at this number of dimensions.
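
The idea of warping a spectral envelope onto an auditory frequency axis and keeping only the low-order cepstral coefficients can be illustrated with a minimal sketch. It is not the authors' implementation or the WORLD vocoder's own coding routine. It assumes the mel scale (with the common 1127 ln(1 + f/700) formula) as one example of an auditory scale, since the abstract does not name the three scales, and it assumes a plain cepstrum of the warped log envelope rather than a full mel-cepstral analysis. The function names encode_envelope and decode_envelope are hypothetical.

```python
import numpy as np


def hz_to_mel(f_hz):
    # Assumed auditory scale: a common mel-scale formula.
    return 1127.0 * np.log1p(np.asarray(f_hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * np.expm1(np.asarray(mel) / 1127.0)


def _axes(fs, n_bins):
    # Linear frequency grid, and the frequencies (in Hz) of a grid
    # that is equally spaced on the mel axis up to the Nyquist frequency.
    nyquist = fs / 2.0
    linear_hz = np.linspace(0.0, nyquist, n_bins)
    warped_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(nyquist), n_bins))
    return linear_hz, warped_hz


def encode_envelope(log_env, fs, n_dims=50):
    """Warp a log spectral envelope and keep n_dims cepstral coefficients."""
    log_env = np.asarray(log_env, dtype=float)
    linear_hz, warped_hz = _axes(fs, len(log_env))
    # Resample the envelope at mel-spaced frequencies (frequency warping).
    warped = np.interp(warped_hz, linear_hz, log_env)
    # Treat the warped envelope as a half spectrum; its inverse real FFT
    # is a real, even cepstrum-like sequence.
    cepstrum = np.fft.irfft(warped)
    return cepstrum[:n_dims]


def decode_envelope(code, fs, n_bins):
    """Rebuild a smoothed log spectral envelope from the truncated code."""
    code = np.asarray(code, dtype=float)
    n_dims = len(code)
    n_full = 2 * (n_bins - 1)
    full = np.zeros(n_full)
    full[:n_dims] = code
    # Restore the even symmetry that the truncated cepstrum must have.
    full[n_full - n_dims + 1:] = code[1:][::-1]
    warped = np.fft.rfft(full).real
    linear_hz, warped_hz = _axes(fs, n_bins)
    # Undo the frequency warping by interpolating back to the linear grid.
    return np.interp(linear_hz, warped_hz, warped)


if __name__ == "__main__":
    fs = 48000  # a full-band sampling frequency (above 40 kHz)
    n_bins = 1025  # e.g. FFT size 2048
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    # Synthetic smooth envelope, for illustration only.
    log_env = -2.0 + np.exp(-freqs / 4000.0)
    code = encode_envelope(log_env, fs, n_dims=50)
    rebuilt = decode_envelope(code, fs, n_bins)
    print(code.shape, rebuilt.shape)
    print("max abs error:", np.max(np.abs(rebuilt - log_env)))
```

In this sketch the number of dimensions is the only coding parameter, which mirrors the quantity varied in the subjective evaluation. Swapping hz_to_mel and mel_to_hz for another auditory scale changes the warping without touching the rest of the code.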


DOI: 10.21437/Interspeech.2017-67

Cite as: Morise, M., Miyashita, G., Ozawa, K. (2017) Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System. Proc. Interspeech 2017, 409-413, DOI: 10.21437/Interspeech.2017-67.


@inproceedings{Morise2017,
  author={Masanori Morise and Genta Miyashita and Kenji Ozawa},
  title={Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={409--413},
  doi={10.21437/Interspeech.2017-67},
  url={http://dx.doi.org/10.21437/Interspeech.2017-67}
}