ISCA Archive Interspeech 2009

Pulse density representation of spectrum for statistical speech processing

Yoshinori Shiga

This study investigates a new spectral representation suited to statistical parametric speech synthesis. Statistical speech processing involves spectral averaging in the training process; however, averaging spectra in the domain of conventional speech parameters over-smooths the resulting means, which degrades the quality of the synthesised speech. In the proposed representation, high-energy parts of the spectrum, such as sections of dominant formants, are represented by a group of high-density pulses in the frequency domain. These pulses' locations (i.e., frequencies) are then parameterised. The representation is theoretically capable of averaging spectra with less over-smoothing. The experimental results provide optimal values for the factors needed to encode and decode the proposed representation, with a view to future applications in speech synthesis.
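The abstract does not give the encoding or decoding procedure, so the following is only a toy sketch of the general idea, not the paper's algorithm. It assumes that pulses are placed by inverse-CDF sampling of the power spectrum (so pulse density follows spectral energy) and that decoding is a simple Gaussian kernel density estimate. The function names `encode_pulses`, `decode_density` and `toy_spectrum`, the pulse count, the kernel bandwidth and the single-formant test spectra are all hypothetical choices made for illustration.

```python
import numpy as np

def encode_pulses(freqs, power, n_pulses):
    """Place n_pulses so that local pulse density follows spectral power.

    Assumption: inverse-CDF placement; the paper may use a different scheme.
    """
    cdf = np.cumsum(power)
    cdf = cdf / cdf[-1]
    # Evenly spaced quantile targets in (0, 1).
    targets = (np.arange(n_pulses) + 0.5) / n_pulses
    # Map each quantile back to a frequency (cdf is increasing since power > 0).
    return np.interp(targets, cdf, freqs)

def decode_density(pulses, freqs, bandwidth):
    """Estimate a smooth density from pulse locations with Gaussian kernels.

    Assumption: kernel density estimation stands in for the real decoder.
    """
    diffs = freqs[:, None] - pulses[None, :]
    kernels = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    return kernels.sum(axis=1) / (len(pulses) * bandwidth * np.sqrt(2.0 * np.pi))

def toy_spectrum(freqs, centre, width=150.0):
    """Hypothetical single-formant power spectrum with a small noise floor."""
    return np.exp(-0.5 * ((freqs - centre) / width) ** 2) + 1e-3

freqs = np.linspace(0.0, 4000.0, 801)   # 5 Hz grid
spec_a = toy_spectrum(freqs, 1000.0)    # formant at 1000 Hz
spec_b = toy_spectrum(freqs, 1400.0)    # formant at 1400 Hz

# Average in the pulse-location domain, then decode.
pulses_a = encode_pulses(freqs, spec_a, 64)
pulses_b = encode_pulses(freqs, spec_b, 64)
mean_pulses = 0.5 * (pulses_a + pulses_b)
decoded = decode_density(mean_pulses, freqs, bandwidth=60.0)
peak_pulse_domain = freqs[np.argmax(decoded)]

# Average directly in the power-spectrum domain for comparison.
mean_spec = 0.5 * (spec_a + spec_b)

print("peak after pulse-domain averaging (Hz):", peak_pulse_domain)
print("direct mean at 1000 Hz vs 1200 Hz:", mean_spec[200], mean_spec[240])
```

In this toy setting, averaging pulse locations keeps a single peak between the two formant frequencies, whereas the direct spectral mean has two lower peaks with a dip between them. Whether this matches the behaviour reported in the paper depends on details the abstract does not state.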

doi: 10.21437/Interspeech.2009-146

Cite as: Shiga, Y. (2009) Pulse density representation of spectrum for statistical speech processing. Proc. Interspeech 2009, 1771-1774, doi: 10.21437/Interspeech.2009-146

  author={Yoshinori Shiga},
  title={{Pulse density representation of spectrum for statistical speech processing}},
  booktitle={Proc. Interspeech 2009},