This study investigates a new spectral representation that is suitable for statistical parametric speech synthesis. Statistical speech processing involves spectral averaging in the training process; however, averaging spectra in the domain of conventional speech parameters over-smooths the resulting means, which degrades the quality of the speech synthesised. In the proposed representation, high-energy parts of the spectrum, such as sections of dominant formants, are represented by a group of high-density pulses in the frequency domain. These pulsesí locations (i.e., frequencies) are then parameterised. The representation is theoretically capable of averaging spectra with less over-smoothing effect. The experimental results provide the optimal values of factors necessary for the encoding and decoding of the proposed representation towards the future applications of speech synthesis.
Bibliographic reference. Shiga, Yoshinori (2009): "Pulse density representation of spectrum for statistical speech processing", In INTERSPEECH-2009, 1771-1774.