In hidden Markov model (HMM)-based speech synthesis, speech is typically parameterized using source-filter decomposition. STRAIGHT, a widely used analysis/synthesis framework, decomposes the speech waveform into a framewise spectral envelope and a mixed-mode excitation signal. Including an aperiodicity measure in the model also enables synthesis of signals that are not purely voiced or unvoiced. The traditional approach, which employs hidden Markov modeling and decision tree-based clustering, does not take the connection between the speech spectrum and aperiodicities into account. In this paper, we take advantage of this dependency and predict voice aperiodicities afterwards from the synthetic spectral representations. Evaluations carried out on English data confirm that the proposed approach provides prediction accuracy comparable to that of the traditional approach.
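As a rough illustration of predicting aperiodicity from spectral features, the sketch below fits a framewise linear mapping from spectral parameters to band aperiodicities. The abstract does not state which predictor the paper uses, so the ridge-regularized linear regression here is only a stand-in. The function names, the feature dimensions, and the randomly generated data are all hypothetical.

```python
import numpy as np

def fit_linear_predictor(spectral, aperiodicity, ridge=1e-6):
    """Fit W so that [spectral, 1] @ W approximates aperiodicity.

    Ridge-regularized least squares; a stand-in predictor, not the
    method from the paper.
    """
    # Append a constant column so the mapping includes a bias term.
    X = np.hstack([spectral, np.ones((spectral.shape[0], 1))])
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ aperiodicity)

def predict_aperiodicity(spectral, W):
    """Predict framewise band aperiodicities from spectral features."""
    X = np.hstack([spectral, np.ones((spectral.shape[0], 1))])
    return X @ W

# Synthetic stand-in data (assumption): 500 frames, 25 spectral
# coefficients per frame, 5 aperiodicity bands per frame.
rng = np.random.default_rng(0)
n_frames, n_spec, n_bands = 500, 25, 5
spectral = rng.normal(size=(n_frames, n_spec))
true_W = rng.normal(size=(n_spec, n_bands))
aperiodicity = (spectral @ true_W - 10.0
                + 0.1 * rng.normal(size=(n_frames, n_bands)))

# Train on the paired features, then predict from the spectral features alone.
W = fit_linear_predictor(spectral, aperiodicity)
predicted = predict_aperiodicity(spectral, W)
rmse = float(np.sqrt(np.mean((predicted - aperiodicity) ** 2)))
print(f"RMSE on the synthetic data: {rmse:.3f}")
```

In a synthesis setting, the mapping would presumably be trained on paired spectral and aperiodicity parameters extracted from natural speech and then applied to the HMM-generated spectral parameters; that workflow is an assumption here as well.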
Bibliographic reference. Silén, Hanna / Helander, Elina / Gabbouj, Moncef (2011): "Prediction of voice aperiodicity based on spectral representations in HMM speech synthesis", In INTERSPEECH-2011, 105-108.