This paper presents an improved training method for the unvoiced filter that forms part of an excitation model, within the framework of parametric speech synthesis based on hidden Markov models (HMMs). The conventional approach calculates the unvoiced filter response from the difference between the residual signal and the voiced excitation estimate. This difference signal, however, includes the estimation error of the voiced excitation. Contaminated by this error, the unvoiced filter tends to be overestimated, which makes the synthetic speech noisy. To give unvoiced filter training targets that are free from this contamination, the improved approach first separates the non-periodic component of the residual signal from the periodic component. The unvoiced filter is then trained on the non-periodic component alone. Experimental results show that unvoiced filter responses trained with the new approach are clearly free of this noise, in contrast to responses trained with the conventional approach.
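The contrast between the two training targets can be illustrated with a toy signal. The sketch below is not the authors' implementation: the abstract does not say how the periodic and non-periodic components are separated, so pitch-synchronous averaging over a constant, known pitch period is used here as a stand-in. The signal shape, the deliberately imperfect voiced estimate (half the true amplitude), and all function names are assumptions made for illustration. Target power serves as a rough proxy for how large an unvoiced filter trained on that target would be.

```python
import numpy as np


def make_residual(n_periods=200, period=80, noise_std=0.05, seed=0):
    """Toy residual: a repeated pulse shape (periodic) plus white noise (non-periodic)."""
    rng = np.random.default_rng(seed)
    pulse = np.zeros(period)
    pulse[0], pulse[1] = 1.0, -0.5
    periodic = np.tile(pulse, n_periods)
    noise = noise_std * rng.standard_normal(periodic.size)
    return periodic + noise, periodic, noise


def conventional_target(residual, voiced_estimate):
    """Conventional target: residual minus the voiced excitation estimate.

    Any error in the voiced estimate leaks into this target.
    """
    return residual - voiced_estimate


def split_periodic(residual, period):
    """Assumed stand-in separation: average all pitch periods to get the
    periodic part, and treat what remains as the non-periodic part."""
    frames = residual.reshape(-1, period)
    template = frames.mean(axis=0)
    periodic = np.tile(template, frames.shape[0])
    return periodic, residual - periodic


def power(x):
    """Mean squared amplitude of a signal."""
    return float(np.mean(x ** 2))


PERIOD = 80
residual, true_periodic, true_noise = make_residual(period=PERIOD)

# Hypothetical imperfect voiced estimate: only half the true amplitude.
voiced_estimate = 0.5 * true_periodic

conventional = conventional_target(residual, voiced_estimate)
_, non_periodic = split_periodic(residual, PERIOD)

print("true noise power:         ", power(true_noise))
print("conventional target power:", power(conventional))
print("non-periodic target power:", power(non_periodic))
```

In this toy setting the conventional target carries the leftover voiced-estimate error on top of the noise, so its power exceeds the true noise power, whereas the separated non-periodic component stays close to it.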
Bibliographic reference. Shiga, Yoshinori / Toda, Tomoki / Sakai, Shinsuke / Kawai, Hisashi (2010): "Improved training of excitation for HMM-based parametric speech synthesis", In INTERSPEECH-2010, 809-812.