Although Hidden Markov Model (HMM) based speech synthesis has been shown to perform well, three factors still degrade the quality of the synthesized speech: the vocoder, model accuracy, and over-smoothing. This paper analyzes these factors separately and proposes modifications to remove each of them. Experimental results show that over-smoothing in the frequency domain is the main cause of quality loss in synthesized speech, whereas over-smoothing in the time domain can nearly be ignored. Time-domain over-smoothing is generally caused by inaccuracy in the model structure, and frequency-domain over-smoothing by inaccuracy in the training algorithm. The currently used model structure is capable of representing speech without quality degradation, while the parameter training algorithm based on maximum-likelihood (ML) estimation introduces perceptual distortion in the synthesized speech. Modifying the parameter training algorithm is therefore more likely to improve synthesis performance.

Index Terms: Hidden Markov Model, speech synthesis
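The link between ML estimation and frequency-domain over-smoothing can be illustrated with a toy sketch. This is not the paper's experiment or method: it only assumes that the ML estimate of a Gaussian state mean is the per-bin average of the training frames assigned to that state, and that those frames contain a spectral peak whose position varies slightly from frame to frame. The functions `envelope`, `ml_mean`, and `half_max_width`, as well as all numeric values, are hypothetical and chosen for illustration.

```python
import math

def envelope(center, n_bins=64, width=2.0):
    """Toy spectral envelope: one Gaussian-shaped peak at bin `center`."""
    return [math.exp(-0.5 * ((k - center) / width) ** 2) for k in range(n_bins)]

def ml_mean(frames):
    """ML estimate of a Gaussian mean vector: the per-bin sample average."""
    n = len(frames)
    return [sum(f[k] for f in frames) / n for k in range(len(frames[0]))]

def half_max_width(env):
    """Number of bins whose value exceeds half of the envelope's own maximum."""
    threshold = 0.5 * max(env)
    return sum(1 for v in env if v > threshold)

# Assumed data: five frames assigned to one state, with a drifting peak position.
centers = [26, 29, 32, 35, 38]
frames = [envelope(c) for c in centers]
state_mean = ml_mean(frames)

frame_peak = max(max(f) for f in frames)   # peak height of the sharpest frame
mean_peak = max(state_mean)                # peak height of the averaged envelope

print("peak height, single frame:", round(frame_peak, 3))
print("peak height, ML mean:     ", round(mean_peak, 3))
print("half-max width, single frame:", half_max_width(frames[0]))
print("half-max width, ML mean:     ", half_max_width(state_mean))
```

In this sketch the averaged envelope has a lower and broader peak than any individual frame, which is one common way to picture why statistically averaged spectra sound muffled. Real systems model richer features and variances, so the sketch shows only the direction of the effect.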
Cite as: Zhang, M., Tao, J.-H., Jia, H.-B., Wang, X. (2008) Improving HMM-based Speech Synthesis by Reducing Over-smoothing Problems. Proc. International Symposium on Chinese Spoken Language Processing, 17-20
@inproceedings{zhang08_iscslp,
  author={Meng Zhang and Jian-Hua Tao and Hui-Bin Jia and Xia Wang},
  title={{Improving HMM-based Speech Synthesis by Reducing Over-smoothing Problems}},
  year=2008,
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},
  pages={17--20}
}