7th International Conference on Spoken Language Processing
September 16-20, 2002
This paper proposes a channel equalization algorithm for large speech databases, with application to concatenative TTS systems. Convolutional channel distortion is equalized by comparing the power spectral densities (PSDs) of utterances from different recording sessions. Autoregressive linear filters are designed at the corpus level and applied offline to the corresponding sentences to compensate for the relative distortions caused by channel effects. Two experiments are carried out to evaluate the benefit of the channel equalization approach. First, the method is used to reduce the distance between the PSDs of two recording sessions, verifying its effectiveness. Second, it is applied in practice to the TTS system: the whole TTS speech database is processed to reduce the PSD variance across all sessions. In addition, a subjective listening test is carried out to obtain human evaluation of the new TTS system. Almost all listeners prefer the synthetic speech generated by the new TTS system. Furthermore, an analysis of variance (ANOVA) on this subjective listening test demonstrates that the channel equalization process has a significant effect on increasing the perceived voice-quality consistency of the TTS system.
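The idea of PSD-based equalization with an autoregressive filter can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not give the filter design details, so everything below is an assumption made for illustration: the function names, the frame length `NFFT`, the filter order, the use of white noise in place of speech, and the simulated channel `y[n] = x[n] + 0.8 x[n-1]`. The sketch estimates the PSD of a reference session and of a distorted session, fits an all-pole filter to their ratio with the Levinson-Durbin recursion, and checks that filtering reduces the log-spectral distance between the two sessions.

```python
import math
import random

NFFT = 64  # frame length; an assumption, kept small so the plain-Python DFT stays fast


def welch_psd(x, nfft=NFFT):
    """Average Hann-windowed periodograms (50% overlap); returns bins 0..nfft/2."""
    half = nfft // 2
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / nfft) for n in range(nfft)]
    cos_t = [[math.cos(2 * math.pi * k * n / nfft) for n in range(nfft)] for k in range(half + 1)]
    sin_t = [[math.sin(2 * math.pi * k * n / nfft) for n in range(nfft)] for k in range(half + 1)]
    psd = [0.0] * (half + 1)
    starts = range(0, len(x) - nfft + 1, half)
    for s in starts:
        frame = [x[s + n] * win[n] for n in range(nfft)]
        for k in range(half + 1):
            re = sum(f * c for f, c in zip(frame, cos_t[k]))
            im = sum(f * c for f, c in zip(frame, sin_t[k]))
            psd[k] += re * re + im * im
    return [p / len(starts) for p in psd]


def levinson(r, order):
    """Levinson-Durbin recursion: returns A(z) coefficients (a[0] = 1) and the error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def design_ar_equalizer(psd_src, psd_ref, order=4):
    """Fit gain / A(z) so that its power response approximates psd_ref / psd_src."""
    target = [pr / ps for pr, ps in zip(psd_ref, psd_src)]
    half = len(target) - 1
    nfft = 2 * half
    r = []
    for m in range(order + 1):
        # Inverse DFT of the symmetric target spectrum gives its autocorrelation.
        acc = target[0] + target[half] * (-1) ** m
        acc += 2 * sum(target[k] * math.cos(2 * math.pi * k * m / nfft) for k in range(1, half))
        r.append(acc / nfft)
    a, err = levinson(r, order)
    return a, math.sqrt(err)


def all_pole_filter(x, a, gain):
    """Apply y[n] = gain * x[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k in range(1, min(n, len(a) - 1) + 1):
            acc -= a[k] * y[n - k]
        y.append(acc)
    return y


def log_spectral_distance(p, q):
    """Mean absolute difference of two PSDs in dB."""
    return sum(abs(10 * math.log10(u / v)) for u, v in zip(p, q)) / len(p)


# Synthetic stand-ins for two recording sessions (white noise, not real speech).
rng = random.Random(0)
session_a = [rng.gauss(0, 1) for _ in range(4000)]  # reference session
raw_b = [rng.gauss(0, 1) for _ in range(4000)]
# Hypothetical channel for session B: y[n] = x[n] + 0.8 * x[n-1]
session_b = [raw_b[n] + 0.8 * raw_b[n - 1] if n else raw_b[0] for n in range(4000)]

psd_a = welch_psd(session_a)
psd_b = welch_psd(session_b)
a, gain = design_ar_equalizer(psd_b, psd_a)
equalized_b = all_pole_filter(session_b, a, gain)

dist_before = log_spectral_distance(psd_b, psd_a)
dist_after = log_spectral_distance(welch_psd(equalized_b), psd_a)
print(f"distance before: {dist_before:.2f} dB, after: {dist_after:.2f} dB")
```

In this toy setting the inverse of the simulated channel is itself an all-pole filter, so a low-order autoregressive fit suits it well. Real session-to-session differences would likely need a higher order and PSDs averaged over many utterances per session.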
Bibliographic reference. Shi, Yu / Chang, Eric / Peng, Hu / Chu, Min (2002): "Power spectral density based channel equalization of large speech database for concatenative TTS system", In ICSLP-2002, 2369-2372.