11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Applying Scalable Phonetic Context Similarity in Unit Selection of Concatenative Text-to-Speech

Wei Zhang, Xiaodong Cui

IBM T.J. Watson Research Center, USA

This paper presents an approach that uses phonetic context similarity as a cost function in unit selection for concatenative Text-to-Speech. The approach measures the degree of similarity between the desired context and the context of each candidate segment, even when the two contexts differ phonetically. When plenty of candidates are available, it takes the influence of relatively distant contexts into account; when candidates are sparse, it can draw on data from contexts that are symbolically different from the desired one. The cost function also provides an efficient way to prune the search space. Different parameters for modeling, normalization, and integerization are discussed. MOS evaluation shows that the approach can significantly improve synthesis quality.
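
The abstract does not give the form of the cost function, so the following Python sketch is only a toy illustration of the general idea, not the authors' method. Everything in it is an assumption made for the example: the broad-class table, the three-level phone similarity (1.0, 0.5, 0.0), the geometric distance weighting `decay`, the normalization by total weight, the integer range set by `scale`, and the function names themselves.

```python
# Toy illustration only. The phone classes, similarity values, weights,
# and integer scaling are assumptions, not taken from the paper.
BROAD_CLASS = {
    "p": "stop", "b": "stop", "t": "stop", "d": "stop",
    "s": "fric", "z": "fric", "f": "fric",
    "a": "vowel", "i": "vowel", "u": "vowel",
}


def phone_similarity(a, b):
    """Hypothetical similarity in [0, 1]: identical > same broad class > unrelated."""
    if a == b:
        return 1.0
    if a in BROAD_CLASS and BROAD_CLASS.get(a) == BROAD_CLASS.get(b):
        return 0.5
    return 0.0


def context_cost(target, candidate, decay=0.5, scale=255):
    """Return an integer cost in [0, scale]; 0 means every position matches.

    `target` and `candidate` are (left, right) pairs of phone tuples, each
    ordered nearest neighbor first. The neighbor at distance k gets weight
    decay**k, so a longer target context lets farther phones contribute.
    """
    total = 0.0
    norm = 0.0
    for side in (0, 1):
        for k, wanted in enumerate(target[side]):
            weight = decay ** k
            norm += weight
            if k < len(candidate[side]):
                total += weight * phone_similarity(wanted, candidate[side][k])
    similarity = total / norm if norm else 1.0  # normalize to [0, 1]
    return round((1.0 - similarity) * scale)    # integerize the cost


def prune(target, candidates, keep=2):
    """Keep the `keep` candidates whose contexts are most similar to the target."""
    return sorted(candidates, key=lambda c: context_cost(target, c))[:keep]


if __name__ == "__main__":
    target = (("s", "a"), ("t", "i"))
    candidates = [
        (("a", "p"), ("u", "s")),  # unrelated context
        (("z", "a"), ("d", "i")),  # symbolically different, phonetically close
        (("s", "a"), ("t", "i")),  # exact context match
    ]
    for cand in prune(target, candidates):
        print(cand, context_cost(target, cand))
```

In this sketch a candidate whose context is symbolically different but phonetically close still receives a moderate cost rather than the maximum, which is one way sparse data from other contexts could remain usable, and sorting by the same cost gives a simple pruning step.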

Bibliographic reference. Zhang, Wei / Cui, Xiaodong (2010): "Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech", in INTERSPEECH-2010, 154-157.