ISCA Archive ICSLP 2000

A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis

Jon R. W. Yi, James R. Glass, I. Lee Hetherington

In this paper we describe the conversion of our phonologically based synthesizer into a finite-state transducer (FST) representation that can be used for real-time, natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synthesis costs in a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers that operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications.
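To illustrate the unit selection task mentioned in the abstract, the following minimal sketch treats it as a least-cost path search over candidate corpus units. It is not the authors' FST implementation: a plain dynamic-programming (Viterbi) search stands in for transducer composition and search, and the unit representation, the function names `select_units` and `toy_concat_cost`, and the cost values are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical unit representation: (phone label, utterance id, position in utterance).
Unit = Tuple[str, int, int]


def select_units(
    candidates: Sequence[Sequence[Unit]],
    concat_cost: Callable[[Unit, Unit], float],
    unit_cost: Callable[[Unit], float],
) -> Tuple[float, List[Unit]]:
    """Return the least-cost unit sequence, one unit per target position.

    Dynamic programming over columns of candidates; this stands in for
    the FST search described in the abstract and is only an illustration.
    """
    if not candidates or any(len(column) == 0 for column in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j] holds (accumulated cost, path) ending at candidate j of the current column.
    best = [(unit_cost(u), [u]) for u in candidates[0]]
    for column in candidates[1:]:
        new_best = []
        for u in column:
            # Pick the cheapest predecessor path for this candidate.
            cost, path = min(
                ((c + concat_cost(p[-1], u), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost + unit_cost(u), path + [u]))
        best = new_best
    return min(best, key=lambda item: item[0])


def toy_concat_cost(left: Unit, right: Unit) -> float:
    """Assumed cost: free if the units are adjacent in the same utterance, else 1."""
    contiguous = left[1] == right[1] and left[2] + 1 == right[2]
    return 0.0 if contiguous else 1.0


# Toy corpus candidates for the target phone sequence k, ae, t (invented data).
toy_candidates = [
    [("k", 1, 0), ("k", 2, 5)],
    [("ae", 1, 1), ("ae", 3, 2)],
    [("t", 3, 3), ("t", 1, 2)],
]

cost, path = select_units(toy_candidates, toy_concat_cost, lambda u: 0.0)
```

With these toy costs, the search prefers the three units that are contiguous in utterance 1, since joining them incurs no concatenation cost. In an FST formulation the same costs would be carried as arc weights, and the search would be a shortest-path computation over the composed transducer.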


Cite as: Yi, J.R.W., Glass, J.R., Hetherington, I.L. (2000) A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 322-325

@inproceedings{yi00_icslp,
  author={Jon R. W. Yi and James R. Glass and I. Lee Hetherington},
  title={{A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  volume={3},
  pages={322--325}
}