Large multi-paragraph speech databases encapsulate prosodic and contextual information beyond the sentence level, which could be exploited to build natural-sounding voices. This paper discusses our efforts to automatically build synthetic voices from large multi-paragraph speech databases. We show that the primary issue, segmentation of a large speech file, can be addressed with modifications to the forced-alignment technique, and that the proposed technique is independent of the duration of the audio file. We also discuss how this framework could be extended to build a large number of voices from large multi-paragraph recordings in the public domain.
Bibliographic reference. Prahallad, Kishore / Toth, Arthur R. / Black, Alan W. (2007): "Automatic building of synthetic voices from large multi-paragraph speech databases", In INTERSPEECH-2007, 2901-2904.