ISCA Archive SSW 2010

Handling large audio files in audio books for building synthetic voices

Kishore Prahallad, Alan W. Black

One of the issues in using audio books for building a synthetic voice is the segmentation of large audio files. Standard forced-alignment fails to obtain phone boundaries on large audio files, primarily because of its huge memory requirements. Earlier works have attempted to resolve this problem by using a large-vocabulary speech recognition system with a restricted dictionary and language model. In this work, we propose suitable modifications to the standard forced-alignment algorithm and demonstrate their usefulness for the segmentation of large audio files. Experimental results are provided on audio files including an artificially created large audio file and the EMMA speech corpus of 17.5 hours. Synthetic voices are also built using these large audio files.

Index Terms: large audio file, audio books, forced-alignment, text-to-speech


Cite as: Prahallad, K., Black, A.W. (2010) Handling large audio files in audio books for building synthetic voices. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 148-153

@inproceedings{prahallad10_ssw,
  author={Kishore Prahallad and Alan W. Black},
  title={{Handling large audio files in audio books for building synthetic voices}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={148--153}
}