ISCA Archive Interspeech 2005

Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison

S. Revelin, D. Cadic, C. Waast-Richard

One issue arising in text-to-phone conversion is inconsistency between its output and the phonetic time-alignment of the dataset, which hinders the back-end's ability to access the best units for synthesizing a text. Some inconsistency of this kind is inevitable, because dataset labeling must allow for alternative pronunciations of words, while the front-end typically predicts a single pronunciation per word. In this paper we describe an algorithm that recovers from these inconsistencies. The front-end is modified to allow multiple pronunciations for a word. The best pronunciation is then selected by an a posteriori comparison of the cost functions of the resulting synthetic signals.
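The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the phone symbols, and the toy cost (the number of diphones missing from a hypothetical unit inventory) are all assumptions made for illustration. The abstract states only that candidates are compared a posteriori through a cost function on the synthetic signals.

```python
from itertools import product


def candidate_transcriptions(word_variants):
    """Yield every sentence-level phone sequence obtainable by picking
    one pronunciation variant per word (hypothetical front-end output)."""
    for combo in product(*word_variants):
        yield [phone for word in combo for phone in word]


def toy_synthesis_cost(phones, inventory):
    """Stand-in for the back-end cost (an assumption, not the paper's cost):
    count the diphones of the candidate that are absent from the inventory."""
    return sum(1 for d in zip(phones, phones[1:]) if d not in inventory)


def select_best(word_variants, cost_fn):
    """Score every candidate a posteriori and keep the cheapest one."""
    return min(candidate_transcriptions(word_variants), key=cost_fn)


# Hypothetical example: the first word has two variants (with and without "@").
variants = [
    [("p", "@", "t", "i"), ("p", "t", "i")],
    [("S", "a")],
]
# Hypothetical set of diphones available in the recorded dataset.
inventory = {("p", "t"), ("t", "i"), ("i", "S"), ("S", "a")}

best = select_best(variants, lambda ph: toy_synthesis_cost(ph, inventory))
print(best)
```

In a real system the cost would come from the unit-selection back-end after it has actually searched for units for each candidate. Exhaustive enumeration grows multiplicatively with the number of variants per word, so a practical system would likely need to prune the candidate set.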


doi: 10.21437/Interspeech.2005-593

Cite as: Revelin, S., Cadic, D., Waast-Richard, C. (2005) Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison. Proc. Interspeech 2005, 1885-1888, doi: 10.21437/Interspeech.2005-593

@inproceedings{revelin05_interspeech,
  author={S. Revelin and D. Cadic and C. Waast-Richard},
  title={{Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1885--1888},
  doi={10.21437/Interspeech.2005-593}
}