ISCA Archive Interspeech 2009

Speech synthesis without a phone inventory

Matthew P. Aylett, Simon King, Junichi Yamagishi

In speech synthesis, the unit inventory is decided using phonological and phonetic expertise. This process is resource intensive and potentially sub-optimal. In this paper we investigate how acoustic clustering, together with lexicon constraints, can be used to build a self-organised inventory. Six English speech synthesis systems were built using two frameworks, unit selection and parametric HTS, for three inventory conditions: 1) a traditional phone set, 2) a system using orthographic units, and 3) a self-organised inventory. A listening test showed a strong preference for the classic (traditional phone set) system, and for the orthographic system over the self-organised system. Results also varied by letter-to-sound complexity and database coverage. This suggests that the self-organised approach failed to generalise pronunciation and also introduced noise above and beyond that caused by orthographic sound mismatch.


doi: 10.21437/Interspeech.2009-598

Cite as: Aylett, M.P., King, S., Yamagishi, J. (2009) Speech synthesis without a phone inventory. Proc. Interspeech 2009, 2087-2090, doi: 10.21437/Interspeech.2009-598

@inproceedings{aylett09_interspeech,
  author={Matthew P. Aylett and Simon King and Junichi Yamagishi},
  title={{Speech synthesis without a phone inventory}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={2087--2090},
  doi={10.21437/Interspeech.2009-598}
}