In building personalized synthetic voices for people with speech disorders, the output should capture the individual's vocal identity. This paper reports a listener judgment experiment on the similarity of Hidden Markov Model (HMM)-based synthetic voices, built with varying amounts of adaptation data, to two non-impaired target speakers. We conclude that around 100 sentences of adaptation data are needed to build a voice that retains the characteristics of the target speaker, although using more data further improves the voice. Experiments using Multi-Layer Perceptrons (MLPs) are conducted to identify which acoustic features contribute to the similarity judgments. Results show that mel-cepstral distortion and fraction of voicing agreement contribute most to replicating the similarity judgments, but the combination of all features is required for accurate prediction. Ongoing work applies these findings to voice building for people with impaired speech.
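The kind of MLP-based prediction the abstract describes could be sketched as below. This is a hypothetical illustration, not the paper's actual model or data: the synthetic feature columns merely stand in for acoustic distance measures such as mel-cepstral distortion and fraction-of-voicing agreement, and the target is an invented similarity score.

```python
import math, random

random.seed(0)

N_IN, N_HID = 4, 6  # 4 illustrative acoustic features, 6 hidden units

def make_example():
    # Synthetic stand-in data: the true similarity here is assumed to
    # fall with feature 0 ("distortion") and rise with feature 1
    # ("voicing agreement"). Purely illustrative values.
    x = [random.gauss(0, 1) for _ in range(N_IN)]
    z = -2.0 * x[0] + 1.5 * x[1]
    return x, 1.0 / (1.0 + math.exp(-z))

data = [make_example() for _ in range(100)]

# Weights for one tanh hidden layer and a linear output unit.
w1 = [[random.gauss(0, 0.5) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [0.0] * N_HID
w2 = [random.gauss(0, 0.5) for _ in range(N_HID)]
b2 = 0.0

def forward(x):
    h = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(w1, b1)]
    return h, sum(w * hi for w, hi in zip(w2, h)) + b2

def mse():
    return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

mse_before = mse()

lr = 0.05
for _ in range(300):  # epochs of per-example stochastic gradient descent
    for x, y in data:
        h, pred = forward(x)
        err = pred - y
        for j in range(N_HID):
            # Gradient w.r.t. the hidden pre-activation (before w2 update).
            grad_h = err * w2[j] * (1 - h[j] ** 2)
            w2[j] -= lr * err * h[j]          # output-layer weight
            for i in range(N_IN):
                w1[j][i] -= lr * grad_h * x[i]  # hidden-layer weights
            b1[j] -= lr * grad_h
        b2 -= lr * err

mse_after = mse()
print(f"MSE before: {mse_before:.4f}  after: {mse_after:.4f}")
```

In the paper's setting one would train on measured acoustic distances between natural and synthetic utterances, with listeners' similarity ratings as targets; feature-contribution analysis then amounts to comparing prediction accuracy with individual features versus the full set.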
Bibliographic reference. Creer, S. M. / Cunningham, S. P. / Green, P. D. / Fatema, K. (2009): "Personalizing synthetic voices for people with progressive speech disorders: judging voice similarity", In INTERSPEECH-2009, 1427-1430.