When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neuron disease (MND) or Parkinson's, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. But, for some patients, the speech deterioration frequently coincides or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices with minimal data recordings and even disordered speech. In this approach, the patient's recordings are used to adapt an average voice model pre-trained on many speakers. The structure of the voice model allows some reconstruction of the voice by substituting some components from the average voice in order to compensate for the disorders found in the patient's speech. In this paper, we compare different substitution strategies and introduce a context-dependent model substitution to improve the intelligibility of the synthetic speech while retaining the vocal identity of the patient. A subjective evaluation of the reconstructed voice for a patient with MND shows promising results for this strategy.
Index Terms: HTS, Voice Cloning, Voice Reconstruction, Assistive Technologies
Bibliographic reference. Veaux, Christophe / Yamagishi, Junichi / King, Simon (2012): "Using HMM-based speech synthesis to reconstruct the voice of individuals with degenerative speech disorders", In INTERSPEECH-2012, 967-970.