Eighth ISCA Workshop on Speech Synthesis, Barcelona, Catalonia, Spain
This paper presents the beginnings of a framework for formally testing the causes
of the currently limited quality of HMM (Hidden Markov Model) speech synthesis.
The framework isolates each effect of modelling so that its independent
impact on vocoded speech parameters can be observed, in order to address the issues that
are restricting progress towards highly intelligible and natural-sounding speech
synthesis.
The simulated HMM synthesis conditions are applied to spectral
speech parameters and evaluated in a pairwise listening test, in which listeners make
a "same or different" judgement on the quality of the synthesised speech
produced under each pair of conditions. These responses are then analysed using
multidimensional scaling to identify the qualities of modelled speech that listeners
attend to, and which therefore form the basis on which it is distinguishable from
natural speech.
Finally, future improvements to the framework are
discussed, including its extension to more of the parameters modelled
during speech synthesis.
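
As a rough illustration of the analysis step described in the abstract, the sketch below turns "same or different" responses into a low-dimensional map of conditions using classical (Torgerson) multidimensional scaling. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which MDS variant was used, and the function name `classical_mds`, the four conditions, and the `different_rate` values are all hypothetical and invented for illustration.

```python
import numpy as np


def classical_mds(dissimilarity, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix.

    Classical (Torgerson) MDS; an assumed variant, chosen for brevity.
    """
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram-like matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues that arise from non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical data: proportion of listeners answering "different" for each
# pair of four made-up synthesis conditions (rows and columns in same order).
different_rate = np.array([
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
])

# One row of 2-D coordinates per condition.
coords = classical_mds(different_rate, n_dims=2)
```

Conditions that listeners rarely tell apart should land close together in `coords`, so the axes of the resulting map can be inspected for the perceptual qualities that separate modelled speech from natural speech.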
Index Terms: Speech synthesis, Hidden Markov models,
Vocoding
Bibliographic reference. Merritt, Thomas / King, Simon (2013): "Investigating the shortcomings of HMM synthesis", In SSW8, 165-170.