5th International Conference on Spoken Language Processing
This paper examines a method for extracting formant parameters from a labelled single-speaker database for use in a formant-parameter diphone-concatenation speech synthesis system. The procedure begins with an initial formant analysis of the labelled database, which is then used to obtain formant (F1-F5) probability spaces for each phoneme. These probability spaces guide a more careful speaker-specific extraction of formant frequencies. An analysis-by-synthesis procedure is then used to find the best-matching formant intensity and bandwidth parameters. The great majority of the parameters so extracted produce speech that is highly intelligible and has a voice quality close to that of the original speaker.
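The idea of using per-phoneme probability spaces to guide formant selection can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, purely for illustration, that each formant's probability space is an independent Gaussian estimated from the first-pass analysis, and that a second pass chooses among raw candidate peak frequencies. The function names `build_probability_space` and `pick_formants`, the three-formant toy data, and the 1 Hz floor on the standard deviation are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def build_probability_space(initial_tracks):
    """Estimate a (mean, std) pair per formant for each phoneme.

    initial_tracks maps a phoneme label to a list of first-pass formant
    tuples (F1, F2, ...) in Hz; each phoneme needs at least two frames.
    The Gaussian model is an assumption of this sketch.
    """
    space = {}
    for phoneme, frames in initial_tracks.items():
        per_formant = zip(*frames)  # regroup values by formant index
        # Floor the std at 1 Hz (arbitrary) to avoid division by zero.
        space[phoneme] = [(mean(v), max(stdev(v), 1.0)) for v in per_formant]
    return space


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def pick_formants(candidates, phoneme, space):
    """Choose the ascending subset of candidate frequencies that is most
    likely under the phoneme's probability space.

    Returns None when there are fewer candidates than formants.
    """
    stats = space[phoneme]
    best, best_score = None, -math.inf
    for combo in combinations(sorted(candidates), len(stats)):
        score = sum(log_gauss(f, mu, sd) for f, (mu, sd) in zip(combo, stats))
        if score > best_score:
            best, best_score = combo, score
    return best


# Toy first-pass measurements for one phoneme (invented values).
first_pass = {"i": [(280, 2250, 2900), (300, 2300, 3000), (320, 2350, 3100)]}
space = build_probability_space(first_pass)
print(pick_formants([310, 1200, 2280, 2950, 3900], "i", space))
```

In this toy case the spurious candidates at 1200 Hz and 3900 Hz are rejected because they fall far outside the phoneme's estimated distributions. The exhaustive search over combinations is only practical for the handful of candidates a single frame typically yields.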
Bibliographic reference. Mannell, Robert H. (1998): "Formant diphone parameter extraction utilising a labelled single-speaker database", In ICSLP-1998, paper 0627.