7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Acoustic Measures vs. Phonetic Features as Predictors of Audible Discontinuity in Concatenative Speech Synthesis

Hisashi Kawai, Minoru Tsuzaki

ATR Spoken Language Translation Research Laboratories, Japan

Most concatenative speech synthesizers use both acoustic measures and phonetic features to predict the perceptual damage caused by concatenating two waveform segments, because no reliable acoustic measure has been found so far. This paper compares the predictive ability of these two kinds of predictor variables. First, we conduct a perceptual experiment to measure the naturalness degradation due to the signal discontinuity introduced by concatenating waveform segments. Second, we predict the naturalness degradation score from acoustic measures derived from MFCC and/or phonetic features, using statistical models such as a multiple regression model. Based on an investigation of the multiple regression coefficients, we find that (1) the phonetic features are the more effective predictors and (2) the acoustic measures provide no useful information beyond that given by the phonetic features.
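
The comparison described above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single acoustic measure (Euclidean distance between the MFCC vectors on either side of a join), a single categorical phonetic feature, and ordinary least squares as the multiple regression model. The function names, the category labels, and all data values are hypothetical and synthetic; none of them are taken from the paper or its results.

```python
import numpy as np


def mfcc_distance(left_frame, right_frame):
    """Euclidean distance between the MFCC vectors at a join (assumed measure)."""
    left = np.asarray(left_frame, dtype=float)
    right = np.asarray(right_frame, dtype=float)
    return float(np.linalg.norm(left - right))


def one_hot(labels, categories):
    """Dummy-code a categorical feature; the first category is the baseline."""
    return np.array(
        [[1.0 if lab == cat else 0.0 for cat in categories[1:]] for lab in labels]
    )


def fit_multiple_regression(X, y):
    """Least-squares fit of y on X with an intercept term."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def r_squared(X, y, coef):
    """Proportion of variance in y explained by the fitted model."""
    design = np.column_stack([np.ones(len(X)), X])
    residual = y - design @ coef
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)


# Synthetic stand-in data (NOT the paper's data): 40 joins, one phonetic
# feature with three hypothetical categories, and random MFCC frames.
rng = np.random.default_rng(0)
categories = ["a", "i", "u"]
labels = rng.choice(categories, size=40)
left_frames = rng.normal(size=(40, 12))
right_frames = rng.normal(size=(40, 12))
acoustic = np.array(
    [mfcc_distance(l, r) for l, r in zip(left_frames, right_frames)]
)

# Arbitrary synthetic "degradation scores", chosen only to exercise the code.
effect = {"a": 0.0, "i": 0.8, "u": 1.5}
scores = (
    1.0
    + 0.2 * acoustic
    + np.array([effect[lab] for lab in labels])
    + rng.normal(scale=0.3, size=40)
)

X_acoustic = acoustic.reshape(-1, 1)
X_phonetic = one_hot(labels, categories)
X_both = np.column_stack([X_acoustic, X_phonetic])

r2_acoustic = r_squared(X_acoustic, scores, fit_multiple_regression(X_acoustic, scores))
r2_phonetic = r_squared(X_phonetic, scores, fit_multiple_regression(X_phonetic, scores))
r2_both = r_squared(X_both, scores, fit_multiple_regression(X_both, scores))

print(f"R^2 acoustic only: {r2_acoustic:.3f}")
print(f"R^2 phonetic only: {r2_phonetic:.3f}")
print(f"R^2 both:          {r2_both:.3f}")
```

If the combined model explains little more variance than the phonetic-only model, the acoustic measure adds little information beyond the phonetic feature. On training data the combined fit can never be worse than either nested model, so a real comparison would also need held-out data or a significance test on the added coefficients.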


Bibliographic reference.  Kawai, Hisashi / Tsuzaki, Minoru (2002): "Acoustic measures vs. phonetic features as predictors of audible discontinuity in concatenative speech synthesis", In ICSLP-2002, 2621-2624.