EUROSPEECH 2003 - INTERSPEECH 2003
The perception of vocal styles is of paramount importance in vocal server applications, as the overall style of a telecom service depends heavily on the voice used. In this work we develop tools for the automatic inference of perceived vocal styles for a set of 100 vocal sequences. In a first stage, twenty subjective evaluation criteria were identified by running perceptual experiments with naive listeners. In a second stage, the vocal sequences were parameterised using more than a hundred acoustic features representing prosody, spectral energy distribution, articulation and waveform. Regression analysis and neural networks were then used to predict the subjective score of each voice on each criterion. The results show that the prediction error is generally low, which suggests that the perceived quality of the sequences can be predicted automatically. Moreover, the prediction error decreases when non-significant parameters are removed.
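The abstract does not give the form of the regression or the criterion used to discard non-significant parameters, so the following is only an illustrative sketch of the general idea: fit a linear regression from acoustic features to a subjective score, then refit after dropping weakly related features and compare the held-out error. Everything specific here is an assumption made for illustration and not the authors' method: the synthetic data, the ordinary least-squares fit, the correlation threshold of 0.3, and all function names.

```python
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    denom = (var_x * var_y) ** 0.5
    return cov / denom if denom > 0 else 0.0


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def rmse(weights, rows, targets):
    errors = [(predict(weights, r) - t) ** 2 for r, t in zip(rows, targets)]
    return (sum(errors) / len(errors)) ** 0.5


def select_features(rows, targets, threshold):
    """Keep feature indices whose |correlation| with the score meets the threshold.

    The threshold rule is a stand-in assumption for "significance".
    """
    return [j for j in range(len(rows[0]))
            if abs(pearson([r[j] for r in rows], targets)) >= threshold]


def project(rows, kept):
    return [[r[j] for j in kept] for r in rows]


def run_demo(seed=0, n_sequences=100, n_features=8):
    """Synthetic stand-in data: only features 0 and 1 drive the score."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_sequences)]
    scores = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in rows]
    split = int(0.7 * n_sequences)
    train_x, test_x = rows[:split], rows[split:]
    train_y, test_y = scores[:split], scores[split:]
    full = fit_least_squares(train_x, train_y)
    kept = select_features(train_x, train_y, threshold=0.3)
    reduced = fit_least_squares(project(train_x, kept), train_y)
    return (rmse(full, test_x, test_y),
            rmse(reduced, project(test_x, kept), test_y),
            kept)


if __name__ == "__main__":
    err_all, err_selected, kept = run_demo()
    print("kept features:", kept)
    print("held-out RMSE, all features:", round(err_all, 3))
    print("held-out RMSE, selected features:", round(err_selected, 3))
```

Whether the reduced model actually gives a lower held-out error depends on the data and on the selection rule; the sketch only shows how such a comparison could be set up, with one regression per subjective criterion in a real setting.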
Bibliographic reference. Ehrette, T. / Chateau, N. / d'Alessandro, Christophe / Maffiolo, V. (2003): "Predicting the perceptive judgment of voices in a telecom context: selection of acoustic parameters", In EUROSPEECH-2003, 117-120.