Voice attractiveness is an indicator that is partly objective and
partly subjective. It would be convenient to assume that each voice has
its own intrinsic attractiveness, but the paired comparison results of
human listeners are sometimes inconsistent. In this paper, we propose
a multidimensional mapping scheme for voice attractiveness that accounts
for both the objective merit values of voices and the subjective preferences
of listeners. Paired comparison is modeled in a probabilistic framework,
and the optimal mapping is obtained from the paired comparison results
under the maximum likelihood criterion.
The merit values can
be estimated from acoustic features using a machine learning framework.
We show how the estimation process works using a real database consisting
of common Japanese greeting utterances. Experiments using 1- and
2-dimensional merit spaces confirm that predicting comparison results
from acoustic features is more accurate in the 2-dimensional
case.
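
As a rough illustration of the maximum likelihood mapping described above, the following sketch fits merit vectors for voices and preference vectors for listeners from paired comparison outcomes. The abstract does not state the exact probabilistic model, so everything here is an assumption: a logistic (Bradley-Terry-style) link, plain gradient ascent, a small L2 penalty for numerical stability, and the hypothetical names `fit_merit_space` and `predict`. It is not the paper's implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(merits, weights, listener, a, b):
    """P(listener prefers voice a over voice b) under the ASSUMED logistic model."""
    score = sum(
        w * (ma - mb)
        for w, ma, mb in zip(weights[listener], merits[a], merits[b])
    )
    return sigmoid(score)


def fit_merit_space(comparisons, n_voices, n_listeners, dim=1,
                    lr=0.1, steps=300, l2=0.01, seed=0):
    """Gradient ascent on the penalized log-likelihood.

    comparisons: list of (listener, preferred_voice, other_voice) index triples.
    Returns (merits, weights): one dim-vector per voice and per listener.
    """
    rng = random.Random(seed)
    merits = [[0.1 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_voices)]
    weights = [[1.0] * dim for _ in range(n_listeners)]
    for _ in range(steps):
        # Start each gradient from the L2 penalty term (an added assumption).
        g_m = [[-l2 * x for x in row] for row in merits]
        g_w = [[-l2 * x for x in row] for row in weights]
        for listener, win, lose in comparisons:
            # d/ds log(sigmoid(s)) = 1 - sigmoid(s)
            r = 1.0 - predict(merits, weights, listener, win, lose)
            for k in range(dim):
                d = merits[win][k] - merits[lose][k]
                g_m[win][k] += r * weights[listener][k]
                g_m[lose][k] -= r * weights[listener][k]
                g_w[listener][k] += r * d
        for v in range(n_voices):
            for k in range(dim):
                merits[v][k] += lr * g_m[v][k]
        for u in range(n_listeners):
            for k in range(dim):
                weights[u][k] += lr * g_w[u][k]
    return merits, weights


# Toy data: one listener who consistently prefers voice 0 > voice 1 > voice 2.
comparisons = [(0, 0, 1), (0, 1, 2), (0, 0, 2)]
merits, weights = fit_merit_space(comparisons, n_voices=3, n_listeners=1, dim=1)
p_02 = predict(merits, weights, 0, 0, 2)
```

Setting `dim=2` gives each voice a two-dimensional merit vector and each listener a two-dimensional preference vector, which is one possible way to let different listeners rank the same voices differently.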
Cite as: Obuchi, Y. (2017) Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space. Proc. Interspeech 2017, 2223-2227, doi: 10.21437/Interspeech.2017-130
@inproceedings{obuchi17_interspeech,
  author={Yasunari Obuchi},
  title={{Personalized Quantification of Voice Attractiveness in Multidimensional Merit Space}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2223--2227},
  doi={10.21437/Interspeech.2017-130}
}