12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Correlation Analysis of Acoustic Features with Perceptual Voice Quality Similarity for Similar Speaker Selection

Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno

NTT Corporation, Japan

This paper examines the correlations between various acoustic features and perceptual voice quality similarity, with the aim of identifying the features that correlate with that similarity. First, a large-scale perceptual experiment using the voices of 62 speakers is conducted to obtain perceptual similarity scores for each pair of speakers. Next, multiple linear regression analysis is carried out; it shows that five acoustic features are highly correlated with voice quality similarity. Finally, we perform similar speaker selection based on multiple linear regression with these five features and assess its performance by classifying speakers according to perceptual similarity. The results indicate that combining the five acoustic features is effective for choosing speakers with similar voice quality when speakers are classified into two classes; it reduces the error rate by about 44% compared to using the cepstrum alone.
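As a rough illustration of the regression-then-classification idea in the abstract, the following Python sketch fits a multiple linear regression from five per-pair feature values to a similarity score and then splits pairs into two classes with a threshold. It is not the authors' implementation: the abstract does not name the five features, so the feature matrix, the weights, the synthetic scores, and the median threshold are all placeholder assumptions, as are the function names `fit_linear_regression`, `predict`, and `classify_similar`.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares fit of y ~ X @ w + b. Returns (w, b)."""
    # Append a column of ones so the intercept is estimated with the weights.
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X, w, b):
    """Predicted similarity score for each row (speaker pair) of X."""
    return X @ w + b


def classify_similar(scores, threshold):
    """Two-class decision: True where the score reaches the threshold."""
    return scores >= threshold


# Synthetic placeholder data (NOT from the paper): each row stands for one
# speaker pair, each column for one hypothetical acoustic-feature distance.
rng = np.random.default_rng(0)
n_pairs, n_features = 200, 5
X = rng.normal(size=(n_pairs, n_features))
true_w = np.array([0.5, -0.3, 0.2, 0.1, -0.4])  # arbitrary illustrative weights
y = X @ true_w + 1.0 + 0.01 * rng.normal(size=n_pairs)  # stand-in for perceptual scores

w, b = fit_linear_regression(X, y)
predicted = predict(X, w, b)

# The median of the observed scores is an assumed threshold, chosen only so
# that the two classes are roughly balanced in this toy example.
is_similar = classify_similar(predicted, threshold=np.median(y))
```

In a real study the rows of `X` would come from measured acoustic features and `y` from listener ratings, and the classifier would be evaluated on speaker pairs held out from the regression fit.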


Bibliographic reference.  Ijima, Yusuke / Isogai, Mitsuaki / Mizuno, Hideyuki (2011): "Correlation analysis of acoustic features with perceptual voice quality similarity for similar speaker selection", In INTERSPEECH-2011, 2237-2240.