This paper presents an experimental comparison of a broad range of leading vocoder types that have been previously described. We use a reference implementation of each of these to create stimuli for a listening test using copy synthesis. The listening test is performed using both Lombard and normal read speech stimuli, and with two types of question for comparison. Multidimensional scaling (MDS) is applied to the listener responses to analyse similarities in terms of quality between the vocoders. Our MDS and clustering results show that the vocoders which use a sinusoidal synthesis approach are perceptually distinguishable from the source-filter vocoders. To help interpret the axes of the resulting MDS space, we test for correlations with standard acoustic quality metrics and find that one axis is strongly correlated with PESQ scores. We also find that both speech style and the format of the listening test question may influence test results. Finally, we present preference test results which compare each vocoder with natural speech.
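To illustrate the kind of analysis the abstract describes, the following is a minimal sketch of how a matrix of pairwise dissimilarities between vocoders could be embedded in a low-dimensional space. The abstract does not state which MDS variant was used, so classical (Torgerson) MDS is assumed here purely for illustration; the vocoder names and dissimilarity values are hypothetical and are not taken from the paper.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix.

    Classical (Torgerson) MDS is an assumption made for this sketch; it is
    not necessarily the variant used in the paper.
    """
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues that arise from non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical vocoder labels and pairwise dissimilarities (illustrative only),
# e.g. the proportion of listeners who judged a pair to differ in quality.
names = ["vocoder_A", "vocoder_B", "vocoder_C", "vocoder_D"]
dissim = np.array([
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
])

coords = classical_mds(dissim, n_dims=2)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Each axis of the resulting coordinates could then be compared against an objective metric, for example with `np.corrcoef` applied to one coordinate column and a vector of per-vocoder scores, which mirrors the correlation test mentioned in the abstract.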
Index Terms: Speech Synthesis, Vocoder, Similarity, Quality
Cite as: Hu, Q., Richmond, K., Yamagishi, J., Latorre, J. (2013) An experimental comparison of multiple vocoder types. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 135-140
@inproceedings{hu13_ssw,
  author={Qiong Hu and Korin Richmond and Junichi Yamagishi and Javier Latorre},
  title={{An experimental comparison of multiple vocoder types}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={135--140}
}