TUSK: A Framework for Overviewing the Performance of F0 Estimators

Masanori Morise, Hideki Kawahara


This article presents TUSK, a framework for overviewing the performance of fundamental frequency (F0) estimators, and evaluates its effectiveness. Over the past few decades, many F0 estimators and evaluation indices have been proposed and evaluated using various speech databases. In speech analysis/synthesis research, modern estimators all meet the demand for high-quality speech synthesis, yet they compete with one another over minor differences, and the outcome of a comparison depends on the speech database used in the evaluation. Because speech varies widely in type, it is inadvisable to judge the effectiveness of each estimator on the basis of such minor differences; it is better to select an F0 estimator suited to the characteristics of the speech at hand. The proposed framework, TUSK, therefore does not rank estimators but instead provides an overview of them. TUSK introduces six parameters for observing trends in the characteristics of each F0 estimator, and the test signal is generated artificially so that the six parameters can be controlled independently. In this article, we introduce the concept of TUSK and examine its effectiveness using several modern F0 estimators.
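
The idea of probing an estimator with an artificial signal whose properties are set independently can be illustrated with a minimal Python sketch. It is not the TUSK implementation: the abstract does not name the six parameters, so the two controls used here (F0 and signal-to-noise ratio) are hypothetical stand-ins, and the autocorrelation estimator is a simple placeholder rather than one of the estimators evaluated in the paper. All function names are assumptions made for illustration.

```python
import numpy as np


def synth_harmonic(f0_hz, fs=16000, duration_s=0.5, n_harmonics=10,
                   snr_db=None, seed=0):
    """Sum of equal-amplitude harmonics of f0_hz, optionally with white noise.

    f0_hz and snr_db are illustrative controls only; they are NOT claimed
    to be the six parameters used by TUSK.
    """
    t = np.arange(int(fs * duration_s)) / fs
    # Keep only harmonics below the Nyquist frequency.
    ks = [k for k in range(1, n_harmonics + 1) if k * f0_hz < fs / 2]
    x = sum(np.cos(2 * np.pi * k * f0_hz * t) for k in ks)
    if snr_db is not None:
        rng = np.random.default_rng(seed)
        noise = rng.standard_normal(len(x))
        # Scale the noise so that signal power / noise power matches snr_db.
        target_noise_power = np.mean(x ** 2) / (10 ** (snr_db / 10))
        noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
        x = x + noise
    return x


def estimate_f0_autocorr(x, fs=16000, f0_min=60.0, f0_max=500.0):
    """Placeholder estimator: pick the autocorrelation peak in a lag range."""
    x = x - np.mean(x)
    n = len(x)
    # Zero-pad to 2n so the FFT-based autocorrelation is linear, not circular.
    spec = np.fft.rfft(x, 2 * n)
    ac = np.fft.irfft(np.abs(spec) ** 2)[:n]
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag


def relative_error_percent(estimated_hz, true_hz):
    """Absolute estimation error as a percentage of the true F0."""
    return 100.0 * abs(estimated_hz - true_hz) / true_hz


if __name__ == "__main__":
    # Sweep one control while holding the other fixed, then inspect the trend.
    for f0 in (100.0, 200.0, 400.0):
        est = estimate_f0_autocorr(synth_harmonic(f0, snr_db=20.0))
        print(f0, est, relative_error_percent(est, f0))
```

Because the true F0 of the generated signal is known exactly, the error can be computed without a reference estimator or a hand-labelled database, and sweeping one control at a time shows how the error trends with it.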


DOI: 10.21437/Interspeech.2016-140

Cite as

Morise, M., Kawahara, H. (2016) TUSK: A Framework for Overviewing the Performance of F0 Estimators. Proc. Interspeech 2016, 1790-1794.

BibTeX
@inproceedings{Morise+2016,
author={Masanori Morise and Hideki Kawahara},
title={TUSK: A Framework for Overviewing the Performance of F0 Estimators},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-140},
url={http://dx.doi.org/10.21437/Interspeech.2016-140},
pages={1790--1794}
}