ESCA Tutorial and Research Workshop on
A semi-automatic labeling system is needed to label large speech databases, whether for acoustic-phonetic studies or for the assessment of automatic phonetic decoding. Such a system takes as input the standard phonetic transcription of a sentence and the corresponding speech signal uttered by a speaker. Its output is a segmented and labeled sentence that the user may then correct if necessary.
We propose an algorithm for semi-automatic labeling that yields the segmentation (begin, end, center) into phonetic units. This algorithm operates in three successive steps:
This interactive tool was tested on multispeaker corpora. Manual and semi-automatic labeling will be compared in terms of both accuracy and time.
The assessment of phonetic decoding requires large transcribed databases, made up either of semi-automatically labeled corpora or of corpora with only a standard phonetic transcription. The assessment method we propose is based on two algorithms: the semi-automatic labeling technique described above and the usual dynamic time warping between the phonetic transcription of a sentence and the speech signal. The dynamic time warping algorithm gives the "best" warping path between the two sequences of units according to a given criterion, but this path is not necessarily the correct one. To keep errors made by the alignment process from being counted as errors of the acoustic-phonetic decoder, we adapt the algorithm to the system in two steps:
We compare this methodology on both types of databases in terms of algorithmic complexity and multilingual adaptability.
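As an illustration of the usual dynamic time warping mentioned above, the following is a minimal sketch and not the adapted algorithm of this work. It assumes, for simplicity, that both inputs are sequences of phonetic labels (a reference transcription and a decoder output) and that the local cost is 0 for identical labels and 1 otherwise; the function name dtw_align and this cost are hypothetical choices made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def dtw_align(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, List[Cell]]:
    """Classic DTW between two label sequences (illustrative assumption).

    Returns the cumulative cost of the cheapest warping path and the path
    itself as a list of (ref_index, hyp_index) pairs.
    """
    if not ref or not hyp:
        raise ValueError("both sequences must be non-empty")
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost of a path ending at cell (i, j)
    cost = [[inf] * m for _ in range(n)]
    # back[i][j]: predecessor of (i, j) on that cheapest path
    back: List[List[Optional[Cell]]] = [[None] * m for _ in range(n)]

    for i in range(n):
        for j in range(m):
            # Assumed local cost: 0 for identical labels, 1 otherwise.
            local = 0.0 if ref[i] == hyp[j] else 1.0
            if i == 0 and j == 0:
                cost[i][j] = local
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            # Ties go to the first candidate, so the diagonal move is preferred.
            best, prev = min(candidates, key=lambda c: c[0])
            cost[i][j] = best + local
            back[i][j] = prev

    # Backtrack from the last cell to recover the warping path.
    path: List[Cell] = []
    node: Optional[Cell] = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


if __name__ == "__main__":
    total, path = dtw_align(["s", "a", "t"], ["s", "s", "a", "a", "t"])
    print(total, path)
```

The returned path is optimal only with respect to the chosen local cost, which illustrates why the "best" warping path need not be the correct alignment.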
Bibliographic reference. Bourjot, C. / Boyer, A. / Fohr, Dominique / Haton, Jean-Paul (1989): "Tools for phonetic labeling and phonetic assessment", In SIOA-1989, Vol.2, 157 (abstract).