5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

An Efficient Labeling Tool for the QuickSig Speech Database

Matti Karjalainen, Toomas Altosaar, Miikka Huttunen

Helsinki University of Technology, Finland

An automated speech signal labeling tool, developed for the QuickSig speech database environment, is described. It is based primarily on the use of neural networks as diphone event detectors. For robustness, only coarse categories of diphones, such as stop-vowel and vowel-nasal, are used. A total of 64 such detectors are implemented to cover all Finnish diphones. The speech signals are preprocessed using warped linear prediction, and the diphone events from the neural network outputs are matched to the given text transcription using a simple rule-based parser. In isolated-word labeling of single-speaker signals, a well-trained system has a coarse labeling error rate of about 1-2%, and boundary positions deviate on average by about 10 ms from careful manual labeling. The ability to generalize to other speakers appears promising.
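
To illustrate the idea of coarse diphone categories, the following minimal Python sketch derives the sequence of coarse diphone events that a parser might expect from a given phoneme transcription. It is not the authors' implementation: the class table `COARSE_CLASS`, the function name `expected_diphone_events`, and the silence padding at word edges are all assumptions made for illustration, since the abstract does not list the actual class inventory or parser rules.

```python
# Hypothetical coarse phoneme classes (assumption: the abstract does not
# give the real inventory behind the 64 detectors).
COARSE_CLASS = {
    "a": "vowel", "e": "vowel", "i": "vowel", "o": "vowel", "u": "vowel",
    "k": "stop", "p": "stop", "t": "stop",
    "m": "nasal", "n": "nasal",
    "s": "fricative",
    "l": "liquid", "r": "liquid",
}


def expected_diphone_events(phonemes):
    """Return the coarse diphone categories implied by a phoneme sequence.

    The word is padded with "silence" on both sides (an assumption that
    suits isolated-word recordings), and each adjacent pair of coarse
    classes forms one expected diphone event.
    """
    classes = ["silence"] + [COARSE_CLASS[p] for p in phonemes] + ["silence"]
    return [f"{left}-{right}" for left, right in zip(classes, classes[1:])]


# Example: the Finnish word "talo" (house).
print(expected_diphone_events(["t", "a", "l", "o"]))
# prints ['silence-stop', 'stop-vowel', 'vowel-liquid', 'liquid-vowel', 'vowel-silence']
```

In a labeling pipeline of the kind described, a sequence like this would serve as the template against which detector outputs are aligned; how that alignment is actually performed is not specified in the abstract.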


Bibliographic reference. Karjalainen, Matti / Altosaar, Toomas / Huttunen, Miikka (1998): "An efficient labeling tool for the QuickSig speech database", in ICSLP-1998, paper 0885.