It is now recognised that progress in Speech Technology in particular, and in the Speech Sciences in general, depends to a considerable extent on the quality of the speech databases that are available. Their coverage of speakers, speaking style, vocabulary, recording conditions etc. is crucial both to the quality of the system under development and to the nature of the speech knowledge that can be gained from them. However, all these qualities can be undermined if the reliability of their labelling is suspect. The labelling of databases does not normally stand in the forefront of attention when database projects are undertaken, but in terms of effort it can swallow more than all the rest of the undertaking. Next to reliability, therefore, economy of effort is of paramount importance.
Work at the authors' laboratories has pursued these two principles along the following lines. Reliability is maximised within the automatic procedure by the incorporation of an interactive component which allows the scrutiny and modification of boundary placement according to preselected criteria. The economy of effort which an automatic system provides is maximised, and the cost of intervention through the interactive process minimised, by a segmentation and alignment process which runs in real time. Above all, the system is geared towards economy of effort in terms of transferability of training.
A multi-language approach lies at the heart of this feature. The system, a Kohonen-type Self-Organising Neural Network combined with Viterbi search with level-building, is based on an acoustic-phonetic feature specification of the sounds of a language which allows for statements of equivalence between languages (Polyphonemes). This offers the option of (a) training the polyphoneme categories over several languages simultaneously to increase the number of occurrences of a given category, or (b) using the projected equivalences between languages to bootstrap a labelling system for a language for which recordings exist but no labelled material does.
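The idea of polyphoneme categories can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the phone inventory, the token counts and the function names (`build_polyphonemes`, `pooled_counts`, `bootstrap_candidates`) are all hypothetical, and equivalence is assumed here to mean an identical feature specification, which is a simplification. The sketch groups language-specific phones by shared features, pools training counts across languages as in option (a), and lists labelled-language donors for an unlabelled language as in option (b).

```python
from collections import defaultdict

# Hypothetical acoustic-phonetic feature specifications, keyed by
# (language, phone). Illustrative values only, not taken from the paper.
FEATURES = {
    ("en", "p"): frozenset({"plosive", "bilabial", "voiceless"}),
    ("da", "p"): frozenset({"plosive", "bilabial", "voiceless"}),
    ("de", "p"): frozenset({"plosive", "bilabial", "voiceless"}),
    ("en", "th"): frozenset({"fricative", "dental", "voiceless"}),
    ("de", "ç"): frozenset({"fricative", "palatal", "voiceless"}),
}


def build_polyphonemes(features):
    """Group phones from all languages that share one feature specification."""
    groups = defaultdict(list)
    for (lang, phone), spec in features.items():
        groups[spec].append((lang, phone))
    return dict(groups)


def pooled_counts(groups, token_counts):
    """Option (a): sum the training tokens of each category over all languages."""
    return {
        spec: sum(token_counts.get(member, 0) for member in members)
        for spec, members in groups.items()
    }


def bootstrap_candidates(groups, target_lang, labelled_langs):
    """Option (b): for each phone of the unlabelled target language, list the
    phones of labelled languages that fall into the same category."""
    candidates = {}
    for members in groups.values():
        donors = [m for m in members if m[0] in labelled_langs]
        for lang, phone in members:
            if lang == target_lang:
                candidates[phone] = donors
    return candidates


# Hypothetical numbers of labelled tokens; "de" has recordings but no labels.
TOKEN_COUNTS = {("en", "p"): 120, ("da", "p"): 80, ("en", "th"): 40}

if __name__ == "__main__":
    groups = build_polyphonemes(FEATURES)
    print(pooled_counts(groups, TOKEN_COUNTS))
    print(bootstrap_candidates(groups, "de", {"en", "da"}))
```

In this toy inventory the German palatal fricative has no donor, which shows where bootstrapping by projected equivalence would need a looser matching criterion or manual labelling.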
Bibliographic reference. Barry, William / Dalsgaard, Paul (1993): "Speech database annotation. The importance of a multi-lingual approach", in EUROSPEECH'93, 13-20.