7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Optimal Selection of Speech Data for Automatic Speech Recognition Systems

Arkadiusz Nagórski (1), Lou Boves (2), Herman Steeneken (1)

(1) TNO-Human Factors, The Netherlands; (2) University of Nijmegen, The Netherlands

This paper presents a method for selecting a limited set of maximally information-rich speech data from a database for optimal training and diagnostic testing of Automatic Speech Recognition (ASR) systems. The method uses Principal Component Analysis (PCA) to map the variance of the speech material in a database into a low-dimensional space, followed by clustering and a selection technique. A very straightforward implementation of this procedure automatically detects at least two criteria for classifying speakers of standard Dutch, namely gender and the way in which /r/ is produced. To verify that the technique can improve ASR, data sets of equal size, either selected with this method or drawn at random, were used to train a recognition system on Dutch connected digits. The results show improved recognition performance when the optimally selected data sets were used, especially when the sub-corpora used for training were relatively small.
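
The PCA, clustering, and selection pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the clustering algorithm, or the selection rule, so everything concrete here is an assumption made for illustration: one feature vector per speaker, plain k-means for clustering, and picking the item nearest each cluster centroid. The function names (pca_project, kmeans, select_representatives) are hypothetical and are not taken from the paper.

```python
import numpy as np


def pca_project(features, n_components):
    """Project row vectors onto their leading principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # Recompute labels so they match the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def select_representatives(features, n_components=2, k=4, seed=0):
    """Return one index per non-empty cluster: the item nearest its centroid.

    This selection rule is an assumption; the abstract only says that
    clustering is followed by "a selection technique".
    """
    projected = pca_project(features, n_components)
    labels, centroids = kmeans(projected, k, seed=seed)
    chosen = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        dist_to_centroid = np.linalg.norm(projected[idx] - centroids[j], axis=1)
        chosen.append(int(idx[dist_to_centroid.argmin()]))
    return sorted(chosen)
```

Given a matrix with one row per speaker, select_representatives(features, n_components=2, k=4) returns the row indices of the items to keep. Choosing one item per cluster spreads the selection across the low-dimensional space instead of sampling it at random, which is the contrast the experiment above evaluates.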



Bibliographic reference. Nagórski, Arkadiusz / Boves, Lou / Steeneken, Herman (2002): "Optimal selection of speech data for automatic speech recognition systems", in ICSLP-2002, 2473-2476.