Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Optimization of Units for Continuous-Digit Recognition Task

Sachin S. Kajarakar (1), Hynek Hermansky (1,2)

(1) Oregon Graduate Institute of Science and Technology, Portland, OR, USA
(2) International Computer Science Institute, Berkeley, CA, USA

The choice of recognition units, sub-word or whole-word, is generally based on the size of the vocabulary and the amount of training data. In this work, we introduce two new constraints on the units: 1) they should contain sufficient statistics of the features, and 2) they should contain sufficient statistics of the vocabulary. These constraints lead to the minimization of two cost functions, the first based on the confusion between the features and the units and the second based on the confusion between the units and the words. We minimized the first cost function by forming broad phone classes that were less confusable among themselves than the phones. We minimized the second cost function by coding word-specific phone sequences. On the continuous-digit recognition task, the broad classes performed worse than the phones. The word-specific phone sequences, however, significantly improved performance over both the phones and the whole-word units. In this paper, we discuss the new constraints, our specific implementation of the cost functions, and the corresponding recognition performance.
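
The abstract does not say how the broad phone classes were formed, so the following is only an illustrative sketch of one common way to reduce confusion among units: greedily merging the pair of classes with the highest mutual confusion in a confusion matrix. The function name `merge_most_confused`, the phone labels, and the counts are all hypothetical and are not taken from the paper.

```python
import numpy as np


def merge_most_confused(conf, labels, n_classes):
    """Greedily merge the two most mutually confused classes
    until only n_classes groups remain.

    conf[i, j] is a count of tokens of true class i recognised as class j.
    Assumes every class has at least one token (no all-zero rows).
    This is an illustrative heuristic, not the paper's procedure.
    """
    conf = np.asarray(conf, dtype=float).copy()
    groups = [[lab] for lab in labels]
    while len(groups) > n_classes:
        # Row-normalise: rates[i, j] estimates P(recognised as j | true i).
        rates = conf / conf.sum(axis=1, keepdims=True)
        # Symmetric confusion between each pair; ignore the diagonal.
        sym = rates + rates.T
        np.fill_diagonal(sym, -np.inf)
        i, j = np.unravel_index(np.argmax(sym), sym.shape)
        i, j = min(i, j), max(i, j)
        # Fold class j into class i: pool its row, then its column.
        conf[i, :] += conf[j, :]
        conf[:, i] += conf[:, j]
        conf = np.delete(np.delete(conf, j, axis=0), j, axis=1)
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups, conf


# Made-up counts for four phones, for illustration only.
toy_labels = ["m", "n", "s", "z"]
toy_conf = [
    [80, 18, 1, 1],
    [20, 78, 1, 1],
    [1, 1, 70, 28],
    [2, 0, 30, 68],
]
toy_groups, toy_merged = merge_most_confused(toy_conf, toy_labels, 2)
print(toy_groups)  # [['m', 'n'], ['s', 'z']]
```

Merging removes the confusions inside each group, which lowers the feature-to-unit confusion. The merged units also carry less information about word identity, which is consistent with the abstract's report that broad classes performed worse than phones on the recognition task.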

Bibliographic reference. Kajarakar, Sachin S. / Hermansky, Hynek (2000): "Optimization of units for continuous-digit recognition task", in ICSLP-2000, vol. 2, 539-542.