Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition

Rita Singh (1), Bhiksha Raj (2), Richard M. Stern (1)

(1) Department of Electrical and Computer Engineering and School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
(2) Compaq Computer Corporation, Cambridge, MA, USA

The performance of speech recognition systems degrades when the basic sound units used are poorly defined or inconsistently used. Several attempts have been made to improve dictionaries automatically, either by redefining the pronunciations of words in terms of existing sound units or by redefining the sound units themselves completely. The problem with these approaches is that the former is limited by the existing sound units, while the latter discards all the human knowledge that has been incorporated into an expert-designed recognition dictionary. In this paper we propose a new merging-and-splitting algorithm that attempts to redefine the basic sound units used in the dictionary while retaining the expert knowledge built into a manually designed dictionary. Sound units from an existing dictionary are first merged based on their inherent confusability, as measured by a Monte Carlo-based metric, and are subsequently split to maximize the likelihood of the training data. Experiments with the Resource Management database indicate that this approach improves recognition accuracy when context-independent models are used for recognition. When context-dependent models are used, the observed improvement is smaller.
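The abstract does not define the confusability metric beyond calling it Monte Carlo-based. The following is a minimal sketch of one way such a metric could work, assuming (hypothetically) that each sound unit is modeled by a single diagonal-covariance Gaussian: samples are drawn from one unit's model and counted as confusions whenever the other unit's model scores them at least as highly. The most confusable pair would then be the first candidate for merging. The unit names, model parameters, and the functions `confusability` and `most_confusable_pair` are illustrative assumptions, not the authors' implementation, and the subsequent likelihood-driven splitting step is not shown.

```python
import math
import random

# ASSUMPTION: each sound unit is a single diagonal-covariance Gaussian given as
# (means, variances). This stands in for the paper's acoustic models.
def log_likelihood(x, unit):
    means, variances = unit
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def sample(unit, rng):
    """Draw one feature vector from the unit's Gaussian."""
    means, variances = unit
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances)]

def confusability(a, b, n_samples=2000, seed=0):
    """Monte Carlo estimate (hypothetical metric): the fraction of samples
    drawn from one unit that the other unit scores at least as highly,
    averaged over both directions so the measure is symmetric."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = sample(a, rng)
        if log_likelihood(x, b) >= log_likelihood(x, a):
            hits += 1
        y = sample(b, rng)
        if log_likelihood(y, a) >= log_likelihood(y, b):
            hits += 1
    return hits / (2 * n_samples)

def most_confusable_pair(units, **kwargs):
    """Return (score, name1, name2) for the pair with the highest confusability."""
    names = sorted(units)
    best = None
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            score = confusability(units[p], units[q], **kwargs)
            if best is None or score > best[0]:
                best = (score, p, q)
    return best

# Illustrative 2-D units; the names and parameters are made up for this example.
units = {
    "AA": ([0.0, 0.0], [1.0, 1.0]),
    "AO": ([0.3, 0.2], [1.0, 1.0]),
    "IY": ([4.0, -3.0], [1.0, 1.0]),
}
score, p, q = most_confusable_pair(units)
print(p, q)  # AA AO
```

Averaging over both sampling directions keeps the score symmetric, so the pair ordering does not matter when choosing which units to merge.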



Bibliographic reference.  Singh, Rita / Raj, Bhiksha / Stern, Richard M. (2000): "Structured redefinition of sound units by merging and splitting for improved speech recognition", In ICSLP-2000, vol.3, 151-154.