Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000
Structured Redefinition of Sound Units by Merging and Splitting for Improved Speech Recognition
Rita Singh (1), Bhiksha Raj (2), Richard M. Stern (1)
(1) Department of Electrical and Computer Engineering and School of Computer Science,
Carnegie Mellon University,
Pittsburgh, PA, USA
(2) Compaq Computer Corporation, Cambridge, MA, USA
The performance of speech recognition systems degrades
when the basic sound units they use are poorly defined or
used inconsistently. Several attempts have been made to improve
recognition dictionaries automatically, either by redefining the
pronunciations of words in terms of existing sound units, or by
redefining the sound units themselves entirely. Both approaches
have drawbacks: the former is limited by the existing set of
sound units, while the latter discards all the human knowledge
that has been incorporated into an expert-designed recognition
dictionary. In this paper we propose a new merging-and-splitting
algorithm that redefines the basic sound units used in the
dictionary while preserving the expert knowledge built into a
manually designed dictionary. Sound units from an existing
dictionary are merged based on their inherent confusability,
as measured by a Monte Carlo-based metric, and are subsequently
split to maximize the likelihood of the training data.
Experiments with the Resource Management database indicate that
this approach improves recognition accuracy when
context-independent models are used for recognition. When
context-dependent models are used, the improvement is smaller.
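The abstract does not specify the confusability metric or the splitting procedure, so the following is only a minimal sketch of the general merge-then-split idea under loudly stated simplifying assumptions: each sound unit is modeled as a single one-dimensional Gaussian (real recognizers use HMMs with multivariate mixture densities); confusability is estimated as the Monte Carlo rate at which samples drawn from one unit score at least as high under the other unit's model; and a merged unit is re-split into two Gaussians by hard-assignment re-estimation, reporting the resulting training-data log-likelihood gain. The unit names, parameters, and helpers (`confusability`, `most_confusable_pair`, `split_unit`) are hypothetical and are not the authors' method.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def confusability(a, b, n=5000, rng=None):
    """Assumed Monte Carlo metric: fraction of samples drawn from one unit
    that score at least as high under the other unit, averaged over both
    directions. Units are hypothetical (mean, variance) pairs."""
    if rng is None:
        rng = random.Random(0)
    errors = 0
    for src, other in ((a, b), (b, a)):
        for _ in range(n):
            x = rng.gauss(src[0], math.sqrt(src[1]))
            if log_gauss(x, *other) >= log_gauss(x, *src):
                errors += 1
    return errors / (2 * n)


def most_confusable_pair(units, n=5000, seed=0):
    """Return (score, name1, name2) for the pair that would be merged first."""
    rng = random.Random(seed)
    names = sorted(units)
    best = None
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            score = confusability(units[p], units[q], n, rng)
            if best is None or score > best[0]:
                best = (score, p, q)
    return best


def fit(xs, floor=1e-3):
    """Maximum-likelihood Gaussian for xs, with a variance floor."""
    m = sum(xs) / len(xs)
    v = max(sum((x - m) ** 2 for x in xs) / len(xs), floor)
    return m, v


def loglik(xs, g):
    return sum(log_gauss(x, *g) for x in xs)


def split_unit(data, iters=10):
    """Split a merged unit's pooled data into two Gaussians by alternating
    hard assignment and re-estimation, starting from a cut at the mean.
    Returns (g1, g2, log-likelihood gain over one Gaussian), or None if
    no non-degenerate split exists."""
    m0 = sum(data) / len(data)
    parts = ([x for x in data if x < m0], [x for x in data if x >= m0])
    for _ in range(iters):
        if not parts[0] or not parts[1]:
            return None
        g1, g2 = fit(parts[0]), fit(parts[1])
        new = ([], [])
        for x in data:
            new[0 if log_gauss(x, *g1) >= log_gauss(x, *g2) else 1].append(x)
        parts = new
    if not parts[0] or not parts[1]:
        return None
    g1, g2 = fit(parts[0]), fit(parts[1])
    gain = loglik(parts[0], g1) + loglik(parts[1], g2) - loglik(data, fit(data))
    return g1, g2, gain


if __name__ == "__main__":
    # Hypothetical unit models: AA and AO overlap heavily, IY is well separated.
    units = {"AA": (0.0, 1.0), "AO": (0.3, 1.0), "IY": (5.0, 1.0)}
    score, p, q = most_confusable_pair(units)
    print(f"merge {p}+{q} (confusability {score:.2f})")
    # Synthetic stand-in for the pooled training frames of the merged unit.
    rng = random.Random(1)
    pooled = [rng.gauss(units[u][0], math.sqrt(units[u][1]))
              for u in (p, q) for _ in range(500)]
    result = split_unit(pooled)
    if result is not None:
        g1, g2, gain = result
        print(f"split: N({g1[0]:.2f}, {g1[1]:.2f}) and "
              f"N({g2[0]:.2f}, {g2[1]:.2f}), gain {gain:.1f}")
```

Because each part's variance-floored fit maximizes that part's likelihood, the reported gain in this sketch is never negative. A practical system would presumably accept a split only when the gain justifies the extra unit; that acceptance criterion is likewise an assumption here.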
Bibliographic reference.
Singh, Rita / Raj, Bhiksha / Stern, Richard M. (2000):
"Structured redefinition of sound units by merging and splitting for improved speech recognition",
In ICSLP-2000, vol.3, 151-154.