ISCA Archive ICSLP 2000

Structured redefinition of sound units by merging and splitting for improved speech recognition

Rita Singh, Bhiksha Raj, Richard M. Stern

The performance of speech recognition systems degrades when the basic sound units used are poorly defined or inconsistently used. Several attempts have been made to improve dictionaries automatically, either by redefining pronunciations of words in terms of existing sound units, or by redefining the sound units themselves completely. The problem with these approaches is that, while the former is limited by the sound units used, the latter discards all human information that has been incorporated into an expert-designed recognition dictionary. In this paper we propose a new merging-and-splitting algorithm that attempts to redefine the basic sound units used in the dictionary, while maintaining the expert knowledge built into a manually designed dictionary. Sound units from an existing dictionary are merged based on their inherent confusability, as measured by a Monte-Carlo based metric, and subsequently split to maximize the likelihood of the training data. Experiments with the Resource Management database indicate that this approach results in an improvement in recognition accuracy when context-independent models are used for recognition. When context-dependent models are used, the improvement observed is reduced.


Cite as: Singh, R., Raj, B., Stern, R.M. (2000) Structured redefinition of sound units by merging and splitting for improved speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 151-154

@inproceedings{singh00_icslp,
  author={Rita Singh and Bhiksha Raj and Richard M. Stern},
  title={{Structured redefinition of sound units by merging and splitting for improved speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 3, 151-154}
}