7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Speech Recognition Using Combined Acoustic and Articulatory Information With Retraining of Acoustic Model Parameters

Ka-Yee Leung, Manhung Siu

Hong Kong University of Science and Technology, China

Articulatory features (AF) have recently been proposed as an alternative representation of the acoustic features (ACF), and combining an AF model with an ACF model has been shown to outperform the ACF model alone. In this paper, we investigate multiple ways to further improve the combination of an AF model and an ACF model. First, we propose a multiple-distribution AF model that increases the model's resolution by separately modeling different sub-phone segments. We then introduce an asynchronous combination of this multiple-distribution AF model with an ACF model, which allows AF model "states" to be combined flexibly with different ACF model states. Second, we incorporate AF information into ACF model training so that the ACF model is optimized to give the best performance when combined with the AF model for decoding. The combination of both techniques results in an absolute improvement of 2.5% in TIMIT phone recognition over the corresponding ACF model baseline.


Bibliographic reference.  Leung, Ka-Yee / Siu, Manhung (2002): "Speech recognition using combined acoustic and articulatory information with retraining of acoustic model parameters", In ICSLP-2002, 2117-2120.