7th International Conference on Spoken Language Processing
September 16-20, 2002
Articulatory features (AF) have recently been proposed as an alternative representation to acoustic features (ACF), and combining an AF model with an ACF model has been shown to outperform the ACF model alone. In this paper, we investigate several ways to further improve the combination of an AF model and an ACF model. First, we propose a multiple-distribution AF model that increases the model's resolution by modeling different sub-phone segments separately. We then introduce an asynchronous combination of this multiple-distribution AF model with an ACF model, which allows AF model "states" to be flexibly paired with different ACF model states. Second, we incorporate AF information into ACF model training so that the ACF model is optimized to give the best performance when combined with the AF model during decoding. Combining both techniques yields an absolute improvement of 2.5% in TIMIT phone recognition over the corresponding ACF model baseline.
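The abstract does not specify how AF and ACF scores are combined or how asynchrony is constrained. The sketch below assumes a weighted log-linear combination of per-frame log-likelihoods and a composite-state Viterbi search over a single phone. In this search, ACF state `j` and AF sub-phone segment `k` may differ by at most `max_offset`, and each stream independently stays or advances one state per frame. The function name `async_viterbi`, the `weight` and `max_offset` parameters, and the omission of transition probabilities are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def async_viterbi(acf_loglik: Sequence[Sequence[float]],
                  af_loglik: Sequence[Sequence[float]],
                  weight: float = 0.5,
                  max_offset: int = 1) -> float:
    """Best log score for one phone under a hypothetical asynchronous
    combination of an ACF model and a multiple-distribution AF model.

    acf_loglik[t][j]: ACF log-likelihood of frame t in ACF state j.
    af_loglik[t][k]:  AF log-likelihood of frame t in AF "state" k
                      (one distribution per sub-phone segment).
    Composite states (j, k) are allowed only if |j - k| <= max_offset.
    Both streams are left-to-right: each may stay or advance by one state
    per frame. Transition probabilities are omitted (treated as log 1 = 0).
    """
    T = len(acf_loglik)
    S = len(acf_loglik[0])
    K = len(af_loglik[0])
    states = [(j, k) for j in range(S) for k in range(K)
              if abs(j - k) <= max_offset]

    def emit(t: int, j: int, k: int) -> float:
        # Assumed weighted log-linear combination of the two streams.
        return (1.0 - weight) * acf_loglik[t][j] + weight * af_loglik[t][k]

    # Both streams must start in their first state.
    score = {s: -math.inf for s in states}
    score[(0, 0)] = emit(0, 0, 0)
    for t in range(1, T):
        new_score = {}
        for j, k in states:
            # Predecessors: each stream either stays (d = 0) or advances (d = 1).
            preds = [(j - dj, k - dk) for dj in (0, 1) for dk in (0, 1)]
            best_prev = max(score.get(p, -math.inf) for p in preds)
            new_score[(j, k)] = best_prev + emit(t, j, k)
        score = new_score
    # Both streams must end in their last state.
    return score.get((S - 1, K - 1), -math.inf)
```

With `max_offset=0` only the diagonal states `(j, j)` exist, so the search reduces to synchronous combination. Any larger offset searches a superset of those paths and can therefore only match or improve the best score.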
Bibliographic reference. Leung, Ka-Yee / Siu, Manhung (2002): "Speech recognition using combined acoustic and articulatory information with retraining of acoustic model parameters", In ICSLP-2002, 2117-2120.