Sixth International Conference on Spoken Language Processing
In this paper we apply the Weighted Acoustic Modeling (WAM) technique to the recognition of speech coded by the full-rate GSM codec or the FS-1016 CELP codec, using several estimates of instantaneous distortion. In the WAM method, separate hidden Markov models are developed for regions of speech that exhibit low levels of codec-induced distortion and for regions with higher levels of such distortion. At recognition time, the contributions of these models are combined with a weighting determined by an estimate of the instantaneous distortion. We estimate instantaneous distortion from four sources: the instantaneous cepstral distortion, the long-term gain parameter of the codec, the long-term predictability of the reconstructed signal, and measurements of recoding sensitivity. For the GSM codec, we observe that the long-term gain parameter produces results similar to those obtained with cepstral distortion, which can be computed only if the original cepstra are transmitted along with the speech signal. Overall, these techniques reduce the degradation in error rate introduced by coding by up to 55% for GSM coding and by up to 38% for CELP coding.
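The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the mixing formula, so the sketch assumes a per-frame linear mixture of two likelihoods, a logistic mapping from distortion to weight, and one-dimensional Gaussians standing in for the HMM state densities. All function names and parameter values (`mixing_weight`, `midpoint`, `slope`, and the toy model parameters) are hypothetical.

```python
import math


def mixing_weight(distortion, midpoint=1.0, slope=4.0):
    """Weight given to the low-distortion model for one frame.

    Assumption: a logistic mapping, so the weight falls from near 1 to
    near 0 as the distortion estimate rises past `midpoint`.
    """
    return 1.0 / (1.0 + math.exp(slope * (distortion - midpoint)))


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian, a toy stand-in for an HMM state density."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def weighted_frame_loglik(x, distortion, low_model, high_model):
    """Mix the two models' frame likelihoods using the distortion-based weight.

    `low_model` and `high_model` are (mean, variance) pairs for the
    low-distortion and high-distortion models (hypothetical toy parameters).
    """
    w = mixing_weight(distortion)
    p_low = math.exp(gaussian_logpdf(x, *low_model))
    p_high = math.exp(gaussian_logpdf(x, *high_model))
    return math.log(w * p_low + (1.0 - w) * p_high)


if __name__ == "__main__":
    low_model = (0.0, 1.0)   # toy "low-distortion" density
    high_model = (3.0, 4.0)  # toy "high-distortion" density
    for d in (0.0, 1.0, 2.0):
        ll = weighted_frame_loglik(0.0, d, low_model, high_model)
        print(f"distortion={d:.1f} weight={mixing_weight(d):.3f} loglik={ll:.3f}")
```

In this sketch the distortion estimate could come from any of the four sources named in the abstract; only the scalar value passed as `distortion` changes.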
Bibliographic reference. Huerta, Juan M. / Stern, Richard M. (2000): "Instantaneous-distortion based weighted acoustic modeling for robust recognition of coded speech", In ICSLP-2000, vol.3, 842-845.