International Symposium on Chinese Spoken Language Processing (ISCSLP 2002)

Taipei, Taiwan
August 23-24, 2002

Structure-Based Compensation Using an Improved Statistical Linear Approximation for Mandarin Speech Recognition over Telephone

Zhao-Bing Han, Hua-Yun Zhang, Bo Xu

National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

In this paper, a Vector Piecewise Polynomial (VPP) approximation algorithm is proposed for robust speech recognition in telecommunication environments. The method is formulated in a statistical framework to perform optimal compensation for noise effects given three inputs: the observed noisy speech, a model describing the statistics of speech recorded in a clean reference environment, and an estimate of the noisy recognition environment.
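To make the three inputs of this framework concrete, here is a minimal one-dimensional sketch of statistical compensation. It uses Moreno's zeroth/first-order VTS, not the VPP approximation proposed here. Several things are assumptions, not details from the paper. It assumes the standard log-spectral environment model y = x + h + f(n - x - h) with f(v) = log(1 + e^v). It treats noise n and channel h as known point estimates instead of estimating them with EM. The function name `mmse_clean_estimate` and all parameter values are hypothetical.

```python
import math

def softplus(v: float) -> float:
    """Assumed environment function f(v) = log(1 + e^v), computed stably."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def sigmoid(v: float) -> float:
    """Derivative of softplus, computed without overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def mmse_clean_estimate(y, weights, means, variances, noise, channel):
    """Approximate MMSE estimate of clean log-energy x from noisy y (hypothetical helper).

    Assumed model: y = x + h + f(n - x - h); clean speech is a 1-D GMM.
    Each Gaussian is mapped to the noisy domain with a VTS expansion at its mean.
    """
    log_post, shifts = [], []
    for w, mu, var in zip(weights, means, variances):
        v0 = noise - mu - channel            # expansion point at the clean mean
        mu_y = mu + channel + softplus(v0)   # zeroth-order VTS: noisy mean
        g = 1.0 - sigmoid(v0)                # dy/dx evaluated at the expansion point
        var_y = g * g * var + 1e-6           # first-order VTS: noisy variance (floored)
        log_post.append(math.log(w)
                        - 0.5 * (math.log(2.0 * math.pi * var_y) + (y - mu_y) ** 2 / var_y))
        shifts.append(mu_y - mu)             # mean shift caused by noise and channel
    # Normalise p(k | y) with the log-sum-exp trick.
    m = max(log_post)
    post = [math.exp(lp - m) for lp in log_post]
    total = sum(post)
    # Subtract the posterior-weighted mean shift from the observation.
    return y - sum(p / total * s for p, s in zip(post, shifts))

# Usage with a hypothetical two-component clean model and a noise level of 4.0
x_hat = mmse_clean_estimate(5.0, [0.5, 0.5], [2.0, 8.0], [1.0, 1.0], noise=4.0, channel=0.0)
```

The sketch subtracts a posterior-weighted mean shift from the observation. This is the usual approximate MMSE form used with VTS. Its accuracy depends directly on how well f is approximated away from the expansion point, which is the weakness VPP targets.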

The VPP algorithm extends P. J. Moreno's Vector Taylor Series (VTS) approximation for dealing with the distortion caused by channel effects and background noise. Moreno replaced the environment function f(v) with its vector Taylor series approximation. However, VTS is known to be inaccurate when the variable v is far from the Taylor expansion point v0. VPP overcomes this defect by approximating f(v) with a piecewise polynomial, namely two linear polynomials and one quadratic polynomial. In addition, VPP estimates the parameters of the environment with the expectation-maximization (EM) algorithm.
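The accuracy argument can be illustrated numerically. The sketch compares a first-order Taylor expansion of f(v) around v0 = 0 with a linear/quadratic/linear piecewise approximation over the range v ∈ [-8, 8]. It rests on several assumptions that are not taken from the paper. It assumes the common VTS environment function f(v) = log(1 + e^v). The breakpoints ±a with a = 2 are hypothetical. The middle quadratic (v + a)^2 / (4a) is one illustrative choice that joins the two asymptotes 0 and v with matching value and slope. These are not the paper's fitted coefficients.

```python
import math

def f(v: float) -> float:
    """Assumed environment function f(v) = log(1 + e^v), computed stably."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def vts_first_order(v: float, v0: float = 0.0) -> float:
    """First-order Taylor expansion of f around v0 (slope f'(v0) = sigmoid(v0))."""
    slope = 1.0 / (1.0 + math.exp(-v0))
    return f(v0) + slope * (v - v0)

def piecewise(v: float, a: float = 2.0) -> float:
    """Linear / quadratic / linear approximation of f (hypothetical breakpoints at +-a).

    Left piece  : 0                 (asymptote of f as v -> -inf)
    Middle piece: (v + a)^2 / (4a)  (matches value and slope of both lines at +-a)
    Right piece : v                 (asymptote of f as v -> +inf)
    """
    if v <= -a:
        return 0.0
    if v >= a:
        return v
    return (v + a) ** 2 / (4.0 * a)

grid = [i / 10.0 for i in range(-80, 81)]  # v sampled on [-8, 8]
max_err_vts = max(abs(f(v) - vts_first_order(v)) for v in grid)
max_err_pw = max(abs(f(v) - piecewise(v)) for v in grid)
print(f"max |error|, first-order VTS at v0=0: {max_err_vts:.3f}")
print(f"max |error|, piecewise polynomial:    {max_err_pw:.3f}")
```

The Taylor line keeps growing away from v0, while the two linear pieces follow the asymptotes of f. As a result, the piecewise error stays bounded across the whole range, including values of v far from any single expansion point.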

Experimental results are presented on applying this approach to improve the performance of Mandarin large vocabulary continuous speech recognition (LVCSR). That performance is degraded by different transmission channels (such as fixed telephone lines and GSM) and by background noise. The proposed VPP algorithm is found to converge quickly, and it reduces the average character error rate (CER) by about 12%.



Bibliographic reference.  Han, Zhao-Bing / Zhang, Hua-Yun / Xu, Bo (2002): "Structure-based compensation using an improved statistical linear approximation for Mandarin speech recognition over telephone", In ISCSLP 2002, paper 60.