This paper presents the effects of variations in telephone line conditions on speech signals and their influence on speaker-independent recognition performance. Measurements made on a large database collected over the telephone network show that, for a given call, a significant constant component perturbs the signal. This component varies greatly from call to call, reducing the discrimination between different vocabulary words. The disturbance is mainly caused by the convolved line transfer function, which seems more harmful than the additive ambient noise in the databases. Cepstral subtraction is investigated to reduce the convolved disturbance. The long-term cepstrum for a given call is computed and then subtracted from the cepstra of all the utterances to be recognized. We propose to subtract only the first coefficients of the long-term cepstrum, which amounts to subtracting a smoothed version of the corresponding long-term logarithmic spectrum. This gives satisfactory results, and the system obtained is more robust than the basic recognizer (20% reduction of the error rate). Another approach to normalizing speech data and reducing the line effects is to use a neural network. A multilayer perceptron is trained to bring acoustic vectors nearer to the corresponding basic HMM Gaussian mean vectors, the long-term cepstrum being presented as an additional input. Preliminary experiments give encouraging results.
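As an illustration of the cepstral subtraction described above, the following is a minimal sketch and not the authors' implementation. A fixed line transfer function convolved with the speech signal becomes an additive constant in the cepstral domain, so it can be estimated from a long-term cepstrum and subtracted. The sketch assumes that the long-term cepstrum is the plain average of the cepstral vectors over all frames of a call; the abstract does not say how it is computed. The function names and the `num_coeffs` parameter are hypothetical.

```python
from typing import List, Sequence


def long_term_cepstrum(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average the cepstral vectors over all frames of one call.

    Assumption: the long-term cepstrum is a simple per-coefficient mean.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    dim = len(frames[0])
    count = len(frames)
    return [sum(frame[i] for frame in frames) / count for i in range(dim)]


def subtract_long_term(
    frames: Sequence[Sequence[float]],
    long_term: Sequence[float],
    num_coeffs: int,
) -> List[List[float]]:
    """Subtract only the first num_coeffs long-term coefficients.

    Coefficients at index num_coeffs and above are left unchanged, so only
    the smooth part of the long-term log spectrum is removed.
    """
    return [
        [c - long_term[i] if i < num_coeffs else c for i, c in enumerate(frame)]
        for frame in frames
    ]


# Toy call with two frames of three cepstral coefficients each.
call_frames = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
mean_cepstrum = long_term_cepstrum(call_frames)
normalized = subtract_long_term(call_frames, mean_cepstrum, num_coeffs=2)
```

In this toy call the first two coefficients of each frame are centered on zero while the third is left untouched. How many coefficients to subtract is a tuning choice that the abstract does not specify.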
Keywords: Telephone line effects, HMM, Cepstral Subtraction, Neural Networks.
Bibliographic reference. Mokbel, C. / Monné, J. / Jouvet, D. (1993): "On-line adaptation of a speech recognizer to variations in telephone line conditions", In EUROSPEECH'93, 1247-1250.