Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition

Erfan Loweimi, Jon Barker, Thomas Hain


Designing effective normalisation to counter environmental distortions is one of the major challenges in automatic speech recognition (ASR). The vector Taylor series (VTS) method is a powerful and mathematically well-principled technique that can be applied in both the feature and model domains to compensate for both additive and convolutional noise. One limitation of this approach, however, is that it is tied to MFCC (and log-filterbank) features and does not extend to other representations such as PLP, PNCC and phase-based front-ends, which use a power transformation rather than log compression. This paper broadens the scope of the VTS method by deriving a new formulation that assumes a power transformation is used as the non-linearity during feature extraction. It is shown that conventional VTS, in the log domain, is a special case of the new extended framework. In addition, the new formulation introduces one more degree of freedom, which makes it possible to tune the algorithm to better fit the data to the statistical requirements of the ASR back-end. On the Aurora-4 tasks, the proposed approach provides average absolute performance improvements of up to 12.2% and 2.0% compared with MFCC and conventional VTS, respectively.
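
The abstract's claim that log compression is a special case of a power transformation can be illustrated with a Box-Cox style function, which tends to the natural logarithm as its exponent approaches zero. This is a sketch only: the function name `generalised_log`, the parameter `alpha`, and the Box-Cox form itself are assumptions chosen for illustration and are not taken from the paper, whose exact formulation is not reproduced here.

```python
import math


def generalised_log(x, alpha):
    """Box-Cox style compression of a positive value x.

    Hypothetical helper for illustration; not the paper's definition.
    alpha = 0 gives the natural log; alpha > 0 gives a shifted,
    scaled power transformation.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if alpha == 0.0:
        return math.log(x)
    return (x ** alpha - 1.0) / alpha


# Made-up filterbank energies, used only to show the trend.
energies = [0.5, 1.0, 10.0, 1000.0]

# As alpha shrinks, the power transformation approaches the log.
for alpha in (1.0, 0.1, 0.001, 0.0):
    compressed = [round(generalised_log(e, alpha), 4) for e in energies]
    print(f"alpha={alpha}: {compressed}")
```

In this sketch the exponent `alpha` plays the role of the extra degree of freedom: it sets how strongly large energies are compressed, with the log as the limiting case.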


DOI: 10.21437/Interspeech.2016-1028

Cite as

Loweimi, E., Barker, J., Hain, T. (2016) Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition. Proc. Interspeech 2016, 3798-3802.

BibTeX
@inproceedings{Loweimi+2016,
author={Erfan Loweimi and Jon Barker and Thomas Hain},
title={Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1028},
url={http://dx.doi.org/10.21437/Interspeech.2016-1028},
pages={3798--3802}
}