Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling

Ehsan Variani, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani


State-of-the-art automatic speech recognition (ASR) systems typically rely on pre-processed features. This paper studies the time-frequency duality in ASR feature extraction methods and proposes extending the standard acoustic model with a complex-valued linear projection layer to learn and optimize features that minimize standard cost functions such as cross-entropy. The proposed Complex Linear Projection (CLP) features achieve superior performance compared to pre-processed Log Mel features.
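
As a rough illustration of what a complex-valued linear projection layer could look like, the following NumPy sketch projects the complex spectrum of one audio frame through a complex weight matrix. The abstract states only that such a layer is added to the acoustic model; the magnitude and log compression steps, the function name `clp_features`, the frame length, the feature dimension, and the random weights are all assumptions made for this example. In a real system the weights would be learned jointly with the acoustic model by minimizing a cost such as cross-entropy.

```python
import numpy as np


def clp_features(frame, weights, eps=1e-7):
    """Hypothetical CLP-style feature extractor for a single frame.

    frame:   real-valued waveform samples, shape (n_samples,)
    weights: complex projection matrix, shape (n_features, n_samples // 2 + 1)
    """
    # Complex spectrum of the frame (non-negative frequencies only).
    spectrum = np.fft.rfft(frame)
    # Complex-valued linear projection of the spectrum.
    projected = weights @ spectrum
    # Assumption: take the magnitude and log-compress it to obtain
    # real-valued features, by analogy with log-mel features.
    return np.log(np.abs(projected) + eps)


# Illustrative sizes only; they are not taken from the abstract.
rng = np.random.default_rng(0)
n_samples, n_features = 512, 40
n_bins = n_samples // 2 + 1

frame = rng.standard_normal(n_samples)
# Random complex weights stand in for parameters that would be trained.
weights = (
    rng.standard_normal((n_features, n_bins))
    + 1j * rng.standard_normal((n_features, n_bins))
) / np.sqrt(n_bins)

features = clp_features(frame, weights)
```

Here `features` is a real-valued vector with one entry per row of `weights`, which could then be fed to the remaining layers of an acoustic model in place of pre-computed Log Mel features.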


DOI: 10.21437/Interspeech.2016-1459

Cite as

Variani, E., Sainath, T.N., Shafran, I., Bacchiani, M. (2016) Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling. Proc. Interspeech 2016, 808-812.

BibTeX
@inproceedings{Variani+2016,
author={Ehsan Variani and Tara N. Sainath and Izhak Shafran and Michiel Bacchiani},
title={Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1459},
url={http://dx.doi.org/10.21437/Interspeech.2016-1459},
pages={808--812}
}