MLP-based front-ends have evolved in different ways in recent years beyond the seminal TANDEM-PLP features. This paper aims at providing a fair comparison of these recent advances, including the use of different long/short temporal inputs (PLP, MRASTA, wLP-TRAPS, DCT-TRAPS) and the use of complex architectures (bottleneck, hierarchy, multistream) that go beyond the conventional three-layer MLP. Furthermore, the paper identifies which of these actually provide advantages over the conventional TANDEM-PLP. The investigation is carried out on an LVCSR task for recognition of Mandarin broadcast speech, and results are analyzed in terms of Character Error Rate and phonetic confusions. Results reveal that, as stand-alone features, multistream front-ends can outperform conventional MFCC features by 10%, while TANDEM-PLP improves them by only 1%. On the other hand, when used in concatenation with MFCC features, hierarchical/bottleneck front-ends reduce the character error rate by 18% relative, compared to 14% relative for TANDEM-PLP. The various long-term input representations recently developed provide comparable performance.
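As a rough illustration of the bottleneck idea compared in the paper, the sketch below builds a tiny MLP with a narrow linear middle layer and takes that layer's activations, rather than the output posteriors, as frame-level features to concatenate with an MFCC stream. All dimensions, weights, and inputs here are hypothetical placeholders, not the paper's actual configuration; the classifier layers above the bottleneck and the usual decorrelation step (e.g. PCA/HLDA) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only): 9 stacked frames of
# 39-dim input features feeding a wide hidden layer and a 39-dim
# linear bottleneck.
D_IN, D_HID, D_BN = 351, 500, 39

# Randomly initialized weights stand in for a trained network.
W1 = rng.normal(0.0, 0.1, (D_IN, D_HID))
b1 = np.zeros(D_HID)
W2 = rng.normal(0.0, 0.1, (D_HID, D_BN))
b2 = np.zeros(D_BN)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bottleneck_features(x):
    """Forward pass up to the narrow linear layer.

    In a bottleneck front-end, these low-dimensional activations
    (not the phone posteriors) serve as features for the acoustic
    model, typically after decorrelation.
    """
    h1 = sigmoid(x @ W1 + b1)
    return h1 @ W2 + b2  # linear bottleneck activations

frames = rng.normal(size=(10, D_IN))      # 10 frames of stacked input
bn = bottleneck_features(frames)          # shape (10, 39)
mfcc = rng.normal(size=(10, 39))          # placeholder MFCC stream
augmented = np.concatenate([bn, mfcc], axis=1)  # concatenated front-end
print(augmented.shape)                    # (10, 78)
```

The concatenation in the last line mirrors the "used in concatenation with MFCC features" setting in the abstract, where the combined stream feeds the HMM-based recognizer.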
Bibliographic reference. Valente, Fabio / Magimai-Doss, Mathew / Wang, Wen (2011): "Analysis and comparison of recent MLP features for LVCSR systems", In INTERSPEECH-2011, 1245-1248.