11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Analysis of Gender Normalization Using MLP and VTLN Features

Thomas Schaaf (1), Florian Metze (2)

(1) MultiModal, USA
(2) Carnegie Mellon University, USA

This paper analyzes the capability of multilayer perceptron (MLP) frontends to perform speaker normalization. We find the phonetic context decision tree to be a very useful tool for assessing the speaker normalization power of different frontends. We introduce a gender question into the training of the phonetic context decision tree and, after context clustering, count the gender-specific models. We compare this count for the following frontends: (1) Bottle-Neck (BN) with and without vocal tract length normalization (VTLN), (2) standard MFCC, and (3) stacking of multiple MFCC frames with linear discriminant analysis (LDA). We find the BN frontend to be even more effective than VTLN in reducing the number of gender questions. From this we conclude that a Bottle-Neck frontend is more effective for gender normalization. Combining VTLN and BN features reduces the number of gender-specific models further.
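
As an illustration of the counting step described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a toy binary decision tree in which each internal node asks one question and each leaf is one clustered model; a leaf is counted as gender specific if a gender question lies on its path from the root. The class names, the question name `is_female`, and the example tree are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Leaf:
    """One clustered context-dependent model (hypothetical representation)."""
    model_id: str


@dataclass
class Question:
    """Internal tree node: a yes/no question with two subtrees."""
    name: str
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Question]

# Assumed name of the gender question; not taken from the paper.
GENDER_QUESTION = "is_female"


def count_models(node: Node, under_gender: bool = False) -> Tuple[int, int]:
    """Return (gender_specific_leaves, total_leaves) for the subtree at node."""
    if isinstance(node, Leaf):
        return (1 if under_gender else 0, 1)
    # Everything below a gender question is treated as gender specific.
    below = under_gender or node.name == GENDER_QUESTION
    specific_yes, total_yes = count_models(node.yes, below)
    specific_no, total_no = count_models(node.no, below)
    return specific_yes + specific_no, total_yes + total_no


# Toy tree: one branch splits on gender, the other does not.
tree = Question(
    "left_is_vowel",
    Question(GENDER_QUESTION, Leaf("a1_f"), Leaf("a1_m")),
    Question("right_is_nasal", Leaf("a2"), Leaf("a3")),
)

specific, total = count_models(tree)
print(f"{specific} of {total} models are gender specific")
```

Under this reading, a frontend that normalizes gender well would cause the clustering to select the gender question less often, giving a smaller share of gender-specific leaves.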


Bibliographic reference. Schaaf, Thomas / Metze, Florian (2010): "Analysis of gender normalization using MLP and VTLN features", in INTERSPEECH-2010, 306-309.