We address the problem of speaker age estimation using i-vectors. We first compare different i-vector extraction setups and then focus on shallow artificial neural network (ANN) back-ends, exploring the ANN architecture, the training algorithm, and ANN ensembles. Results on NIST 2008 and 2010 SRE data indicate that, after extensive parameter optimization, an ANN back-end combined with i-vectors reaches mean absolute errors (MAEs) of 5.49 (females) and 6.35 (males), a 4.5% relative improvement over our support vector regression (SVR) baseline. Hence, the choice of back-end did not affect accuracy much; we therefore suggest that future work focus more on front-end processing.
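For readers unfamiliar with the two figures of merit quoted above, the following is a minimal sketch of how an MAE and a relative improvement over a baseline are typically computed. The function names and the toy age values are illustrative assumptions, not the authors' code or data, and the abstract does not state exactly how the 4.5% figure was aggregated across genders.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between true and predicted ages."""
    if not true_ages or len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(t - p) for t, p in zip(true_ages, predicted_ages))
    return total / len(true_ages)


def relative_improvement(baseline_mae, system_mae):
    """Relative MAE reduction of a system over a baseline, in percent."""
    return 100.0 * (baseline_mae - system_mae) / baseline_mae


# Toy values for illustration only (hypothetical, not from the paper).
true_ages = [25, 40, 60, 33]
baseline_predictions = [30, 34, 52, 38]  # stands in for an SVR back-end
ann_predictions = [29, 35, 53, 37]       # stands in for an ANN back-end

baseline_mae = mean_absolute_error(true_ages, baseline_predictions)
ann_mae = mean_absolute_error(true_ages, ann_predictions)
improvement = relative_improvement(baseline_mae, ann_mae)
```

A lower MAE is better, so a positive relative improvement means the second system's average age error is smaller than the baseline's.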
Cite as: Fedorova, A., Glembek, O., Kinnunen, T., Matějka, P. (2015) Exploring ANN back-ends for i-vector based speaker age estimation. Proc. Interspeech 2015, 3036-3040, doi: 10.21437/Interspeech.2015-103
@inproceedings{fedorova15_interspeech,
  author={Anna Fedorova and Ondřej Glembek and Tomi Kinnunen and Pavel Matějka},
  title={{Exploring ANN back-ends for i-vector based speaker age estimation}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={3036--3040},
  doi={10.21437/Interspeech.2015-103}
}