We address the problem of speaker age estimation using i-vectors. We first compare different i-vector extraction setups and then focus on (shallow) artificial neural net (ANN) back-ends. We explore ANN architecture, training algorithm and ANN ensembles. The results on NIST 2008 and 2010 SRE data indicate that, after extensive parameter optimization, ANN back-end in combination with i-vectors reaches mean absolute errors (MAEs) of 5.49 (females) and 6.35 (males), which are 4.5% relative improvement in comparison to our support-vector regression (SVR) baseline. Hence, the choice of back-end did not affect the accuracy much; a suggested future direction is therefore focusing more on front-end processing.
Bibliographic reference. Fedorova, Anna / Glembek, Ondřej / Kinnunen, Tomi / Matějka, Pavel (2015): "Exploring ANN back-ends for i-vector based speaker age estimation", In INTERSPEECH-2015, 3036-3040.