This paper presents a new approach to feature-level phone normalisation which aims to improve speaker modelling in the case of short-duration training data. The new approach is referred to as phone adaptive training (PAT). Based on constrained maximum likelihood linear regression (cMLLR) and previous work in speaker adaptive training (SAT), PAT learns a set of transforms which project features into a new phone-normalised but speaker-discriminative space. PAT was originally investigated in the context of speaker diarization; this paper presents new work to assess and optimise PAT at the level of speaker modelling and in the context of automatic speaker verification (ASV). Experiments show that PAT improves the performance of a state-of-the-art iVector ASV system by 50% relative to the baseline.
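As a rough illustration of what applying cMLLR-style transforms to features involves, the sketch below applies an affine map x' = Ax + b to each feature frame, selecting the map by the frame's phone label. It covers only the application step, not how PAT estimates the transforms. The assumption of one transform per phone class, the function names, the phone labels and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def apply_affine(A: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Return A x + b for a single feature vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(A, b)]


def normalise_features(
    frames: Sequence[Sequence[float]],
    phone_labels: Sequence[str],
    transforms: Dict[str, Tuple[Matrix, Vector]],
) -> List[Vector]:
    """Apply to each frame the affine transform of its phone class.

    Frames whose label has no transform are passed through unchanged
    (an assumption made for this sketch).
    """
    out = []
    for x, phone in zip(frames, phone_labels):
        if phone in transforms:
            A, b = transforms[phone]
            out.append(apply_affine(A, b, x))
        else:
            out.append(list(x))
    return out


# Hypothetical 2-dimensional transforms for two phone classes (toy values).
toy_transforms = {
    "aa": ([[1.0, 0.0], [0.0, 1.0]], [-1.0, 0.5]),
    "s": ([[0.5, 0.0], [0.0, 2.0]], [0.0, 0.0]),
}
toy_frames = [[2.0, 1.0], [4.0, 3.0], [1.0, 1.0]]
toy_labels = ["aa", "s", "sil"]
normalised = normalise_features(toy_frames, toy_labels, toy_transforms)
```

In a real system the feature vectors would be of much higher dimension and the transforms would be estimated from data under a maximum likelihood criterion; the toy values here only make the arithmetic easy to follow.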
Cite as: Soldi, G., Bozonnet, S., Alegre, F., Beaugeant, C., Evans, N. (2014) Short-Duration Speaker Modelling with Phone Adaptive Training. Proc. The Speaker and Language Recognition Workshop (Odyssey 2014), 208-215, doi: 10.21437/Odyssey.2014-32
@inproceedings{soldi14_odyssey,
  author={Giovanni Soldi and Simon Bozonnet and Federico Alegre and Christophe Beaugeant and Nicholas Evans},
  title={{Short-Duration Speaker Modelling with Phone Adaptive Training}},
  year=2014,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2014)},
  pages={208--215},
  doi={10.21437/Odyssey.2014-32}
}