ISCA Archive Interspeech 2015

Denoising autoencoder-based speaker feature restoration for utterances of short duration

Hitoshi Yamamoto, Takafumi Koshinaka

This paper describes a speaker feature restoration method for improving text-independent speaker recognition with short utterances. The method employs a denoising autoencoder (DAE) to compensate the speaker features of a short utterance, which contains limited phonetic information. It first estimates the phonetic distribution in the utterance as posteriors based on speech models, and then transforms the i-vector of the utterance using the DAE along with the phonetic posteriors. The DAE-based transformation is able to produce a reliable speaker feature with the help of supervised training on pairs of long and short speech segments. Speaker recognition experiments on a NIST SRE task demonstrate a 37.9% error reduction.
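
The transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture, dimensions, or training objective, so the single tanh hidden layer, the mean squared error loss, the layer sizes, and the synthetic data below are all assumptions, and the function names (`init_params`, `restore`, `train_step`, `run_demo`) are hypothetical. The sketch only shows the general idea of feeding a short-utterance i-vector together with phonetic posteriors into a network trained to output the i-vector of the matching long segment.

```python
import numpy as np

def init_params(in_dim, hidden_dim, out_dim, rng):
    """Small random weights for a one-hidden-layer network (sizes are assumptions)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def restore(params, short_ivec, posteriors):
    """Map [short i-vector ; phonetic posteriors] to a restored i-vector."""
    x = np.concatenate([short_ivec, posteriors], axis=-1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def train_step(params, short_ivec, posteriors, long_ivec, lr=0.05):
    """One gradient-descent step on the squared error to the long-segment i-vector."""
    x = np.concatenate([short_ivec, posteriors], axis=-1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    y = h @ params["W2"] + params["b2"]
    err = y - long_ivec
    n = x.shape[0]
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    # Backpropagation through the linear output layer and the tanh hidden layer.
    g_y = 2.0 * err / n
    g_W2 = h.T @ g_y
    g_b2 = g_y.sum(axis=0)
    g_pre = (g_y @ params["W2"].T) * (1.0 - h ** 2)
    g_W1 = x.T @ g_pre
    g_b1 = g_pre.sum(axis=0)
    params["W1"] -= lr * g_W1
    params["b1"] -= lr * g_b1
    params["W2"] -= lr * g_W2
    params["b2"] -= lr * g_b2
    return loss

def run_demo(n=256, ivec_dim=8, n_phones=4, steps=200, seed=0):
    """Train on synthetic (short, long) pairs; real i-vectors are far larger."""
    rng = np.random.default_rng(seed)
    # Assumption: the "long" i-vector is the clean target, and the "short" one
    # is a noisy version of it. Real pairs would come from long segments and
    # short excerpts of the same recordings.
    long_ivec = rng.normal(size=(n, ivec_dim))
    short_ivec = long_ivec + 0.5 * rng.normal(size=(n, ivec_dim))
    # Assumption: random softmax outputs stand in for phonetic posteriors.
    logits = rng.normal(size=(n, n_phones))
    posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    params = init_params(ivec_dim + n_phones, 16, ivec_dim, rng)
    losses = [train_step(params, short_ivec, posteriors, long_ivec)
              for _ in range(steps)]
    restored = restore(params, short_ivec, posteriors)
    return params, losses, restored

if __name__ == "__main__":
    _, losses, restored = run_demo()
    print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
    print("restored shape:", restored.shape)
```

In this toy setup the posteriors carry no information about the noise, so they only demonstrate where the phonetic input would enter the network. In the setting the abstract describes, they indicate which phonetic content a short utterance actually covers.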


doi: 10.21437/Interspeech.2015-283

Cite as: Yamamoto, H., Koshinaka, T. (2015) Denoising autoencoder-based speaker feature restoration for utterances of short duration. Proc. Interspeech 2015, 1052-1056, doi: 10.21437/Interspeech.2015-283

@inproceedings{yamamoto15_interspeech,
  author={Hitoshi Yamamoto and Takafumi Koshinaka},
  title={{Denoising autoencoder-based speaker feature restoration for utterances of short duration}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={1052--1056},
  doi={10.21437/Interspeech.2015-283}
}