16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Denoising Autoencoder-Based Speaker Feature Restoration for Utterances of Short Duration

Hitoshi Yamamoto, Takafumi Koshinaka

NEC Corporation, Japan

This paper describes a speaker feature restoration method for improving text-independent speaker recognition with short utterances. The method employs a denoising autoencoder (DAE) to compensate the speaker features of a short utterance, which contains limited phonetic information. It first estimates the phonetic distribution of the utterance as posteriors based on speech models, and then transforms the i-vector of the utterance using the DAE together with the phonetic posteriors. With supervised training on pairs of long and short speech segments, the DAE-based transformation is able to produce a reliable speaker feature. Speaker recognition experiments on a NIST SRE task demonstrate a 37.9% error reduction.
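
The transformation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the network, so the single tanh hidden layer, the mean-squared-error loss, the dimensions, and the helper names (`init_dae`, `restore`, `train_step`) are all assumptions made for illustration. The data are synthetic: random vectors stand in for long-segment i-vectors, noisy copies stand in for short-segment i-vectors, and Dirichlet draws stand in for the phonetic posteriors that the paper obtains from speech models.

```python
import numpy as np

def init_dae(in_dim, hidden_dim, out_dim, rng):
    """Small random weights for a one-hidden-layer network (assumed architecture)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def restore(params, short_ivec, posteriors):
    """Map (short-segment i-vector, phonetic posteriors) to a restored i-vector."""
    x = np.concatenate([short_ivec, posteriors], axis=-1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def train_step(params, short_ivec, posteriors, long_ivec, lr=0.02):
    """One full-batch gradient step on MSE; returns the loss before the update."""
    x = np.concatenate([short_ivec, posteriors], axis=-1)
    h = np.tanh(x @ params["W1"] + params["b1"])
    y = h @ params["W2"] + params["b2"]
    err = y - long_ivec                       # target is the long-segment i-vector
    n = x.shape[0]
    loss = float(np.mean(np.sum(err ** 2, axis=1)))
    g_y = 2.0 * err / n                       # dLoss/dy
    g_pre = (g_y @ params["W2"].T) * (1.0 - h ** 2)  # backprop through tanh
    grads = {
        "W2": h.T @ g_y, "b2": g_y.sum(axis=0),
        "W1": x.T @ g_pre, "b1": g_pre.sum(axis=0),
    }
    for name, grad in grads.items():
        params[name] -= lr * grad
    return loss

# Synthetic stand-ins (assumptions, not real speech data).
rng = np.random.default_rng(0)
n_pairs, ivec_dim, n_phone_classes = 256, 8, 4
long_iv = rng.normal(size=(n_pairs, ivec_dim))                   # "reliable" features
short_iv = long_iv + 0.5 * rng.normal(size=(n_pairs, ivec_dim))  # degraded features
post = rng.dirichlet(np.ones(n_phone_classes), size=n_pairs)     # rows sum to 1

params = init_dae(ivec_dim + n_phone_classes, 16, ivec_dim, rng)
losses = [train_step(params, short_iv, post, long_iv) for _ in range(500)]

restored = restore(params, short_iv[:1], post[:1])
print("first loss:", losses[0], "last loss:", losses[-1])
print("restored i-vector shape:", restored.shape)
```

The design point the sketch tries to capture is that the network input concatenates the short-segment i-vector with the phonetic posteriors, while the training target is the i-vector of the paired long segment, which is what makes the training supervised.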


Bibliographic reference. Yamamoto, Hitoshi / Koshinaka, Takafumi (2015): "Denoising autoencoder-based speaker feature restoration for utterances of short duration", in INTERSPEECH-2015, 1052-1056.