This paper describes a speaker-feature restoration method for improving text-independent speaker recognition on short utterances. The method employs a denoising autoencoder (DAE) to compensate the speaker features of a short utterance, which contains only limited phonetic information. It first estimates the phonetic distribution in the utterance as posteriors based on speech models, and then transforms the utterance's i-vector with the DAE, conditioned on those phonetic posteriors. The DAE-based transformation is able to produce a reliable speaker feature with the help of supervised training on pairs of long and short speech segments. Speaker recognition experiments on an NIST SRE task demonstrate a 37.9% error reduction.
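The pipeline the abstract describes can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the dimensions, network size, tanh hidden layer, learning rate, and the synthetic "short = long + noise" pairing are all assumptions made for the sake of a self-contained example. The key idea shown is the one in the abstract: a DAE that takes a short-utterance i-vector concatenated with phonetic posteriors and is trained to output the corresponding long-utterance i-vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (real i-vectors are typically 400-600 dimensional).
IVEC_DIM, POST_DIM, HID_DIM = 20, 10, 16

def init_dae():
    # One-hidden-layer DAE: input = [i-vector ; phonetic posteriors].
    w1 = rng.normal(0, 0.1, (IVEC_DIM + POST_DIM, HID_DIM))
    b1 = np.zeros(HID_DIM)
    w2 = rng.normal(0, 0.1, (HID_DIM, IVEC_DIM))
    b2 = np.zeros(IVEC_DIM)
    return [w1, b1, w2, b2]

def forward(params, x):
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)          # hidden representation
    return h @ w2 + b2, h             # restored i-vector (linear output)

def train_step(params, x, target, lr=0.05):
    # One SGD step on MSE between restored and long-utterance i-vector.
    w1, b1, w2, b2 = params
    y, h = forward(params, x)
    err = y - target
    n = x.shape[0]
    gw2 = h.T @ err / n
    gb2 = err.mean(axis=0)
    dh = (err @ w2.T) * (1 - h ** 2)  # backprop through tanh
    gw1 = x.T @ dh / n
    gb1 = dh.mean(axis=0)
    w1 -= lr * gw1; b1 -= lr * gb1
    w2 -= lr * gw2; b2 -= lr * gb2
    return float(np.mean(err ** 2))

# Synthetic training pairs: the short-utterance i-vector is modeled here as
# the long-utterance i-vector plus noise (a stand-in for duration mismatch).
N = 256
long_iv = rng.normal(size=(N, IVEC_DIM))
short_iv = long_iv + rng.normal(0, 0.5, (N, IVEC_DIM))
posteriors = rng.dirichlet(np.ones(POST_DIM), size=N)  # per-utterance posteriors

x = np.concatenate([short_iv, posteriors], axis=1)
params = init_dae()
first_loss = train_step(params, x, long_iv)
for _ in range(500):
    last_loss = train_step(params, x, long_iv)
print(last_loss < first_loss)  # supervised training reduces restoration MSE
```

At test time, the trained DAE maps a short utterance's i-vector (plus its estimated phonetic posteriors) to a restored speaker feature that is then scored by the usual back-end.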
Bibliographic reference. Yamamoto, Hitoshi / Koshinaka, Takafumi (2015): "Denoising autoencoder-based speaker feature restoration for utterances of short duration", In INTERSPEECH-2015, 1052-1056.