SAPA-SCALE Conference 2012
Portland, OR, USA
In this paper, we investigate cochlear implant-like processing of speech signals for speaker verification. The processing was applied to each speech utterance, in the temporal domain, to reduce the spectral information in the original speech signal and to synthesize a new signal, called cochlear implant-like spectrally reduced speech (SRS), solely from low-bandwidth subband temporal envelopes of the original speech. Spectral analyses performed on voiced speech frames showed that, despite the spectral and perceptual reduction induced by the cochlear implant-like signal processing, the global shape of the short-term spectral envelopes of the SRS signal is rather similar to that of the original speech signal.
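The SRS synthesis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the number of bands, the band edges, the envelope cutoff, the brick-wall FFT filters, and the sinusoidal carriers at band centre frequencies are all assumptions chosen for illustration, and the function names (`fft_filter`, `synthesize_srs`) are hypothetical.

```python
import numpy as np

def fft_filter(x, fs, f_min, f_max):
    """Ideal (brick-wall) filter: zero every FFT bin outside [f_min, f_max]."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < f_min) | (freqs > f_max)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def synthesize_srs(x, fs, n_bands=8, env_cutoff=50.0, f_lo=100.0, f_hi=3800.0):
    """Sketch of spectrally reduced speech built from subband envelopes.

    All default parameter values are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    # Assumption: logarithmically spaced analysis bands between f_lo and f_hi.
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    t = np.arange(len(x)) / fs
    srs = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_filter(x, fs, lo, hi)
        # Low-bandwidth temporal envelope: full-wave rectification,
        # then low-pass filtering at env_cutoff Hz.
        envelope = fft_filter(np.abs(band), fs, 0.0, env_cutoff)
        envelope = np.maximum(envelope, 0.0)  # clip small negative ripple
        # Assumption: the envelope modulates a sinusoid at the band's
        # geometric centre frequency.
        centre = np.sqrt(lo * hi)
        srs += envelope * np.sin(2.0 * np.pi * centre * t)
    return srs
```

Only the slowly varying envelope of each band survives in the output, so the fine spectral structure inside each band is discarded while the coarse band-by-band energy distribution is kept.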
Although the SRS is synthesized only from low-bandwidth subband temporal envelopes of the original speech signal, using it in a baseline GMM-UBM speaker verification system, on cellular telephone conversational speech from the Switchboard corpus (used in NIST SRE 2002), did not substantially alter the minimal detection cost function (DCF) of the system. Furthermore, using appropriate SRS signals reduced the minimal DCF of the system (5.7% relative reduction). An equal-weight linear combination, at the score level, of the baseline and SRS-based systems could also help reduce the minimal DCF.
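The evaluation metric and the score-level combination mentioned above can be sketched as follows. This is a hedged illustration, not the paper's scoring code: the cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) are assumed to be the conventional NIST SRE values of that period, the function names are hypothetical, and the fusion assumes both systems produce scores on comparable scales.

```python
import numpy as np

def min_dcf(target_scores, impostor_scores,
            c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimal detection cost over all decision thresholds.

    A trial is accepted when its score is >= the threshold.
    Cost parameters are assumed defaults, not taken from the paper.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    # Candidate thresholds: every observed score, plus +inf (reject all).
    thresholds = np.concatenate([target_scores, impostor_scores, [np.inf]])
    best = np.inf
    for th in thresholds:
        p_miss = np.mean(target_scores < th)    # targets wrongly rejected
        p_fa = np.mean(impostor_scores >= th)   # impostors wrongly accepted
        dcf = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, dcf)
    return best

def fuse_equal_weights(scores_a, scores_b):
    """Equal-weight linear combination of two systems' trial scores."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return 0.5 * (a + b)
```

In use, the same trial list would be scored by the baseline and the SRS-based system, the two score vectors fused with `fuse_equal_weights`, and `min_dcf` computed on the fused target and impostor scores. A relative reduction is then (DCF_baseline - DCF_new) / DCF_baseline.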
Bibliographic reference. Do, Cong-Thanh / Barras, Claude (2012): "Cochlear implant-like processing of speech signal for speaker verification", In SAPA-SCALE-2012, 17-21.