ISCA Archive ASRIV 1994

Robust speech modeling for speaker identification in forensic acoustics

Javier Ortega-Garcia, Joaquin Gonzalez-Rodriguez

Speaker identification is one of the most important topics in the field of Forensic Acoustics. Nevertheless, classical forensic applications of speaker identification methods have suffered from numerous shortcomings and associated problems. The use of non-automatic or semiautomatic evaluation processes, usually carried out by trained staff and therefore from a subjective aural-perceptual identification perspective, has pushed forensic acoustics research towards new, fully automatic evaluation techniques.

Together with this main task, several factors add to speaker recognition for forensic acoustics purposes a "noisy" component that differentiates it from classical or laboratory applications: the poor quality of many of the available recordings (including effects like Lombard speech, cocktail party noise or reverberant speech); the non-cooperative nature of talkers (knowing that "anything they may say can be used against them"), which produces disguised speech, monosyllabic responses, etc.; and the uncontrolled noisy situations in which recordings have to be made. All these "noisy" components are added to the clean speech, resulting in degraded or noisy speech.

In section 2, in order to mitigate the problem that we have called noisy speech, a speech enhancement front-end including both single-channel and multi-channel approaches is described. The single-channel approach is intended for cases where no reference of the noise source is available; in these cases, speech enhancement is accomplished with techniques such as spectral subtraction or classical filtering. The multi-channel approach needs at least one correlated reference of the noise source; in this case, techniques such as adaptive filtering can be used.
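
The two enhancement families named above can be illustrated with a minimal sketch. It is not the front-end described in the paper: the function names (spectral_subtraction, lms_noise_canceller), the frame length, the spectral floor, the number of taps and the step size are all assumptions chosen for illustration. The first function performs basic magnitude spectral subtraction using a noise estimate taken from a speech-free segment; the second is a plain LMS adaptive noise canceller that uses a correlated noise reference.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, frame_len=256, floor=0.01):
    """Single-channel sketch: subtract an average noise magnitude spectrum.

    noise_only is a speech-free segment used to estimate the noise (assumption).
    """
    hop = frame_len // 2
    window = np.hanning(frame_len)
    # Average noise magnitude spectrum over the speech-free frames
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_only[i:i + frame_len] * window))
         for i in range(0, len(noise_only) - frame_len + 1, hop)],
        axis=0,
    )
    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame_len + 1, hop):
        spec = np.fft.rfft(noisy[i:i + frame_len] * window)
        mag = np.abs(spec)
        # Subtract the noise estimate, keeping a small spectral floor
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        # Reuse the noisy phase and overlap-add the resynthesized frame
        frame = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), n=frame_len)
        out[i:i + frame_len] += frame
    return out

def lms_noise_canceller(primary, reference, n_taps=16, mu=0.005):
    """Multi-channel sketch: LMS filter driven by a correlated noise reference."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # most recent sample first
        e = primary[n] - w @ x   # error signal = enhanced speech estimate
        w += 2 * mu * e * x      # LMS weight update
        out[n] = e
    return out
```

In this sketch the step size mu has to stay small relative to the reference power times the number of taps, otherwise the LMS update diverges; a normalized variant would remove that dependence.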

On the other hand, analysis tools and evaluation techniques (whether in the time, spectral or cepstral domain) can be effective only if a model of the speech production mechanism can be built [3]. Continuous Density Hidden Markov Models (CDHMM) have proved to be a powerful way of modeling speech utterances, so that intraspeaker idiosyncratic factors can be modeled and compared through objective methods, without the subjectivity of human perception. In section 3, together with this, some speaker identification experiments in diverse modeling contexts are also proposed.
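
As a rough illustration of how CDHMM-based identification can work, the sketch below scores a sequence of feature vectors against one HMM per speaker with the forward algorithm and picks the speaker whose model gives the highest log-likelihood. It is a simplification under stated assumptions: each state emits from a single diagonal-covariance Gaussian (a full CDHMM would typically use Gaussian mixtures), model training is omitted, and the names log_gauss_diag, hmm_log_likelihood and identify are hypothetical, not taken from the paper.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of every frame under every state's diagonal Gaussian.

    x: (T, D) features; means, variances: (S, D). Returns an array of shape (T, S).
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def hmm_log_likelihood(x, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain: returns log p(x | model)."""
    log_b = log_gauss_diag(x, means, variances)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # Sum over previous states (log-sum-exp), then apply the emission term
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def identify(x, speaker_models):
    """Return the best-scoring speaker and all scores.

    speaker_models maps a name to (log_pi, log_A, means, variances).
    """
    scores = {name: hmm_log_likelihood(x, *model)
              for name, model in speaker_models.items()}
    return max(scores, key=scores.get), scores
```

Working in the log domain avoids the numerical underflow that the plain forward recursion suffers on long utterances.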

In section 4, results of several identification experiments under different S/N conditions are presented, together with results of a further set of experiments that include the speech enhancement techniques proposed in section 2.
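
One common way to set up experiments under different S/N conditions is to scale a noise recording so that the mixture reaches a chosen signal-to-noise ratio. The sketch below shows that arithmetic; it is an assumption about how such conditions could be produced, not a description of the authors' procedure, and the names add_noise_at_snr and measure_snr_db are hypothetical.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Scale noise so that clean + noise has the requested S/N in dB."""
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that makes p_clean / (gain**2 * p_noise) equal to 10**(snr_db / 10)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

def measure_snr_db(clean, degraded):
    """S/N in dB of a degraded signal relative to its clean version."""
    err = degraded - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(err ** 2))
```

Both functions use the average power over the whole signal, so silent stretches in the clean speech lower the measured S/N relative to a speech-active measure.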

Cite as: Ortega-Garcia, J., Gonzalez-Rodriguez, J. (1994) Robust speech modeling for speaker identification in forensic acoustics. Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification, 217-220

  author={Javier Ortega-Garcia and Joaquin Gonzalez-Rodriguez},
  title={{Robust speech modeling for speaker identification in forensic acoustics}},
  booktitle={Proc. ESCA Workshop on Automatic Speaker Recognition, Identification and Verification},