ODYSSEY 2004 - The Speaker and Language Recognition Workshop

May 31 - June 3, 2004
Toledo, Spain

Pitch Synchronized Speech Processing (PSSP) for Speaker Recognition

Hirotaka Nakasone (1), Maria Mimikopoulos (1), Steven D. Beck (2), Somit Mathur (2)

(1) Federal Bureau of Investigation, Forensic Audio/Video and Image Analysis Unit, Engineering Research Facility, Quantico, VA, USA
(2) BAE Systems Integrated Defense Solutions, Austin, TX, USA

A method for speech signal enhancement is developed for automatic speaker recognition in cases where the signals have different channel conditions. The basis of this technique is a robust pitch detection algorithm that accurately estimates the instantaneous pitch rate and extracts single-pitch-period speech segments. This technique, pitch synchronized speech processing (PSSP), provides the highest time-frequency resolution for short-time Fourier analysis of speech signals. It also effectively eliminates all non-voiced signal regions and minimizes the spectral harmonics caused by multiple pitch periods in the analysis window. One significant benefit of PSSP is that feature warping can be applied to the pitch-synchronized spectra of two cross-channel signals. Feature warping in the spectral domain provides linear channel normalization and enhancement for spectrographic analysis. A cross-channel transfer function can then be derived from the feature warping process and applied to audio channel normalization and enhancement. Applying the PSSP feature warping transfer function improved speaker recognition performance on cross-channel speech signals from the CAVIS voice corpus [1]. However, PSSP alone did not improve recognition performance compared to Mel filterbank cepstral coefficients.
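
The idea of analyzing one pitch period at a time can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a constant pitch over the whole signal (the paper tracks instantaneous pitch), uses a plain autocorrelation peak as a stand-in pitch detector, and replaces feature warping with a simple ratio of average magnitude spectra as a crude cross-channel transfer estimate. All function names and parameters (`estimate_pitch_period`, `single_period_spectra`, `channel_transfer_estimate`, `fmin`, `fmax`, `n_fft`) are hypothetical.

```python
import numpy as np

def estimate_pitch_period(signal, fs, fmin=60.0, fmax=400.0):
    """Pitch period in samples from the autocorrelation peak.

    Assumption: a simple stand-in for the paper's robust pitch detector.
    """
    x = signal - np.mean(signal)
    # Keep only non-negative lags of the autocorrelation.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(fs / fmax)                       # shortest allowed period
    hi = min(int(fs / fmin), len(ac) - 1)     # longest allowed period
    return lo + int(np.argmax(ac[lo:hi + 1]))

def single_period_spectra(signal, fs, n_fft=256):
    """Cut the signal into single-period segments and take their spectra.

    Assumption: the pitch is constant, so every segment has the same length.
    Each segment is zero-padded to n_fft samples before the FFT.
    """
    period = estimate_pitch_period(signal, fs)
    n_segments = len(signal) // period
    segments = signal[:n_segments * period].reshape(n_segments, period)
    return period, np.abs(np.fft.rfft(segments, n=n_fft, axis=1))

def channel_transfer_estimate(spectra_a, spectra_b, eps=1e-12):
    """Ratio of average magnitude spectra (channel B relative to channel A).

    Assumption: a crude linear channel estimate, not feature warping.
    """
    return (spectra_b.mean(axis=0) + eps) / (spectra_a.mean(axis=0) + eps)

# Synthetic voiced-like signal: 100 Hz fundamental plus two harmonics at 8 kHz.
fs = 8000
n = np.arange(1600)
channel_a = sum(a * np.sin(2 * np.pi * k * 100 * n / fs)
                for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
channel_b = 0.5 * channel_a   # second "channel": a flat gain of 0.5

period, spectra_a = single_period_spectra(channel_a, fs)
_, spectra_b = single_period_spectra(channel_b, fs)
transfer = channel_transfer_estimate(spectra_a, spectra_b)
```

In this toy case the second channel is only a flat gain, so the estimated ratio at the strongest spectral bin recovers that gain. Bins where both spectra are near zero are dominated by `eps` and carry no channel information, which is one reason a real system would need something more careful than a raw ratio.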


Bibliographic reference.  Nakasone, Hirotaka / Mimikopoulos, Maria / Beck, Steven D. / Mathur, Somit (2004): "Pitch synchronized speech processing (PSSP) for speaker recognition", In ODYS-2004, 251-256.