A method for speech signal enhancement is developed for automatic speaker recognition in which the signals are recorded under different channel conditions. The technique is based on a robust pitch detection algorithm that accurately estimates the instantaneous pitch rate and extracts single-pitch-period speech segments. This technique, pitch synchronized speech processing (PSSP), provides the highest time-frequency resolution for short-time Fourier analysis of speech signals. It also effectively eliminates all non-voiced signal regions and minimizes the spectral harmonics caused by multiple pitch periods in the analysis window. One significant benefit of PSSP is that feature warping can be applied to the pitch-synchronized spectra of two cross-channel signals. Feature warping in the spectral domain provides linear channel normalization and enhancement for spectrographic analysis. A cross-channel transfer function can then be derived from the feature warping process and applied to audio channel normalization and enhancement. Applying the PSSP feature warping transfer function to cross-channel speech signals from the CAVIS voice corpus [1] improved speaker recognition performance. However, PSSP alone did not improve recognition performance compared to Mel filterbank cepstral coefficients.
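To make the idea of a cross-channel transfer function concrete, the following is a minimal sketch and not the authors' implementation. It assumes the pitch-period boundaries (`marks`) are already available from some pitch detector, and it substitutes a simple ratio of averaged log-magnitude spectra for the paper's feature warping step, whose details the abstract does not give. The function names `pitch_sync_spectra` and `cross_channel_gain`, and the parameters `n_fft` and `eps`, are hypothetical.

```python
import numpy as np


def pitch_sync_spectra(signal, marks, n_fft):
    """Magnitude spectra of single-pitch-period segments.

    `marks` holds increasing sample indices of pitch-period boundaries.
    They are assumed to come from a separate pitch detector, which this
    sketch does not implement.
    """
    spectra = []
    for start, stop in zip(marks[:-1], marks[1:]):
        segment = signal[start:stop]  # exactly one pitch period
        # rfft zero-pads (or truncates) the segment to n_fft samples
        spectra.append(np.abs(np.fft.rfft(segment, n=n_fft)))
    return np.array(spectra)


def cross_channel_gain(spectra_src, spectra_dst, eps=1e-12):
    """Per-bin magnitude gain from the source to the destination channel.

    Assumption: the channel difference is modelled as a fixed linear
    filter, so it appears as a constant offset between the two channels'
    average log-magnitude spectra. This stands in for feature warping.
    """
    mean_log_src = np.log(spectra_src + eps).mean(axis=0)
    mean_log_dst = np.log(spectra_dst + eps).mean(axis=0)
    return np.exp(mean_log_dst - mean_log_src)
```

Multiplying the source channel's magnitude spectra by the returned gain moves them toward the destination channel's average spectral shape. Averaging in the log domain is a design choice that makes a fixed linear channel show up as an additive term, which is the same reasoning behind cepstral mean normalization.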
Cite as: Nakasone, H., Mimikopoulos, M., Beck, S.D., Mathur, S. (2004) Pitch synchronized speech processing (PSSP) for speaker recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2004), 251-256
@inproceedings{nakasone04_odyssey,
  author={Hirotaka Nakasone and Maria Mimikopoulos and Steven D. Beck and Somit Mathur},
  title={{Pitch synchronized speech processing (PSSP) for speaker recognition}},
  year=2004,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2004)},
  pages={251--256}
}