Recently, speaker verification systems using different kinds of prosodic features have been proposed. Although it has been shown that most of these speaker verification systems can improve system performance using score-level fusion with state-of-the-art cepstral-based systems, a systematic comparison of the prosodic modelling algorithms used in these prosodic systems has not yet been performed. This motivated us to review the proposed prosodic modelling algorithms and compare them using a common experimental condition.
These experiments explored different approaches in the sampling/ segmentation of prosodic contours and the selection of prosodic features. They show that simple prosodic systems with features extracted from fixed-size contour segments, without knowledge of phone/pseudo-syllable level information, still provide significant performance improvement when fused with a state-of-the-art cepstral-based system. Moreover, some prosodic systems are shown to be complementary to each other. Fusion of these systems with the cepstral-based system can provide further performance improvement on the speaker verification task.
Bibliographic reference. Leung, Cheung-Chi / Ferras, Marc / Barras, Claude / Gauvain, Jean-Luc (2008): "Comparing prosodic models for speaker recognition", In INTERSPEECH-2008, 1945-1948.