8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Articulatory Feature-Based Conditional Pronunciation Modeling for Speaker Verification

Ka-Yee Leung (1), Man-Wai Mak (1), Sun-Yuan Kung (2)

(1) The Hong Kong Polytechnic University, Hong Kong
(2) Princeton University, USA

Because of differences in educational background, accent, and other factors, each individual has a distinctive way of pronouncing words. This paper exploits the pronunciation characteristics of speakers and proposes a new conditional pronunciation modeling (CPM) technique for speaker verification. The proposed technique aims to establish a link between articulatory properties (such as manner and place of articulation) and the phoneme sequences produced by a speaker. This is achieved by aligning two articulatory feature (AF) streams with a phoneme sequence determined by a phoneme recognizer, and formulating the probabilities of articulatory classes conditioned on the phonemes as speaker-dependent probabilistic models. The scores obtained from the AF-based pronunciation models are then fused with those obtained from a spectral-based speaker verification system, with the frame-by-frame fused scores weighted by the confidence of the pronunciation models. Evaluations based on the SPIDRE corpus demonstrate that AF-based CPM systems can recognize speakers even from short utterances and can be readily combined with spectral-based systems to further enhance the reliability of speaker verification.
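
As a rough illustration of the idea described in the abstract, the following Python sketch estimates the probabilities of (manner, place) classes conditioned on phonemes from frames that have already been aligned, and scores a test utterance against a speaker model. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`train_cpm`, `llr_score`, `fuse`), the add-alpha smoothing, the background model, the log-likelihood-ratio scoring, and the linear fusion with a single weight `w` are all hypothetical choices made for the example and are not taken from the paper.

```python
from collections import Counter, defaultdict
import math


def train_cpm(frames, af_classes, alpha=1.0):
    """Estimate P(manner, place | phoneme) from aligned frames.

    frames: iterable of (phoneme, manner, place) tuples, one per frame.
    af_classes: list of all (manner, place) pairs considered.
    alpha: add-alpha smoothing constant (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for phoneme, manner, place in frames:
        counts[phoneme][(manner, place)] += 1
    model = {}
    for phoneme, c in counts.items():
        total = sum(c.values()) + alpha * len(af_classes)
        model[phoneme] = {af: (c[af] + alpha) / total for af in af_classes}
    return model


def llr_score(frames, speaker_model, background_model):
    """Average per-frame log-likelihood ratio, speaker vs. background.

    Frames whose phoneme or AF pair is missing from either model are
    skipped (a simplification assumed here).
    """
    total, n = 0.0, 0
    for phoneme, manner, place in frames:
        p_spk = speaker_model.get(phoneme, {}).get((manner, place))
        p_bkg = background_model.get(phoneme, {}).get((manner, place))
        if p_spk is None or p_bkg is None:
            continue
        total += math.log(p_spk / p_bkg)
        n += 1
    return total / n if n else 0.0


def fuse(cpm_score, spectral_score, w):
    """Linear fusion; w in [0, 1] stands in for the pronunciation
    model's confidence (hypothetical weighting scheme)."""
    return w * cpm_score + (1.0 - w) * spectral_score


# Toy data: three AF classes and a single phoneme "t".
af = [("stop", "labial"), ("stop", "alveolar"), ("fricative", "alveolar")]
bkg_frames = [("t", "stop", "alveolar"), ("t", "stop", "labial"),
              ("t", "fricative", "alveolar")]
spk_frames = [("t", "stop", "alveolar")] * 5

bkg = train_cpm(bkg_frames, af)
spk = train_cpm(spk_frames, af)
```

In this toy setup, frames that match the speaker's habitual articulation of "t" receive a positive score and frames that deviate from it receive a negative one; a real system would obtain the phoneme labels and AF streams from trained recognizers and would estimate the fusion weight from data.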

Bibliographic reference.  Leung, Ka-Yee / Mak, Man-Wai / Kung, Sun-Yuan (2004): "Articulatory feature-based conditional pronunciation modeling for speaker verification", In INTERSPEECH-2004, 2597-2600.