9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

A Reliable Technique for Detecting the Second Subglottal Resonance and Its Use in Cross-Language Speaker Adaptation

Shizhen Wang (1), Steven M. Lulich (2), Abeer Alwan (1)

(1) University of California at Los Angeles, USA; (2) MIT, USA

In previous work [1], we proposed a speaker adaptation technique based on the second subglottal resonance (Sg2), which showed good performance relative to vocal tract length normalization (VTLN). In this paper, we propose a more reliable algorithm for automatically estimating Sg2 from speech signals. The algorithm is calibrated on children's speech data collected simultaneously with accelerometer recordings, from which Sg2 frequencies can be measured directly. To investigate whether Sg2 frequencies are independent of speech content and language, we perform a cross-language study with bilingual Spanish-English children. The study verifies that Sg2 is approximately constant for a given speaker and is therefore a good candidate for limited-data speaker normalization and cross-language adaptation. We then present a cross-language speaker normalization method based on Sg2, which is computationally more efficient and more robust than maximum-likelihood-based VTLN.
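
As a rough illustration of why an Sg2-based method can be cheaper than maximum-likelihood VTLN, the sketch below computes a warping factor in a single step from one Sg2 estimate, rather than searching over candidate factors and scoring each one. The abstract does not specify the warping function, so the linear warp, the ratio form of the factor, the function names, and the example frequencies are all assumptions made for illustration only.

```python
def sg2_warp_factor(speaker_sg2_hz, reference_sg2_hz):
    """Assumed form: ratio of a reference Sg2 to the speaker's Sg2."""
    if speaker_sg2_hz <= 0 or reference_sg2_hz <= 0:
        raise ValueError("Sg2 frequencies must be positive")
    return reference_sg2_hz / speaker_sg2_hz


def warp_linear(freqs_hz, alpha):
    """Assumed linear frequency warping: f -> alpha * f."""
    return [alpha * f for f in freqs_hz]


# Illustrative values only, not measurements from the paper.
alpha = sg2_warp_factor(speaker_sg2_hz=1600.0, reference_sg2_hz=1400.0)
warped = warp_linear([1000.0, 2000.0], alpha)
```

Because Sg2 is reported to be approximately constant for a given speaker, a factor obtained this way would in principle carry over across utterances and languages without re-estimation.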


  1. S. Wang, A. Alwan, and S. M. Lulich, "Speaker normalization based on subglottal resonances," in Proc. ICASSP, pp. 4277-4280, 2008.

