ISCA Archive Interspeech 2008

A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation

Shizhen Wang, Steven M. Lulich, Abeer Alwan

In previous work [1], we proposed a speaker adaptation technique based on the second subglottal resonance (Sg2), which showed good performance relative to vocal tract length normalization (VTLN). In this paper, we propose a more reliable algorithm for automatically estimating Sg2 from speech signals. The algorithm is calibrated on children's speech data collected simultaneously with accelerometer recordings, from which Sg2 frequencies can be directly measured. To investigate whether Sg2 frequencies are independent of speech content and language, we perform a cross-language study with bilingual Spanish-English children. The study verifies that Sg2 is approximately constant for a given speaker and is thus a good candidate for limited-data speaker normalization and cross-language adaptation. We then present a cross-language speaker normalization method based on Sg2, which is computationally more efficient than maximum-likelihood-based VTLN and performs more robustly.

[1] S. Wang, A. Alwan, and S. M. Lulich, "Speaker normalization based on subglottal resonances," in Proc. ICASSP, pp. 4277-4280, 2008.


doi: 10.21437/Interspeech.2008-383

Cite as: Wang, S., Lulich, S.M., Alwan, A. (2008) A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation. Proc. Interspeech 2008, 1717-1720, doi: 10.21437/Interspeech.2008-383

@inproceedings{wang08h_interspeech,
  author={Shizhen Wang and Steven M. Lulich and Abeer Alwan},
  title={{A reliable technique for detecting the second subglottal resonance and its use in cross-language speaker adaptation}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1717--1720},
  doi={10.21437/Interspeech.2008-383}
}