INTERSPEECH 2009
10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Large-Scale Analysis of Formant Frequency Estimation Variability in Conversational Telephone Speech

Nancy F. Chen (1), Wade Shen (1), Joseph Campbell (1), Reva Schwartz (2)

(1) MIT, USA
(2) United States Secret Service, USA

We quantify how the telephone channel and regional dialect influence formant estimates extracted with Wavesurfer [1, 2] from the spontaneous conversational speech of over 3,600 native speakers of American English. To the best of our knowledge, this is the largest-scale study on this topic. We found that F1 estimates are higher for cellular channels than for landline channels, whereas F2 generally shows the opposite trend. We also characterized vowel-shift trends in the northern United States and compared them with the Northern City Chain Shift (NCCS) [3]. Our analysis is useful in forensic applications, where it is important to distinguish among speaker, dialect, and channel characteristics.
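For readers unfamiliar with how a single formant estimate is produced, the sketch below shows the classic LPC root-finding approach on one analysis frame. It is not Wavesurfer's implementation. Wavesurfer's formant tracker follows Talkin's dynamic-programming method [2], which also selects and smooths candidates across frames. The function names, the default order rule of thumb (2 + fs/1000), the 90 Hz floor and the 400 Hz bandwidth ceiling are illustrative assumptions. Because the pole positions depend on the spectrum of the analyzed frame, channel processing that reshapes that spectrum can plausibly shift the resulting F1 and F2 values.

```python
import numpy as np


def lpc_autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Return LPC coefficients a[0..order] (a[0] = 1) for the model
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, using the autocorrelation method
    and the Levinson-Durbin recursion."""
    n = len(frame)
    # Biased autocorrelation r[0..order].
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # a_j <- a_j + k * a_{i-j} for j = 1..i (RHS is evaluated before assignment).
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k
    return a


def estimate_formants(frame, fs, order=None, min_freq=90.0, max_bandwidth=400.0):
    """Hypothetical helper: formant frequencies and bandwidths (Hz) from the
    roots of the LPC polynomial of one (already windowed) frame."""
    if order is None:
        order = 2 + int(fs / 1000)  # common rule of thumb, an assumption here
    a = lpc_autocorrelation(np.asarray(frame, dtype=float), order)
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole from each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bws = -np.log(np.abs(poles)) * fs / np.pi  # pole radius r = exp(-pi * B / fs)
    keep = (freqs > min_freq) & (bws < max_bandwidth)
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bws[keep][idx]
```

In practice the frame would first be pre-emphasized and multiplied by a window such as `np.hamming(len(frame))`. For telephone speech sampled at 8 kHz, the default order above is 10.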

References

  1. Snack Sound Toolkit: http://www.speech.kth.se/snack/
  2. Talkin, D., “Speech Formant Trajectory Estimation using Dynamic Programming with Modulated Transition Costs”, J. Acoust. Soc. Am., Suppl. 1, 1987, p. S55.
  3. Labov, W., Ash, S., and Boberg, C., “The Atlas of North American English: Phonetics, Phonology, and Sound Change”, Mouton de Gruyter, Berlin, 2006.


Bibliographic reference: Chen, Nancy F. / Shen, Wade / Campbell, Joseph / Schwartz, Reva (2009): "Large-scale analysis of formant frequency estimation variability in conversational telephone speech", In INTERSPEECH-2009, 2203-2206.