In this paper, we propose a method for predicting formants from pole and bandwidth data, and apply it to automatically extract F1 and F2 values from a corpus of regional dialect variation in North America that contains 134,000 manual formant measurements. The predicted formants are shown to outperform the default formant values from a popular speech analysis package. Finally, we demonstrate that sociolinguistic analysis based on vowel formant data can be conducted reliably using the automatically predicted values, and we argue that sociolinguists should adopt this methodology to analyze larger amounts of data efficiently.
Bibliographic reference. Evanini, Keelan / Isard, Stephen / Liberman, Mark (2009): "Automatic formant extraction for sociolinguistic analysis of large corpora", In INTERSPEECH-2009, 1655-1658.