Automatic Detection of Prosodic Focus in American English

Sunghye Cho, Mark Liberman, Yong-cheol Lee

Focus, which is usually modulated by prosodic prominence, highlights a particular element within a sentence for emphasis or contrast. Despite its importance in communication, it has received little attention in the field of speech recognition. This paper presents a system for automatically detecting prosodic focus in American English, using telephone-number strings. Our data were 100 10-digit phone-number strings read by 5 speakers (3 female and 2 male). We extracted 18 prosodic features from each digit within the strings, added one categorical variable, and trained a Random Forest model to detect where the focused digit is within a given string. We also compared the model's performance to human judgment rates from a perception experiment with 67 native speakers of American English. Our final model achieves 92% accuracy in detecting the location of prosodic focus, which is slightly lower than human perception (97.2%) but much better than chance (10%). We discuss the predictive features in our model and potential features to add in future work.
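The evaluation setup described in the abstract (locating one focused digit among ten, scored against a 10% chance level) can be illustrated with a minimal sketch. This is not the authors' implementation: the per-digit focus scores are hypothetical stand-ins for whatever a trained classifier would output, the toy data are randomly generated, and picking the highest-scoring digit is an assumed decision rule. The names predict_focus_position, accuracy, and make_toy_string are invented for illustration.

```python
import random

N_DIGITS = 10  # each string has 10 digits, so guessing at random is right 1 time in 10


def predict_focus_position(digit_scores):
    """Return the 0-based index of the digit with the highest focus score.

    Assumption: each digit has one score (e.g. a classifier's estimated
    probability of being focused) and the top-scoring digit is chosen.
    """
    return max(range(len(digit_scores)), key=lambda i: digit_scores[i])


def accuracy(predicted, actual):
    """Fraction of strings whose focus position was located correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def make_toy_string(rng, focus_pos):
    """Hypothetical scores: the focused digit scores higher on average."""
    return [rng.gauss(1.0 if i == focus_pos else 0.0, 0.3) for i in range(N_DIGITS)]


# Toy data only: 100 synthetic strings, not the recordings used in the paper.
rng = random.Random(0)
true_positions = [rng.randrange(N_DIGITS) for _ in range(100)]
strings = [make_toy_string(rng, pos) for pos in true_positions]
predictions = [predict_focus_position(s) for s in strings]

chance_level = 1 / N_DIGITS
print(f"toy accuracy: {accuracy(predictions, true_positions):.2f}")
print(f"chance level: {chance_level:.2f}")
```

The toy accuracy printed here reflects only the synthetic scores and says nothing about the reported 92% result; the sketch is meant to show why 10% is the relevant baseline for a ten-position detection task.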

DOI: 10.21437/Interspeech.2019-1668

Cite as: Cho, S., Liberman, M., Lee, Y. (2019) Automatic Detection of Prosodic Focus in American English. Proc. Interspeech 2019, 3470-3474, DOI: 10.21437/Interspeech.2019-1668.

  author={Sunghye Cho and Mark Liberman and Yong-cheol Lee},
  title={{Automatic Detection of Prosodic Focus in American English}},
  booktitle={Proc. Interspeech 2019},