This study aims to develop an automatic system for measuring speech fluency in a second language (L2). Eighteen learners of French, all native speakers of English, were recorded during read-aloud tasks in French. Six native French speakers with a background in L2 acquisition and phonetics rated the recordings for speech fluency. Automatic measures of speech fluency were computed in four consecutive steps. First, (1) the forward-backward divergence segmentation (FBDS) algorithm was used to segment the speech recordings into subphonemic units. Then, (2) the FBDS-derived segments were automatically clustered into higher-level units: pseudo-syllables and silent breaks. Next, (3) four predictors of speech fluency were computed: pseudo-syllable rate, standard deviation of pseudo-syllable duration, rate of silent breaks, and percentage of speech. Finally, (4) the four predictors were combined using either a multiple linear regression (MLR) or a neural network (NN) to predict human ratings of speech fluency. A very strong correlation (R = 0.89) was achieved between the NN-based automatic scores and the average human ratings. The correlation coefficient achieved with the MLR was significantly lower (R = 0.85), but a ten-fold cross-validation indicated similar performance for the two models on unseen data.
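Steps (3) and (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the segment format (`label`, start, end), the labels `"syl"` and `"sil"`, the function names, and the exact predictor definitions (rates per second of total recording time, sample standard deviation, percentage of speech as pseudo-syllable time over total time) are all assumptions made for illustration. Only the MLR variant is shown, fitted by ordinary least squares with NumPy.

```python
import numpy as np


def fluency_predictors(segments):
    """Compute four fluency predictors from labelled segments.

    `segments` is a hypothetical list of (label, start_s, end_s) tuples,
    with label "syl" (pseudo-syllable) or "sil" (silent break).
    The definitions below are assumptions, not taken from the paper.
    """
    syl = np.array([end - start for label, start, end in segments if label == "syl"])
    sil = np.array([end - start for label, start, end in segments if label == "sil"])
    total = syl.sum() + sil.sum()  # total analysed time in seconds
    return np.array([
        len(syl) / total,            # pseudo-syllable rate (per second)
        syl.std(ddof=1),             # SD of pseudo-syllable duration (s)
        len(sil) / total,            # rate of silent breaks (per second)
        100.0 * syl.sum() / total,   # percentage of speech
    ])


def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [intercept, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predicted ratings for a predictor matrix X of shape (n, k)."""
    return coef[0] + X @ coef[1:]


def pearson_r(a, b):
    """Pearson correlation between predicted and observed ratings."""
    return np.corrcoef(a, b)[0, 1]


# Toy usage with made-up segments for a single recording.
example = [("syl", 0.0, 0.2), ("syl", 0.2, 0.5), ("sil", 0.5, 1.0)]
print(fluency_predictors(example))
```

In practice, one predictor vector per recording would be stacked into a matrix `X`, with the average human rating per recording as `y`; `pearson_r(predict_mlr(fit_mlr(X, y), X), y)` then gives a correlation of the kind reported above, although held-out data (for example, cross-validation folds) is needed to estimate performance on unseen speakers.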
Cite as: Fontan, L., Le Coz, M., Alazard, C. (2020) Using the forward-backward divergence segmentation algorithm and a neural network to predict L2 speech fluency. Proc. Speech Prosody 2020, 925-929, doi: 10.21437/SpeechProsody.2020-189
@inproceedings{fontan20_speechprosody,
  author={Lionel Fontan and Maxime Le Coz and Charlotte Alazard},
  title={{Using the forward-backward divergence segmentation algorithm and a neural network to predict L2 speech fluency}},
  year=2020,
  booktitle={Proc. Speech Prosody 2020},
  pages={925--929},
  doi={10.21437/SpeechProsody.2020-189}
}