Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning

Mostafa Shahin, Julien Epps, Beena Ahmed


Prosodic features are important for intelligibility and proficiency in stress-timed languages such as English and Arabic. Producing appropriate lexical stress is challenging for second language (L2) learners, particularly those whose first language (L1) is syllable-timed, such as Spanish or French. In this paper we introduce a method for automatic classification of lexical stress, intended for integration into computer-aided pronunciation learning (CAPL) tools for L2 learning. We trained two deep learning architectures, a deep feedforward neural network (DNN) and a deep convolutional neural network (CNN), using a set of temporal and spectral features related to intensity, duration, pitch and energy in different frequency bands. The system was applied to English (child and adult) and Arabic (adult) speech corpora collected from native speakers. Our method yields error rates of 9%, 7% and 18% on the English children corpus, English adult corpus and Arabic adult corpus, respectively.
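
As a rough illustration of the kind of per-syllable features the abstract mentions, the sketch below computes duration, intensity and relative energy in a few frequency bands for one syllable segment. It is not the authors' feature set: the function name `syllable_features`, the band edges and the dB reference are assumptions made for this example, and pitch is left out because it needs a dedicated tracker.

```python
import numpy as np


def syllable_features(samples, sample_rate, band_edges_hz=(0, 500, 1000, 2000, 4000)):
    """Return [duration_s, intensity_db, band energy fractions...] for one segment.

    Hypothetical helper: the band edges are illustrative, not taken from the paper.
    """
    samples = np.asarray(samples, dtype=np.float64)

    # Temporal feature: segment duration in seconds.
    duration_s = len(samples) / sample_rate

    # Intensity: RMS level in dB relative to full scale (amplitude 1.0).
    rms = np.sqrt(np.mean(samples ** 2))
    intensity_db = 20.0 * np.log10(rms + 1e-12)

    # Spectral features: fraction of total power falling in each band.
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    total = power.sum() + 1e-12
    band_fractions = [
        power[(freqs >= lo) & (freqs < hi)].sum() / total
        for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:])
    ]

    return np.array([duration_s, intensity_db, *band_fractions])


# Usage: a synthetic 0.2 s, 300 Hz tone standing in for a syllable segment.
sample_rate = 16000
t = np.arange(int(0.2 * sample_rate)) / sample_rate
tone = 0.5 * np.sin(2 * np.pi * 300 * t)
features = syllable_features(tone, sample_rate)
```

Vectors like this, one per syllable, could then be fed to a classifier such as the DNN or CNN described above; how the paper actually assembles and normalises its inputs is not stated in the abstract.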


DOI: 10.21437/Interspeech.2016-644

Cite as

Shahin, M., Epps, J., Ahmed, B. (2016) Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning. Proc. Interspeech 2016, 175-179.

Bibtex
@inproceedings{Shahin+2016,
author={Mostafa Shahin and Julien Epps and Beena Ahmed},
title={Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-644},
url={http://dx.doi.org/10.21437/Interspeech.2016-644},
pages={175--179}
}