ISCA Archive Interspeech 2005

Structural representation of the non-native pronunciations

Satoshi Asakawa, Nobuaki Minematsu, Toshiko Isei-Jaakkola, Keikichi Hirose

The spectrogram, the acoustic representation of speech provided by phonetics, is a noisy representation in that it shows every acoustic aspect of speech. Age, gender, shape, microphone, room, line, etc. are completely irrelevant to pronunciation assessment, yet the spectrogram is inevitably affected by these factors. Recently, a novel acoustic representation of speech was proposed in which the dimensions of these non-linguistic factors can hardly be seen [1, 2]. Every acoustic substance of speech is discarded and only the interrelations among them are extracted to represent the pronunciation structurally. Using this method, individual learners were described as distorted phonemic structures [3] and automatic scoring of pronunciation was investigated [3, 4]. This paper describes two new analyses using the proposed method. The first examines whether the method can appropriately trace the development of a student's pronunciation using only a limited amount of speech. The second focuses on the prosodic aspect of pronunciation, especially stressed and unstressed vowels. The former indicates that the proposed method can adequately show the history of the student's development, and the latter clarifies that the size of the pronunciation structure is highly correlated with pronunciation proficiency.

doi: 10.21437/Interspeech.2005-94

Cite as: Asakawa, S., Minematsu, N., Isei-Jaakkola, T., Hirose, K. (2005) Structural representation of the non-native pronunciations. Proc. Interspeech 2005, 165-168, doi: 10.21437/Interspeech.2005-94

  author={Satoshi Asakawa and Nobuaki Minematsu and Toshiko Isei-Jaakkola and Keikichi Hirose},
  title={{Structural representation of the non-native pronunciations}},
  booktitle={Proc. Interspeech 2005},