Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine

Bo-Hsiang Tseng, Sheng-syun Shen, Hung-Yi Lee, Lin-Shan Lee


Multimedia and spoken content often carry richer information than plain text, but they are harder to display on a screen and for a user to skim and select. As a result, accessing large collections of spoken content is far more difficult and time-consuming for humans than accessing text. It is therefore highly attractive to develop machines that can automatically understand spoken content and summarize its key information for humans to browse. Toward this end, we propose a new task of machine comprehension of spoken content. We define the initial goal as the TOEFL listening comprehension test, a challenging academic English examination for learners whose native language is not English. We further propose an Attention-based Multi-hop Recurrent Neural Network (AMRNN) architecture for this task and obtain encouraging results in initial tests. The initial results also show that word-level attention is probably more robust than sentence-level attention when the task is performed on transcriptions containing ASR errors.
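To make the AMRNN idea above more concrete, the following is a minimal sketch (written in PyTorch, which the abstract does not specify) of a word-level attention, multi-hop recurrent reader for multiple-choice listening comprehension: a shared bidirectional GRU encodes the question, the story transcription, and each answer choice; the question vector attends over the story's word representations and is refined over several hops; the choices are then ranked by cosine similarity with the refined query. The layer sizes, pooling rule, hop update, and scoring function here are illustrative assumptions, not the authors' exact architecture.

# Illustrative sketch only: a simplified word-level attention, multi-hop
# GRU reader for multiple-choice QA, loosely following the AMRNN idea
# described in the abstract. Hyperparameters, pooling, and the scoring
# rule are assumptions, not the published model.
import torch
import torch.nn as nn


class MultiHopAttentionReader(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=128, n_hops=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # A shared bidirectional GRU encodes question, story, and choices.
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.n_hops = n_hops

    def _encode(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token states: (batch, seq_len, 2*hidden)
        outputs, _ = self.encoder(self.embed(token_ids))
        return outputs

    def _sequence_vector(self, outputs):
        # Simple mean pooling of per-token states into one vector.
        return outputs.mean(dim=1)                            # (batch, 2*hidden)

    def forward(self, question, story, choices):
        # question: (batch, Lq), story: (batch, Ls), choices: (batch, n_choices, Lc)
        q = self._sequence_vector(self._encode(question))     # (batch, 2h)
        h = self._encode(story)                               # (batch, Ls, 2h)

        # Multi-hop word-level attention: each hop refines the query with a
        # context vector built from attention over the story's words.
        for _ in range(self.n_hops):
            scores = torch.bmm(h, q.unsqueeze(2)).squeeze(2)  # (batch, Ls)
            alpha = torch.softmax(scores, dim=1)
            context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)
            q = q + context                                   # hop update

        # Rank each answer choice by cosine similarity with the final query.
        b, n, lc = choices.shape
        c = self._sequence_vector(self._encode(choices.view(b * n, lc)))
        c = c.view(b, n, -1)
        return torch.cosine_similarity(c, q.unsqueeze(1), dim=2)  # (batch, n_choices)

At inference time the predicted answer would simply be the argmax over the returned choice scores; replacing the word-level attention over h with attention over pooled sentence vectors would give the sentence-level variant the abstract compares against.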


DOI: 10.21437/Interspeech.2016-876

Cite as

Tseng, B., Shen, S., Lee, H., Lee, L. (2016) Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine. Proc. Interspeech 2016, 2731-2735.

Bibtex
@inproceedings{Tseng+2016,
  author={Bo-Hsiang Tseng and Sheng-syun Shen and Hung-Yi Lee and Lin-Shan Lee},
  title={Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-876},
  url={http://dx.doi.org/10.21437/Interspeech.2016-876},
  pages={2731--2735}
}