Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training

Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono


This paper presents a novel method for detecting questions in natural speech using acoustic and phonetic features. Conventional methods based on Recurrent Neural Networks (RNNs) use only acoustic features. However, lexical cues are essential for identifying some questions, such as declarative questions. To this end, we propose a new RNN-based question detection model that utilizes both acoustic and lexical information. Phonetic features, which are well suited to describing interrogative cues, serve as the lexical information. Furthermore, we propose a new training framework named feature-wise pre-training (FP) to combine the acoustic and phonetic features effectively. FP first learns interrogative cues from each feature individually rather than from the combined features, which makes model training more stable. The resulting feature-specific estimation models are then integrated, and fine-tuning is applied to obtain a unified model. Experiments show that the proposed method outperforms the conventional benchmarks.
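
The three stages of FP described above (pre-train a model per feature, integrate the models, fine-tune the whole) can be sketched roughly as follows. This is only an illustration of the training schedule, not the authors' implementation: small logistic units stand in for the RNNs, the weighted-sum fusion layer is an assumption (the abstract does not say how the models are integrated), the toy data are invented, and all names (`Branch`, `Fused`, `pretrain`, `fine_tune`, `train_fp`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Branch:
    """Stand-in for one feature-specific model (an RNN in the paper)."""

    def __init__(self, dim, rng):
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b


def pretrain(branch, xs, ys, lr=0.5, epochs=200):
    """Stage 1: train one branch alone on its own feature stream."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = sigmoid(branch.score(x)) - y  # d(cross-entropy)/d(logit)
            branch.w = [wi - lr * err * xi for wi, xi in zip(branch.w, x)]
            branch.b -= lr * err


class Fused:
    """Stage 2: combine pre-trained branches (assumed: weighted sum of scores)."""

    def __init__(self, branches):
        self.branches = branches
        self.alpha = [1.0] * len(branches)
        self.bias = 0.0

    def prob(self, inputs):
        scores = [br.score(x) for br, x in zip(self.branches, inputs)]
        return sigmoid(self.bias + sum(a * s for a, s in zip(self.alpha, scores)))


def fine_tune(model, data, ys, lr=0.1, epochs=300):
    """Stage 3: update branch and fusion parameters jointly."""
    for _ in range(epochs):
        for inputs, y in zip(data, ys):
            scores = [br.score(x) for br, x in zip(model.branches, inputs)]
            err = model.prob(inputs) - y
            for k, (br, x) in enumerate(zip(model.branches, inputs)):
                g = err * model.alpha[k]  # gradient reaching branch k's score
                br.w = [wi - lr * g * xi for wi, xi in zip(br.w, x)]
                br.b -= lr * g
            model.alpha = [a - lr * err * s for a, s in zip(model.alpha, scores)]
            model.bias -= lr * err


def train_fp(acoustic, phonetic, labels, seed=0):
    """Run the full schedule: pre-train each branch, fuse, then fine-tune."""
    rng = random.Random(seed)
    a_branch = Branch(len(acoustic[0]), rng)
    p_branch = Branch(len(phonetic[0]), rng)
    pretrain(a_branch, acoustic, labels)
    pretrain(p_branch, phonetic, labels)
    model = Fused([a_branch, p_branch])
    fine_tune(model, list(zip(acoustic, phonetic)), labels)
    return model


# Invented toy data: [final pitch rise] and [interrogative phone pattern present].
ACOUSTIC = [[0.0], [1.0], [0.0], [1.0]]
PHONETIC = [[0.0], [0.0], [1.0], [1.0]]
LABELS = [0, 1, 1, 1]  # 1 = question

if __name__ == "__main__":
    fp_model = train_fp(ACOUSTIC, PHONETIC, LABELS)
    for a, p in zip(ACOUSTIC, PHONETIC):
        print(a, p, round(fp_model.prob((a, p)), 3))
```

In this sketch each branch sees only its own feature during pre-training, so neither has to disentangle the other's cues from a random starting point; joint fine-tuning then adjusts the fusion weights and both branches together.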


DOI: 10.21437/Interspeech.2018-1755

Cite as: Ando, A., Asakawa, R., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y. (2018) Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training. Proc. Interspeech 2018, 1731-1735, DOI: 10.21437/Interspeech.2018-1755.


@inproceedings{Ando2018,
  author={Atsushi Ando and Reine Asakawa and Ryo Masumura and Hosana Kamiyama and Satoshi Kobashikawa and Yushi Aono},
  title={Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1731--1735},
  doi={10.21437/Interspeech.2018-1755},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1755}
}