A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models

Xue Bai, Jun Du, Zi-Rui Wang, Chin-Hui Lee


In acoustic scene classification, the main challenge is distinguishing similar acoustic segments that occur across different scenes. Many deep-learning-based approaches have been proposed to address this problem, but they do not consider the relevance among different acoustic scenes. In this paper, we propose a novel acoustic segment model (ASM) for acoustic scene classification. ASM aims to give a finer segmentation that covers all acoustic scenes by searching for the underlying phoneme-like acoustic units. The acoustic segments are modeled by hidden Markov models (HMMs), and each audio recording is decoded into an ASM sequence without prior linguistic knowledge. Analogous to the term vector of a text document, these ASM sequences are converted into co-occurrence-statistics feature vectors, and an SVM or DNN is used as the back-end classifier. Validated on the DCASE 2018 task, the proposed approach achieves competitive performance with a single model and no data augmentation. Through visualization analysis, we uncover potentially similar units that are hidden from auditory perception.
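As a minimal sketch of the featurization step described above: once each recording is decoded into a sequence of ASM unit labels, the sequence can be summarized by unigram counts (the term-vector analogy) plus counts of adjacent unit pairs (one simple notion of co-occurrence statistics). The unit names, example sequences, and pairing scheme below are illustrative assumptions, not the paper's exact pipeline.

```python
from collections import Counter
from itertools import product

# Hypothetical decoded ASM sequences: each recording becomes a string of
# acoustic-unit labels (unit names and sequences here are made up).
decoded = {
    "airport_001": ["u3", "u7", "u3", "u12", "u7"],
    "metro_002":   ["u5", "u5", "u9", "u3"],
}

# Fixed unit inventory (in the paper this would come from the learned ASM set).
units = sorted({u for seq in decoded.values() for u in seq})

def term_vector(seq, units):
    """Unigram occurrence counts, analogous to a text document's term vector."""
    counts = Counter(seq)
    return [counts.get(u, 0) for u in units]

def cooccurrence_vector(seq, units):
    """Counts of adjacent unit pairs, one simple co-occurrence statistic."""
    pairs = Counter(zip(seq, seq[1:]))
    return [pairs.get((a, b), 0) for a, b in product(units, repeat=2)]

# Concatenated feature vector per recording, ready for an SVM/DNN back-end.
features = {name: term_vector(seq, units) + cooccurrence_vector(seq, units)
            for name, seq in decoded.items()}
```

In practice the counts would typically be normalized (e.g. TF-IDF style weighting) before being fed to the classifier back-end.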


DOI: 10.21437/Interspeech.2019-2171

Cite as: Bai, X., Du, J., Wang, Z.-R., Lee, C.-H. (2019) A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models. Proc. Interspeech 2019, 3619-3623, DOI: 10.21437/Interspeech.2019-2171.


@inproceedings{Bai2019,
  author={Xue Bai and Jun Du and Zi-Rui Wang and Chin-Hui Lee},
  title={{A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3619--3623},
  doi={10.21437/Interspeech.2019-2171},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2171}
}