Recently, bottleneck features (BNF) combined with an i-Vector strategy have been used for state-of-the-art language/dialect identification. However, traditional bottleneck extraction requires an additional transcribed corpus for acoustic modeling. As an alternative, this study proposes an unsupervised BNF extraction scheme that is derived from the traditional structure but trained with estimated phonetic labels. The proposed method is evaluated on a 4-way Chinese dialect dataset and a 5-way closely spaced Pan-Arabic corpus. Compared to a baseline i-Vector system based on MFCC acoustic features, the proposed unsupervised BNF consistently achieves better performance across both corpora. Specifically, the EER and the overall performance measure C_avg * 100 are improved by a relative 48% and 52%, respectively. Even with limited training data, the proposed feature still achieves up to 24% relative improvement over the baseline, all without the need for a secondary transcribed corpus.
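The relative improvements quoted above are usually computed as the reduction of an error metric (EER or C_avg * 100, where lower is better) as a fraction of the baseline value. The sketch below assumes that definition; the function name and the input values are hypothetical illustrations and are not taken from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the relative reduction of an error metric, in percent.

    Assumes a lower-is-better metric such as EER or C_avg * 100.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    baseline_eer = 10.0  # baseline MFCC i-Vector system, EER in percent
    proposed_eer = 5.2   # unsupervised BNF system, EER in percent
    gain = relative_improvement(baseline_eer, proposed_eer)
    print(f"Relative EER reduction: {gain:.1f}%")
```

Under this definition, a drop from a 10.0% EER to a 5.2% EER corresponds to a relative improvement of about 48%, while the absolute improvement is 4.8 percentage points.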
Cite as: Zhang, Q., Hansen, J.H.L. (2017) Dialect Recognition Based on Unsupervised Bottleneck Features. Proc. Interspeech 2017, 2576-2580, doi: 10.21437/Interspeech.2017-576
@inproceedings{zhang17i_interspeech,
  author={Qian Zhang and John H.L. Hansen},
  title={{Dialect Recognition Based on Unsupervised Bottleneck Features}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2576--2580},
  doi={10.21437/Interspeech.2017-576}
}