ISCA Archive Interspeech 2021

Hierarchical Phone Recognition with Compositional Phonetics

Xinjian Li, Juncheng Li, Florian Metze, Alan W. Black

There is growing interest in building phone recognition systems for low-resource languages, as the majority of languages do not have a writing system. Phone recognition systems proposed so far typically derive their phone inventory from the training languages, so the derived inventory covers only a limited subset of the phones that exist in the world, and such systems fail to recognize unseen phones in low-resource or zero-resource languages. In this work, we tackle this problem with a hierarchical model that explicitly models three entities: phonemes, phones, and phonological articulatory attributes. In particular, we decompose phones into articulatory attributes and compute each phone embedding from its attribute embeddings. The model first predicts a distribution over phones using their embeddings; the language-independent phones are then aggregated into language-dependent phonemes, which are optimized with the CTC loss. This compositional approach enables us to recognize phones even if they do not appear in the training set. We evaluate our model on 47 unseen languages and find that the proposed model outperforms baselines by 13.1% PER.
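The phone → attribute → phoneme hierarchy described above can be illustrated with a small sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the attribute inventory and phone decompositions are hypothetical, each phone embedding is taken to be the *sum* of its attribute embeddings, phones are scored by a dot product with one acoustic frame vector, and phone scores are aggregated into phonemes with a *max* over each phoneme's allophones. The point it shows is that a phone such as `ɣ` gets an embedding purely from attributes (`voiced`, `velar`, `fricative`), so it can be scored even if it never occurs in training.

```python
import math
import random

# Hypothetical articulatory attribute inventory (illustrative, not the paper's).
ATTRIBUTES = ["voiced", "voiceless", "bilabial", "alveolar", "velar",
              "stop", "fricative", "nasal"]

# Each phone is decomposed into a set of articulatory attributes.
PHONE_ATTRS = {
    "p": {"voiceless", "bilabial", "stop"},
    "b": {"voiced", "bilabial", "stop"},
    "k": {"voiceless", "velar", "stop"},
    "g": {"voiced", "velar", "stop"},
    "s": {"voiceless", "alveolar", "fricative"},
    "z": {"voiced", "alveolar", "fricative"},
    "m": {"voiced", "bilabial", "nasal"},
    # Assume "ɣ" is absent from training data; it is still composable
    # from attributes that do occur ("voiced", "velar", "fricative").
    "ɣ": {"voiced", "velar", "fricative"},
}
PHONES = list(PHONE_ATTRS)

DIM = 4
rng = random.Random(0)
# Attribute embeddings: random here, learned parameters in a real model.
ATTR_EMB = {a: [rng.gauss(0.0, 1.0) for _ in range(DIM)] for a in ATTRIBUTES}


def phone_embedding(phone):
    """Compose a phone embedding by summing its attribute embeddings (assumed)."""
    vec = [0.0] * DIM
    for attr in PHONE_ATTRS[phone]:
        vec = [v + a for v, a in zip(vec, ATTR_EMB[attr])]
    return vec


def phone_log_probs(hidden):
    """Log-softmax over all phones, scoring each by a dot product with `hidden`."""
    logits = [sum(h * e for h, e in zip(hidden, phone_embedding(p)))
              for p in PHONES]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return {p: l - log_z for p, l in zip(PHONES, logits)}


def phoneme_log_scores(phone_lp, allophones):
    """Aggregate language-independent phone scores into phoneme scores.

    `allophones` maps each phoneme of one language to its allophones.
    Taking the max over allophones is an assumption of this sketch.
    """
    return {ph: max(phone_lp[p] for p in phones)
            for ph, phones in allophones.items()}


# Usage: a hypothetical language in which /g/ surfaces as [g] or [ɣ].
lang_allophones = {"/p/": ["p"], "/g/": ["g", "ɣ"], "/s/": ["s"]}
frame = [0.5, -1.0, 0.3, 0.8]  # stand-in for one acoustic encoder output frame
print(phoneme_log_scores(phone_log_probs(frame), lang_allophones))
```

In a full recognizer, these per-frame phoneme scores (together with a blank symbol) would feed the CTC loss, which this sketch omits. A log-sum-exp over allophones is an equally plausible aggregation; max is used here only because it keeps each phoneme's score tied to its single best-matching phone.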

doi: 10.21437/Interspeech.2021-1803

Cite as: Li, X., Li, J., Metze, F., Black, A.W. (2021) Hierarchical Phone Recognition with Compositional Phonetics. Proc. Interspeech 2021, 2461-2465, doi: 10.21437/Interspeech.2021-1803

  author={Xinjian Li and Juncheng Li and Florian Metze and Alan W. Black},
  title={{Hierarchical Phone Recognition with Compositional Phonetics}},
  booktitle={Proc. Interspeech 2021},