Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages

Zixiaofan Yang, Julia Hirschberg


Acoustic word embeddings have proven useful in query-by-example keyword search. Such embeddings are typically trained to distinguish instances of the same word from different words based on exact orthographic identity, so two different words receive dissimilar embeddings even when they are pronounced similarly or share the same stem. However, in real-world applications such as keyword search in low-resource languages, models are expected to find all derived and inflected forms of a given keyword. In this paper, we address this mismatch by incorporating linguistic information when training neural acoustic word embeddings. We propose two linguistically-informed training methods, both of which outperform state-of-the-art models on the Switchboard dataset under metrics that credit non-exact matches. We also present results on Sinhala showing that models trained on English can be directly transferred to embed spoken words in a very different language with high accuracy.
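Acoustic word embeddings of this kind are commonly trained with a margin-based objective that pulls embeddings of the same word together and pushes different words apart. The sketch below is a minimal illustration of such a triplet margin loss, not the authors' actual model or training setup; the function name, dimensions, and margin value are illustrative assumptions.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss encouraging cos(anchor, positive) to exceed
    cos(anchor, negative) by at least `margin`.
    (Illustrative only; not the loss used in the paper.)"""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))

# Toy fixed-dimensional embeddings: two utterances of the same word
# (near-duplicates) versus an unrelated word.
rng = np.random.default_rng(0)
word_a1 = rng.normal(size=64)
word_a2 = word_a1 + 0.1 * rng.normal(size=64)  # same word, slight variation
word_b = rng.normal(size=64)                    # different word

loss = triplet_margin_loss(word_a1, word_a2, word_b)
```

A standard exact-match objective treats only identical orthographic forms as positives; the paper's linguistically-informed variants relax this so that, e.g., inflected forms of the same stem are not forced maximally apart.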


DOI: 10.21437/Interspeech.2019-3119

Cite as: Yang, Z., Hirschberg, J. (2019) Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages. Proc. Interspeech 2019, 2678-2682, DOI: 10.21437/Interspeech.2019-3119.


@inproceedings{Yang2019,
  author={Zixiaofan Yang and Julia Hirschberg},
  title={{Linguistically-Informed Training of Acoustic Word Embeddings for Low-Resource Languages}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2678--2682},
  doi={10.21437/Interspeech.2019-3119},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3119}
}