ISCA Archive SSW 2021

Homograph disambiguation with contextual word embeddings for TTS systems

Marco Nicolis, Viacheslav Klimkov

We describe a heterophone homograph (simply 'homograph' henceforth) disambiguation system based on per-case classifiers trained on a small amount of labelled data. These classifiers use contextual word embeddings as input features and achieve a state-of-the-art accuracy of 0.991 on the English homographs of a publicly available dataset, without requiring any additional rule system. We show that as few as 100 sentences are sufficient to train a lightweight dedicated classifier, provided the dataset is sufficiently balanced, i.e. all versions of the homograph are adequately represented. Where the original dataset is deeply unbalanced (i.e. one homograph version is overwhelmingly represented), we add further data. This is effectively a special case of active learning: we select additional examples of the under-represented homograph versions, and show an 11% relative improvement for such cases. Finally, we provide a solution to drastically reduce the size of our models via sparsification.
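The abstract does not specify the classifier architecture, the embedding model, the balance criterion, or the sparsification method, so the following is only a hypothetical sketch under stated assumptions. Each homograph gets its own binary logistic-regression classifier over precomputed contextual embedding vectors (a two-pronunciation homograph is assumed), a simple minimum-fraction test stands in for "sufficiently balanced", and magnitude pruning stands in for sparsification. The helper names (`train_classifier`, `is_balanced`, `sparsify`), the 0.2 balance threshold, and the 16-dimensional synthetic vectors are illustrative inventions, not the authors' code or data.

```python
# Hypothetical sketch of a per-homograph classifier; not the authors' implementation.
import math
import random
from collections import Counter


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def is_balanced(labels, min_fraction=0.2):
    """Assumed balance check: every pronunciation must cover at least
    min_fraction of the examples (the threshold is hypothetical)."""
    counts = Counter(labels)
    return len(counts) > 1 and min(counts.values()) / len(labels) >= min_fraction


def train_classifier(X, y, epochs=300, lr=0.5):
    """Fit one binary logistic-regression classifier for a single homograph
    by batch gradient descent on precomputed embedding vectors X."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return pronunciation index 1 or 0 for one embedding vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else 0


def sparsify(w, sparsity):
    """Magnitude pruning: zero out the smallest-|w| fraction of the weights."""
    k = int(len(w) * sparsity)
    smallest = sorted(range(len(w)), key=lambda j: abs(w[j]))[:k]
    pruned = list(w)
    for j in smallest:
        pruned[j] = 0.0
    return pruned


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted pronunciation matches the label."""
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


# Synthetic stand-in for contextual embeddings of 100 sentences containing
# one homograph; dimension 0 carries the pronunciation signal, the rest is noise.
rng = random.Random(0)
X, y = [], []
for i in range(100):
    label = i % 2
    vec = [rng.gauss(0.0, 1.0) for _ in range(16)]
    vec[0] = (1.0 if label else -1.0) + rng.gauss(0.0, 0.3)
    X.append(vec)
    y.append(label)

assert is_balanced(y)
w, b = train_classifier(X, y)
w_sparse = sparsify(w, 0.75)  # keep only the 4 largest-magnitude weights
print(accuracy(w, b, X, y), accuracy(w_sparse, b, X, y))
```

In a real pipeline the vectors would come from a contextual encoder applied to the homograph's token in each sentence. A failing `is_balanced` check marks the point where extra examples of the rare pronunciation would be collected, in the spirit of the active-learning step the abstract describes.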


doi: 10.21437/SSW.2021-39

Cite as: Nicolis, M., Klimkov, V. (2021) Homograph disambiguation with contextual word embeddings for TTS systems. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 222-226, doi: 10.21437/SSW.2021-39

@inproceedings{nicolis21_ssw,
  author={Marco Nicolis and Viacheslav Klimkov},
  title={{Homograph disambiguation with contextual word embeddings for TTS systems}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={222--226},
  doi={10.21437/SSW.2021-39}
}