ISCA Archive SSW 2021

Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French

Jason Taylor, Sébastien Le Maguer, Korin Richmond

Sequence-to-sequence (S2S) TTS models like Tacotron have grapheme-only inputs when trained fully end-to-end. Grapheme inputs map to phone sounds depending on context, which traditionally is handled by extensive preprocessing in the TTS front-end. However, French orthography does not provide a clear one-to-one mapping between graphemes and sounds, and in English, which similarly has rather non-phonetic orthography, pronunciations are a significant cause of error in S2S-TTS with grapheme inputs. In this paper, we test implicit pronunciation knowledge where graphemes do not map directly to phones. Implicit pronunciation knowledge learnt in S2S-TTS is similar to a standalone grapheme-to-phoneme (G2P) model, which makes explicit phone predictions at the sequential level. We find grapheme-input S2S-TTS makes implicit pronunciation errors similar to explicit G2P models - notably for foreign names. In a traditional front-end pipeline, there are also post-lexical rules which modify G2P output at the sequential level. In French, post-lexical rules require a deep knowledge of linguistic structure in a process called Liaison. Without explicit rules, we find S2S-TTS with grapheme inputs over-inserts Liaison sounds, leading to a significant preference for a phone-based equivalent. By testing with linguistically-motivated stimuli, we observe differences that would otherwise go undetected.


doi: 10.21437/SSW.2021-34

Cite as: Taylor, J., Maguer, S.L., Richmond, K. (2021) Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French. Proc. 11th ISCA Speech Synthesis Workshop (SSW 11), 195-199, doi: 10.21437/SSW.2021-34

@inproceedings{taylor21_ssw,
  author={Jason Taylor and Sébastien Le Maguer and Korin Richmond},
  title={{Liaison and Pronunciation Learning in End-to-End Text-to-Speech in French}},
  year=2021,
  booktitle={Proc. 11th ISCA Speech Synthesis Workshop (SSW 11)},
  pages={195--199},
  doi={10.21437/SSW.2021-34}
}