Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi

Michael McAuliffe, Michaela Socolof, Sarah Mihuc, Michael Wagner, Morgan Sonderegger


We present the Montreal Forced Aligner (MFA), a new open-source system for speech-text alignment. MFA is an update to the Prosodylab-Aligner: it maintains that aligner's key functionality, trainability on new data, and adds an improved architecture (triphone acoustic models and speaker adaptation) along with other features. MFA uses Kaldi instead of HTK, which allows it to be distributed as a stand-alone package and to exploit parallel processing for computationally intensive training and for scaling to larger datasets. We evaluate MFA’s performance on aligning word and phone boundaries in English conversational and laboratory speech, relative to human-annotated boundaries, focusing on the effects of aligner architecture and of training on the data to be aligned. MFA performs well relative to two existing open-source aligners with simpler architecture (Prosodylab-Aligner and FAVE), and both its improved architecture and training on the data to be aligned generally result in more accurate boundaries.
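
The comparison against human-annotated boundaries can be illustrated with a minimal sketch. The abstract does not state which metrics were used, so the ones below are assumptions: the mean absolute boundary difference and the share of boundaries within a tolerance. The function names and the 20 ms default are hypothetical and are not part of MFA or Kaldi.

```python
from statistics import mean


def boundary_errors(aligned, reference):
    """Return absolute differences (in seconds) between paired boundaries.

    Assumes both lists hold times for the same boundaries in the same order.
    """
    if len(aligned) != len(reference):
        raise ValueError("boundary lists must be the same length")
    if not aligned:
        raise ValueError("boundary lists must not be empty")
    return [abs(a - r) for a, r in zip(aligned, reference)]


def summarize(aligned, reference, tolerance=0.020):
    """Summarize boundary accuracy.

    The 20 ms tolerance is an illustrative assumption, not a value
    taken from the abstract.
    """
    errors = boundary_errors(aligned, reference)
    return {
        "mean_abs_error": mean(errors),
        "within_tolerance": sum(e <= tolerance for e in errors) / len(errors),
    }


if __name__ == "__main__":
    # Made-up boundary times in seconds, for illustration only.
    aligner_output = [0.10, 0.50, 1.00]
    human_annotation = [0.11, 0.55, 1.00]
    print(summarize(aligner_output, human_annotation))
```

A lower mean absolute error and a higher share within tolerance both indicate boundaries closer to the human annotation. Other summaries, such as the median difference, would fit the same structure.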


DOI: 10.21437/Interspeech.2017-1386

Cite as: McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., Sonderegger, M. (2017) Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi. Proc. Interspeech 2017, 498-502, DOI: 10.21437/Interspeech.2017-1386.


@inproceedings{McAuliffe2017,
  author={Michael McAuliffe and Michaela Socolof and Sarah Mihuc and Michael Wagner and Morgan Sonderegger},
  title={Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={498--502},
  doi={10.21437/Interspeech.2017-1386},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1386}
}