ISCA Archive Interspeech 2008

The impact of language dynamics on the capitalization of broadcast news

Fernando Batista, Nuno Mamede, Isabel Trancoso

This paper investigates the impact of language dynamics on the capitalization of broadcast news transcriptions. Most of the capitalization information is provided by a large newspaper corpus. Three speech corpus subsets, from different time periods, are used for evaluation, assessing the importance of training data from nearby time periods. Results are provided for both manual and automatic transcriptions, also showing the impact of recognition errors on the capitalization task. Our approach is based on maximum entropy models, uses an unlimited vocabulary, and is suitable for language adaptation. The language model for a given time period is produced by retraining a previous language model with data from that period. The language model produced with this approach can be sorted and then pruned to reduce computational requirements without much impact on the final results.
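
The sort-then-prune step mentioned in the abstract might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which criterion is used for sorting, so ranking by absolute feature weight is assumed here, and the feature representation, the label names, and the function `prune_by_weight` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: (feature name, capitalization label) -> weight.
Feature = Tuple[str, str]


def prune_by_weight(weights: Dict[Feature, float], keep: int) -> Dict[Feature, float]:
    """Keep the `keep` features with the largest absolute weight.

    Assumption: absolute weight is the sorting criterion; the abstract
    only says the model "can be sorted and then pruned".
    """
    ranked = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:keep])


# Toy weights, invented purely for illustration (not taken from the paper).
weights = {
    ("w=lisboa", "Capitalized"): 2.1,
    ("w=de", "lowercase"): 1.4,
    ("w=casa", "Capitalized"): 0.05,
    ("w=nato", "ALLCAPS"): -0.9,
}

pruned = prune_by_weight(weights, keep=3)
```

With these toy values, the near-zero weight is the one dropped, which is the intuition behind pruning with little effect on the final results.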


doi: 10.21437/Interspeech.2008-68

Cite as: Batista, F., Mamede, N., Trancoso, I. (2008) The impact of language dynamics on the capitalization of broadcast news. Proc. Interspeech 2008, 220-223, doi: 10.21437/Interspeech.2008-68

@inproceedings{batista08_interspeech,
  author={Fernando Batista and Nuno Mamede and Isabel Trancoso},
  title={{The impact of language dynamics on the capitalization of broadcast news}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={220--223},
  doi={10.21437/Interspeech.2008-68}
}