1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages
Porto Salvo, Portugal
This paper presents experimental results on automatically enriching speech recognition output with punctuation marks and capitalization information. The two tasks are treated as two separate classification problems, both addressed with a maximum entropy modeling approach. The approach is language independent, as supported by experiments on Portuguese and Spanish Broadcast News corpora. For each language, the discriminative models are trained on spoken and written corpora of that language. This paper provides the first results on this subject for Spanish Broadcast News data, and the first comparative study between Portuguese and Spanish.
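The classification framing can be made concrete with a minimal Python sketch of capitalization recovery as a maximum entropy (multinomial logistic regression) classifier. It assumes a hypothetical three-class label set (`lower`, `first_cap`, `all_upper`), a small hypothetical feature set (current word, previous word, their bigram, and a bias), and invented toy training sentences. It is not the authors' implementation or feature set. Gold labels are read off cased written text, and the model then recases lowercase, ASR-style input. Punctuation recovery could be framed the same way, for example by classifying each word boundary into classes such as none, comma, or full stop.

```python
import math
from collections import defaultdict

# Hypothetical three-way capitalization label set used only in this sketch.
LABELS = ("lower", "first_cap", "all_upper")


def label_of(token):
    """Derive the gold capitalization class from a cased training token."""
    if len(token) > 1 and token.isupper():
        return "all_upper"
    if token[:1].isupper():
        return "first_cap"
    return "lower"


def features(words, i):
    """Hypothetical sparse binary features: word, previous word, bigram, bias."""
    w = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else "<s>"
    return ["w=" + w, "prev=" + prev, "bigram=" + prev + "_" + w, "bias"]


def predict_proba(weights, feats):
    """Maximum entropy model: p(y|x) is a softmax over summed feature weights."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(sentences, epochs=100, lr=0.3, l2=0.01):
    """Stochastic gradient ascent on the conditional log-likelihood.

    The L2 penalty is applied only to features active in each example,
    a common simplification for sparse models.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence in sentences:
            words = sentence.split()
            for i, token in enumerate(words):
                feats = features(words, i)
                probs = predict_proba(weights, feats)
                gold = label_of(token)
                for y in LABELS:
                    # Gradient of log p(gold|x) with respect to the score of y.
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    for f in feats:
                        key = (f, y)
                        weights[key] += lr * (grad - l2 * weights[key])
    return weights


def recase(weights, text):
    """Restore capitalization of a lowercase (ASR-style) word sequence."""
    words = text.lower().split()
    out = []
    for i, w in enumerate(words):
        probs = predict_proba(weights, features(words, i))
        best = max(LABELS, key=probs.get)
        if best == "all_upper":
            out.append(w.upper())
        elif best == "first_cap":
            out.append(w[:1].upper() + w[1:])
        else:
            out.append(w)
    return " ".join(out)


# Toy, invented training sentences (hypothetical, not from the paper's corpora).
TRAIN = [
    "O Presidente visitou Lisboa ontem",
    "A RTP transmitiu o debate em Lisboa",
    "O governo de Portugal reuniu hoje",
    "A RTP e a SIC falaram com o Presidente",
]
model = train(TRAIN)
print(recase(model, "o presidente visitou lisboa"))
```

In a real system the feature set would be richer and the training corpora far larger; the point here is only that each word is labeled independently by a log-linear model over sparse, overlapping features.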
Index Terms: Rich Transcription, Capitalization, Punctuation marks, Speech processing
Bibliographic reference. Batista, Fernando / Trancoso, Isabel / Mamede, Nuno (2009): "Automatic recovery of punctuation marks and capitalization information for Iberian languages", In SLTECH-2009, 99-102.