ITRW on
This paper presents a technique for dynamically extending the language model lexicon of an Italian broadcast news transcription system. New words are selected day by day from contemporary news available on the Internet, using a strategy that aims to minimize the out-of-vocabulary rate of the language model. Phonetic transcriptions of new words are generated automatically with a software tool developed in-house. Experiments performed with the ITC-irst 62K-word baseline system show that using approximate phonetic transcriptions for less frequent words does not affect recognition performance. Lexicon extensions of up to 122K words were evaluated on 19 news programs, spanning one month, for a total of 6 hours of speech. The best lexicon extension strategy reduced the out-of-vocabulary rate by a relative 61.8%, from 1.57% to 0.60%, and the word error rate by a relative 2.16%, from 25.03% to 24.49%.
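To make the quantities in the abstract concrete, the following minimal sketch shows how an out-of-vocabulary (OOV) rate and a relative reduction can be computed, together with one simple way of extending a lexicon from recent text. The frequency-based selection rule, the function names, and the toy data are assumptions for illustration only; the abstract does not specify the selection strategy actually used by the authors.

```python
from collections import Counter


def oov_rate(tokens, lexicon):
    """Fraction of running words in `tokens` that are not in `lexicon`."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t not in lexicon) / len(tokens)


def extend_lexicon(base_lexicon, recent_tokens, max_new_words):
    """Add the most frequent unseen words found in recent text.

    Assumption: plain frequency ranking stands in for the paper's
    selection strategy, which the abstract does not describe.
    """
    counts = Counter(t for t in recent_tokens if t not in base_lexicon)
    new_words = [word for word, _ in counts.most_common(max_new_words)]
    return set(base_lexicon) | set(new_words)


def relative_reduction(before, after):
    """Relative reduction of a rate, as a fraction of its starting value."""
    return (before - after) / before


# Toy data (hypothetical): a tiny base lexicon, recent news text, a test text.
base = {"il", "governo", "ha", "detto"}
recent = "il governo ha detto euro euro euro vertice vertice summit".split()
test_tokens = "il vertice euro ha detto summit".split()

extended = extend_lexicon(base, recent, max_new_words=2)
rate_before = oov_rate(test_tokens, base)
rate_after = oov_rate(test_tokens, extended)

# Relative reductions recomputed from the figures reported in the abstract.
oov_gain = round(100 * relative_reduction(1.57, 0.60), 1)
wer_gain = round(100 * relative_reduction(25.03, 24.49), 2)
```

The last two lines recompute the relative reductions from the reported rates, which confirms that the 61.8% and 2.16% figures are relative rather than absolute changes.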
Bibliographic reference. Bertoldi, Nicola / Federico, Marcello (2001): "Lexicon adaptation for broadcast news transcription", In Adaptation-2001, 187-190.