2nd International Workshop on Speech, Language and Audio in Multimedia (SLAM2014)
Proper names are usually key to understanding the information contained in a document. Our work focuses on increasing the vocabulary coverage of a speech transcription system by automatically retrieving new proper names from contemporary diachronic text documents. The idea is to use in-vocabulary proper names as anchors for collecting new, linked proper names from the diachronic corpus. Our assumption is that time is an important feature for capturing name-to-context dependencies, which was confirmed by temporal mismatch experiments. We studied a method based on mutual information and proposed a new method, based on a cosine-similarity measure, that dynamically augments the vocabulary of the automatic speech recognition system. Recognition results show a significant reduction of the word error rate when the augmented vocabulary is used for broadcast news transcription.
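As an illustration of the cosine-similarity idea only, the following minimal sketch ranks candidate proper names by how similar their lexical context in a diachronic corpus is to the words of a transcript. It is not the authors' implementation: the function names, the raw word-count vectors, the `top_n` cutoff, and the toy data are all assumptions made for this example, and the abstract does not specify how contexts are built or weighted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidate_names(transcript_words, name_contexts, top_n=10):
    """Rank candidate proper names by context similarity to a transcript.

    transcript_words: words recognized in the document (hypothetical input).
    name_contexts: maps each candidate proper name to the words that
        co-occur with it in the diachronic corpus (hypothetical structure).
    Returns up to top_n (name, score) pairs, highest score first.
    """
    doc_vec = Counter(transcript_words)
    scored = [
        (name, cosine(doc_vec, Counter(context)))
        for name, context in name_contexts.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data, invented for illustration only.
name_contexts = {
    "Fukushima": ["nuclear", "plant", "japan", "tsunami"],
    "Wimbledon": ["tennis", "final", "match"],
}
transcript = ["tsunami", "hits", "japan", "nuclear", "plant"]
ranked = rank_candidate_names(transcript, name_contexts)
```

In this sketch the top-ranked names would be the ones added to the recognizer's vocabulary; restricting `name_contexts` to documents from the same period as the audio is one way the temporal assumption could be reflected.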
Index Terms: speech recognition, out-of-vocabulary words, proper names, vocabulary augmentation
Bibliographic reference. Illina, Irina / Fohr, Dominique / Linarès, Georges (2014): "Proper name retrieval from diachronic documents for automatic speech transcription using lexical and temporal context", In SLAM-2014, 29-33.