Long named entities are often abbreviated in spoken Chinese, which usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. Generating Chinese abbreviations is much more complex than generating English ones, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named entities, and we use the output of the abbreviation model to expand the vocabulary for voice search. For abbreviation modeling, we convert the abbreviation generation problem into a tagging problem and use a conditional random field (CRF) as the tagger. For vocabulary expansion, because an entity may have multiple abbreviations and the top-1 abbreviation candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments, the proposed abbreviation model achieved a top-10 coverage of 88.3%, and incorporating the top-10 abbreviation candidates into the vocabulary improved voice search accuracy from 16.9% to 79.2%.
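The tagging formulation and the top-N vocabulary expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-label scheme (`K` for a character kept in the abbreviation, `S` for a skipped one), the greedy alignment, the function names, and the candidate scores are all assumptions made for illustration, and the CRF itself is not trained here. The scored candidates stand in for whatever an n-best tagger would return.

```python
def labels_from_pair(full_name, abbreviation):
    """Turn a (full name, abbreviation) pair into one label per character.

    Assumed label set: 'K' = character kept, 'S' = character skipped.
    Alignment is greedy and left to right, which is a simplification
    when a character occurs more than once in the full name.
    """
    labels = []
    j = 0  # position of the next abbreviation character to match
    for ch in full_name:
        if j < len(abbreviation) and ch == abbreviation[j]:
            labels.append("K")
            j += 1
        else:
            labels.append("S")
    if j != len(abbreviation):
        raise ValueError("abbreviation is not a subsequence of the full name")
    return labels


def abbreviation_from_labels(full_name, labels):
    """Decode a label sequence back into an abbreviation string."""
    return "".join(ch for ch, lab in zip(full_name, labels) if lab == "K")


def expand_vocabulary(vocabulary, scored_candidates, top_n=10):
    """Add the top_n highest-scoring abbreviation candidates to a vocabulary.

    scored_candidates is a list of (abbreviation, score) pairs, assumed
    to come from an n-best tagger; higher scores are better.
    """
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    expanded = set(vocabulary)
    expanded.update(abbr for abbr, _ in ranked[:top_n])
    return expanded


# Training-side view: a known pair becomes a tagging example.
example_labels = labels_from_pair("北京大学", "北大")

# Recognition-side view: hypothetical scored candidates for one entity.
example_vocab = expand_vocabulary(
    {"北京大学"},
    [("北大", 0.90), ("京大", 0.05), ("北学", 0.01)],
    top_n=2,
)
```

Under this framing, each training pair yields a label sequence over the characters of the full name, and each candidate label sequence produced by the tagger decodes to one abbreviation. Adding several candidates rather than only the best one trades a larger vocabulary for a better chance that the abbreviation a user actually says is covered.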
Bibliographic reference. Yang, Dong / Pan, Yi-cheng / Furui, Sadaoki (2009): "Vocabulary expansion through automatic abbreviation generation for Chinese voice search", In INTERSPEECH-2009, 728-731.