10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Vocabulary Expansion Through Automatic Abbreviation Generation for Chinese Voice Search

Dong Yang, Yi-cheng Pan, Sadaoki Furui

Tokyo Institute of Technology, Japan

Long named entities are often abbreviated in spoken Chinese, which usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. Generating Chinese abbreviations is much more complex than generating English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations of Chinese named entities, and we use the output of the abbreviation model to expand the vocabulary for voice search. For abbreviation modeling, we convert abbreviation generation into a tagging problem and use a conditional random field (CRF) as the tagger. For vocabulary expansion, because an entity may have multiple abbreviations and the top-1 candidate has limited coverage, we add the top-10 candidates to the vocabulary. In our experiments, the proposed abbreviation model achieved a top-10 coverage of 88.3%, and adding the top-10 abbreviation candidates to the vocabulary improved voice search accuracy from 16.9% to 79.2%.
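
The tagging formulation and the vocabulary expansion step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag labels "K" (keep) and "S" (skip), the function names, and the dictionary-based vocabulary are all assumptions made for illustration, and the CRF itself is omitted. The sketch only shows how a (full name, abbreviation) pair maps to a per-character tag sequence, how a tag sequence maps back to an abbreviation, and how a ranked candidate list might be added to a vocabulary.

```python
def tags_from_abbreviation(full_name, abbreviation):
    """Label each character of full_name 'K' (kept) or 'S' (skipped).

    Hypothetical labeling scheme: a greedy left-to-right alignment that
    assumes the abbreviation is a subsequence of the full name. With
    repeated characters the alignment is ambiguous and this picks the
    earliest match.
    """
    tags = []
    pos = 0  # index of the next abbreviation character still to be matched
    for ch in full_name:
        if pos < len(abbreviation) and ch == abbreviation[pos]:
            tags.append("K")
            pos += 1
        else:
            tags.append("S")
    if pos != len(abbreviation):
        raise ValueError("abbreviation is not a subsequence of the full name")
    return tags


def abbreviation_from_tags(full_name, tags):
    """Rebuild an abbreviation by keeping the characters tagged 'K'."""
    return "".join(ch for ch, tag in zip(full_name, tags) if tag == "K")


def expand_vocabulary(vocabulary, full_name, ranked_candidates, top_n=10):
    """Return a copy of vocabulary with the top_n candidates added.

    Each added entry maps a surface form to the full entity name.
    Existing entries are left unchanged.
    """
    expanded = dict(vocabulary)
    expanded.setdefault(full_name, full_name)
    for candidate in ranked_candidates[:top_n]:
        expanded.setdefault(candidate, full_name)
    return expanded
```

For example, with the full name 北京大学 and its common abbreviation 北大, `tags_from_abbreviation` yields one tag per character, and a trained tagger would be expected to produce several such tag sequences ranked by score. Passing the resulting candidate strings to `expand_vocabulary` with `top_n=10` mirrors the top-10 expansion described in the abstract.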

Bibliographic reference. Yang, Dong / Pan, Yi-cheng / Furui, Sadaoki (2009): "Vocabulary expansion through automatic abbreviation generation for Chinese voice search", in INTERSPEECH-2009, 728-731.