15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Developing STT and KWS Systems Using Limited Language Resources

Viet-Bac Le (1), Lori Lamel (2), Abdel Messaoudi (1), William Hartmann (2), Jean-Luc Gauvain (2), Cécile Woehrling (1), Julien Despres (1), Anindya Roy (2)

(1) Vocapia Research, France
(2) LIMSI, France

This paper presents recent progress in developing speech-to-text (STT) and keyword spotting (KWS) systems for the 2014 IARPA-Babel evaluation. Systems have been developed for the limited language pack condition for four of the five development languages in this program phase: Assamese, Bengali, Haitian Creole and Zulu. The systems have several novel characteristics that support rapid development of KWS systems. On the STT side, different acoustic units based on phonemic or graphemic representations are explored, and system combination is used to improve STT performance. The acoustic models are trained on only 10 hours of manually transcribed speech, complemented by unsupervised training on additional untranscribed data. Both word and subword units (morphologically decomposed units, syllables, phonemes) are used for KWS. The KWS systems either use the multiple hypotheses produced by consensus network decoding or search word lattices. The word error rates of the individual STT systems are on the order of 50–60%, and the KWS systems obtain Maximum Term Weighted Values (MTWV) ranging from 30% to 45% over all keywords, both in-vocabulary and out-of-vocabulary (OOV). Subword units are shown to be successful at locating some of the OOV keywords, and system combination improves system performance.
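The abstract reports KWS results as Maximum Term Weighted Values without stating the formula. As a reading aid, the following is a minimal sketch of how such a score is commonly computed, assuming the NIST OpenKWS convention: TWV = 1 - (P_miss + β · P_FA) averaged over keywords, with β = 999.9 and one false-alarm trial per second of speech. The function names, the data layout, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Assumed NIST OpenKWS-style weight on false alarms (not stated in the abstract).
BETA = 999.9

# keyword -> list of (confidence score, whether the detection is correct)
Detections = Dict[str, List[Tuple[float, bool]]]


def twv(detections: Detections, n_true: Dict[str, int],
        speech_seconds: float, threshold: float, beta: float = BETA) -> float:
    """Term Weighted Value at one decision threshold, averaged over keywords."""
    total = 0.0
    n_keywords = 0
    for keyword, n_ref in n_true.items():
        if n_ref == 0:
            continue  # keywords with no reference occurrence are not scored
        kept = [ok for score, ok in detections.get(keyword, []) if score >= threshold]
        n_correct = sum(kept)
        n_false = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n_ref
        # Non-target trials are approximated as one per second of speech.
        p_fa = n_false / (speech_seconds - n_ref)
        total += 1.0 - (p_miss + beta * p_fa)
        n_keywords += 1
    if n_keywords == 0:
        raise ValueError("no keyword has reference occurrences")
    return total / n_keywords


def mtwv(detections: Detections, n_true: Dict[str, int],
         speech_seconds: float) -> float:
    """Maximum TWV over all thresholds induced by the observed scores."""
    thresholds = sorted({score for hits in detections.values() for score, _ in hits})
    thresholds.append(float("inf"))  # rejecting every detection gives TWV = 0
    return max(twv(detections, n_true, speech_seconds, t) for t in thresholds)


# Toy example: two keywords in 10 hours (36000 s) of audio.
toy_detections = {
    "a": [(0.9, True), (0.4, False)],
    "b": [(0.8, True), (0.7, True)],
}
toy_n_true = {"a": 1, "b": 2}
toy_mtwv = mtwv(toy_detections, toy_n_true, 36000.0)
```

Because β is large, a single false alarm costs far more than its raw rate suggests, which is why the threshold that maximizes TWV tends to be conservative for rare keywords.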

Bibliographic reference. Le, Viet-Bac / Lamel, Lori / Messaoudi, Abdel / Hartmann, William / Gauvain, Jean-Luc / Woehrling, Cécile / Despres, Julien / Roy, Anindya (2014): "Developing STT and KWS systems using limited language resources", In INTERSPEECH-2014, 2484-2488.