Speech processing encompasses a variety of technologies that automatically process speech for use in downstream applications. These technologies include identifying the language or dialect spoken, the person speaking, what is said, and how it is said. The downstream processing may be limited to a transcription, possibly enhanced with additional metadata, or the output may be used to carry out an action, interpreted within a spoken dialog system, or used more generally for analytics. With the availability of large amounts of spoken multimedia or multimodal data, there is growing interest in using such technologies to provide structure and random access to particular segments. Automatic tools can also serve to annotate large corpora for exploitation in linguistic studies of spoken language, such as acoustic-phonetics, pronunciation variation and diachronic evolution, permitting the validation of hypotheses and models. In this talk I will present some of my experience with speech processing in multiple languages, drawing upon progress in the context of several research projects, most recently the Quaero program and the IARPA Babel program, both of which address the development of technologies in a variety of languages, with the aim of highlighting some recent research directions and challenges.
Bibliographic reference. Lamel, Lori (2014): "Language diversity: speech processing in a multi-lingual context", In INTERSPEECH-2014 (abstract).