Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Heredity and Environment in Speech Recognition: The Role of A Priori Information vs. Data

Michael A. Picheny

Human Language Technologies Group, IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

Most significant advances in speech recognition over the last thirty years can be attributed to the easy availability of ever-increasing corpora of speech and language data and the development of simple trainable parametric statistical models that take advantage of this data. Hidden Markov Models, n-gram language models, and linear-discriminant-based feature extraction are all examples of such data-driven algorithms. However, there is a general feeling in the recognition community that there is a large untapped body of knowledge encompassing a priori sources of information in speech and language that can be mined to serve as the basis for the next generation of improvements in speech recognition systems. Such sources of information include constraints imposed by articulatory models, the grammatical structure of language, and phonology. This paper reviews previous abortive attempts to utilize a priori information in speech recognition and contrasts them with data-driven approaches that seem to capture information of a similar nature more successfully. It also highlights some recent attempts to incorporate explicit sources of speech and language knowledge and speculates on possibilities for synergy between the two approaches in the future.

Bibliographic reference. Picheny, Michael A. (2000): "Heredity and environment in speech recognition: the role of a priori information vs. data", in ICSLP-2000, vol. 3, 429-433.