Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Speech Recognition Using Context Conditional Word Posterior Probabilities
Ralf Schlüter, Frank Wessel, Hermann Ney
Lehrstuhl für Informatik VI, Computer Science Department,
Aachen University of Technology, Germany
In this paper, two new scoring schemes for large vocabulary continuous
speech recognition are compared. Instead of using the
joint probability of a word sequence and a sequence of acoustic
observations, we determine the best path through a word graph
using posterior word probabilities with or without word context.
The exact calculation of the posterior probability for a word sequence
implies a sum over all possible word boundaries, which
is approximated by a maximum operation in the standard scoring
approach. The new scoring scheme using word posterior probabilities
could be expected to lead to improved recognition performance,
because it involves partial summation over word boundaries.
We present experimental results on five different corpora,
the Dutch Arise corpus, the German Verbmobil ’98 corpus, the
English North American Business ’94 20k and 64k development
corpora, and the English Broadcast News ’96 corpus. It is shown
that the Viterbi approximation within words has no effect on standard
and word-posterior-based recognition. Using word posterior
probabilities with and without word context, the relative reduction
in word error rate is comparable and ranges between 1.5%
and 5%. A reason why the additional consideration of word context
does not further improve the recognition performance might
be that the increase in word context information is traded against
a decrease in the number of word sequences that contribute to a
particular word posterior probability.
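The summation over word boundaries that the abstract contrasts with the Viterbi maximum can be sketched as a forward-backward pass over a toy word graph. Everything below (the graph structure, node numbering, and log scores) is illustrative and not taken from the paper's actual lattices:

```python
import math

def logsumexp(vals):
    """Numerically stable log of a sum of exponentials."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

# Hypothetical word graph: edges are (start_node, end_node, word, log_score),
# where log_score stands in for combined acoustic and language model scores
# and nodes stand in for word boundary times.
EDGES = [
    (0, 1, "the", -1.0),
    (0, 2, "the", -1.2),  # same word, different right boundary
    (1, 3, "cat", -2.0),
    (2, 3, "cat", -1.5),
    (3, 4, "sat", -0.5),
]

def edge_posteriors(edges, start=0, end=4):
    nodes = sorted({n for s, t, _, _ in edges for n in (s, t)})
    # Forward pass: alpha[n] = log mass of all partial paths start -> n.
    alpha = {start: 0.0}
    for n in nodes:
        inc = [alpha[s] + w for s, t, _, w in edges if t == n and s in alpha]
        if inc:
            alpha[n] = logsumexp(inc)
    # Backward pass: beta[n] = log mass of all partial paths n -> end.
    beta = {end: 0.0}
    for n in reversed(nodes):
        out = [w + beta[t] for s, t, _, w in edges if s == n and t in beta]
        if out:
            beta[n] = logsumexp(out)
    total = alpha[end]
    # Posterior of each edge: mass of all paths through it over all paths.
    return {e: math.exp(alpha[e[0]] + e[3] + beta[e[1]] - total)
            for e in edges}

posteriors = edge_posteriors(EDGES)
# Partial summation over boundaries: add up both boundary variants of "the".
# The standard max approximation would keep only the larger of the two.
the_sum = sum(p for e, p in posteriors.items() if e[2] == "the")
the_max = max(p for e, p in posteriors.items() if e[2] == "the")
# Every path in this toy graph contains "the", so the summed posterior is
# (up to rounding) 1.0, while the max approximation underestimates it.
```

In this sketch the summed posterior recovers the full probability mass of the word, which is the motivation the abstract gives for replacing the maximum over word boundaries with a (partial) sum.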
Schlüter, Ralf / Wessel, Frank / Ney, Hermann (2000):
"Speech recognition using context conditional word posterior probabilities",
In ICSLP-2000, vol.2, 923-926.