ESCA Workshop on Audio-Visual Speech Processing (AVSP'97)
September 26-27, 1997
Rhodes, Greece
Audio Visual Speech Recognition and Segmental Master Slave HMM
Regine André-Obrecht, Bruno Jacob, Nathalie Parlangeau
IRIT-University Paul Sabatier-CNRS UMR 5505, Toulouse, France
Our work deals with the classical problem of merging
heterogeneous and asynchronous parameters. It is well known
that lip reading improves the speech recognition score,
especially in noisy conditions; we therefore study more closely
the modeling of acoustic and labial parameters and propose two
automatic speech recognition systems:
- a Direct Identification, performed with a classical
HMM approach: no correlation between visual and acoustic
parameters is assumed.
- two correlated models: a master HMM and a slave HMM,
which process the labial observations and the acoustic
observations, respectively.
To assess each approach, we use a segmental pre-processing
step and a robust acoustic elementary unit, the "pseudodiphone".
Our task is the recognition of spelled French
letters in clean and noisy (cocktail party) environments.
Whatever the approach and condition, the introduction of
labial features improves performance, but the difference
between the two models is not large enough to favor
either one.
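The master/slave coupling mentioned above can be illustrated with a
minimal sketch. It is not the authors' implementation: it assumes
discrete observation symbols, frame-synchronous labial and acoustic
streams (the paper works with segmental, asynchronous parameters),
and the usual reading of a master-slave HMM in which the slave's
initial, transition and emission probabilities are indexed by the
current master state. The function name, parameter names and toy
values are all hypothetical.

```python
def master_slave_forward(labial, acoustic, pi_m, A_m, B_m, pi_s, A_s, B_s):
    """Joint likelihood of two synchronous discrete observation streams.

    Assumed (illustrative) parameter layout:
      pi_m[m], A_m[m1][m2], B_m[m][o]    : master HMM (labial stream)
      pi_s[m][s], A_s[m][s1][s2], B_s[m][s][o] : slave HMM (acoustic
        stream), each table selected by the current master state m
    """
    n_m = len(pi_m)
    n_s = len(pi_s[0])

    # Initialisation over the joint state (master m, slave s).
    alpha = [[pi_m[m] * B_m[m][labial[0]]
              * pi_s[m][s] * B_s[m][s][acoustic[0]]
              for s in range(n_s)]
             for m in range(n_m)]

    # Forward recursion: the master moves first, then the slave moves
    # with the transition table chosen by the new master state.
    for t in range(1, len(labial)):
        new_alpha = [[0.0] * n_s for _ in range(n_m)]
        for m2 in range(n_m):
            for s2 in range(n_s):
                total = 0.0
                for m1 in range(n_m):
                    for s1 in range(n_s):
                        total += alpha[m1][s1] * A_m[m1][m2] * A_s[m2][s1][s2]
                new_alpha[m2][s2] = (total * B_m[m2][labial[t]]
                                     * B_s[m2][s2][acoustic[t]])
        alpha = new_alpha

    return sum(sum(row) for row in alpha)


# Toy parameters (made up): 2 master states, 2 slave states, 2 symbols.
pi_m = [0.6, 0.4]
A_m = [[0.7, 0.3], [0.2, 0.8]]
B_m = [[0.9, 0.1], [0.3, 0.7]]
pi_s = [[0.5, 0.5], [0.1, 0.9]]
A_s = [[[0.6, 0.4], [0.5, 0.5]], [[0.9, 0.1], [0.3, 0.7]]]
B_s = [[[0.8, 0.2], [0.4, 0.6]], [[0.5, 0.5], [0.1, 0.9]]]

likelihood = master_slave_forward([0, 1, 1], [1, 0, 1],
                                  pi_m, A_m, B_m, pi_s, A_s, B_s)
```

Under these assumptions, giving the slave the same tables for every
master state removes the coupling, and the joint likelihood factors
into the product of two independent HMM likelihoods.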
Bibliographic reference.
André-Obrecht, Regine / Jacob, Bruno / Parlangeau, Nathalie (1997):
"Audio visual speech recognition and segmental master slave HMM",
In AVSP-1997, 49-52.