ISCA Archive ICSLP 1998

Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs

Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano

This paper proposes a method to re-estimate output visual parameters for speech-to-lip movement synthesis using audio-visual Hidden Markov Models (HMMs) under the Expectation-Maximization (EM) algorithm. Among conventional methods for speech-to-lip movement synthesis, one approach estimates a visual parameter sequence through the Viterbi alignment of an input acoustic speech signal using audio HMMs. However, this HMM-Viterbi method has a substantial problem: an incorrect HMM state alignment may produce incorrect visual parameters. The problem stems from the deterministic synthesis process, which assigns a single HMM state to each input audio frame. The proposed method avoids this deterministic process by re-estimating visual parameters non-deterministically while maximizing the likelihood of the audio-visual observation sequence under the EM algorithm.


doi: 10.21437/ICSLP.1998-274

Cite as: Yamamoto, E., Nakamura, S., Shikano, K. (1998) Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0756, doi: 10.21437/ICSLP.1998-274

@inproceedings{yamamoto98_icslp,
  author={Eli Yamamoto and Satoshi Nakamura and Kiyohiro Shikano},
  title={{Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0756},
  doi={10.21437/ICSLP.1998-274}
}