EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a new method for time alignment between a scenario and sounds containing voice, music, and BGM (background music), with the aim of generating video captions automatically. The proposed method, the Voice-Music-Pause+BGM method, is based on the composition of voice and music models. Evaluation experiments show that the proposed method performs about 10 to 60 times better than conventional time alignment methods.
Bibliographic reference. Wada, Yamato / Sugiyama, Masahide (2003): "Time alignment for scenario and sounds with voice, music and BGM", In EUROSPEECH-2003, 445-448.