11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

A Procedure for Estimating Gestural Scores from Natural Speech

Hosung Nam (1), Vikramjit Mitra (2), Mark Tiede (1), Elliot Saltzman (1), Louis Goldstein (3), Carol Espy-Wilson (2), Mark Hasegawa-Johnson (4)

(1) Haskins Laboratories, USA
(2) University of Maryland, USA
(3) University of Southern California, USA
(4) University of Illinois at Urbana-Champaign, USA

Speech can be represented as a constellation of constricting events (gestures), defined over vocal tract variables, in the form of a gestural score. Gestures and their output trajectories (tract variables), which are available only in synthetic speech, have recently been shown to improve ASR performance. We introduce a procedure for annotating gestures in natural speech databases: a landmark-based time-warping method. For a given utterance, the Haskins Laboratories TADA model is used to generate a gestural score and acoustic output, and an optimal gestural score is then estimated through iterative time warping based on comparison of (phone) landmarks.
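
As a rough illustration of the landmark-based warping idea, the sketch below maps gesture onset and offset times from a synthetic timeline onto a natural one by piecewise-linear interpolation between matched phone landmarks. It is a single-pass simplification under stated assumptions, not the iterative procedure of the paper: the function names, the landmark times, and the gesture tuples are all hypothetical, and the tract variable labels are used only as example strings.

```python
from bisect import bisect_right


def warp_time(t, src, dst):
    """Map time t from the source (synthetic) timeline to the destination
    (natural) timeline by piecewise-linear interpolation between matched
    landmarks. src and dst are sorted lists of equal length."""
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matched landmarks")
    # Clamp times that fall outside the landmark range to the end points.
    if t <= src[0]:
        return dst[0]
    if t >= src[-1]:
        return dst[-1]
    i = bisect_right(src, t) - 1  # segment with src[i] <= t < src[i + 1]
    frac = (t - src[i]) / (src[i + 1] - src[i])
    return dst[i] + frac * (dst[i + 1] - dst[i])


def warp_gestural_score(gestures, src, dst):
    """Warp (tract_variable, onset, offset) tuples onto the natural timeline."""
    return [(tv, warp_time(on, src, dst), warp_time(off, src, dst))
            for tv, on, off in gestures]


# Hypothetical phone landmarks in seconds (assumed values, not from the paper).
synthetic_landmarks = [0.00, 0.10, 0.25, 0.40]
natural_landmarks = [0.00, 0.15, 0.30, 0.50]

# Hypothetical gestures: (tract variable label, onset, offset) in seconds.
gestures = [("LA", 0.05, 0.20), ("TBCD", 0.10, 0.40)]

warped = warp_gestural_score(gestures, synthetic_landmarks, natural_landmarks)
for tv, on, off in warped:
    print(f"{tv}: {on:.3f} to {off:.3f} s")
```

In an iterative scheme of the kind the abstract describes, a warp like this would presumably be re-applied after each resynthesis and landmark comparison until the synthetic and natural landmarks agree closely enough; that loop and its stopping criterion are not shown here.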


Bibliographic reference. Nam, Hosung / Mitra, Vikramjit / Tiede, Mark / Saltzman, Elliot / Goldstein, Louis / Espy-Wilson, Carol / Hasegawa-Johnson, Mark (2010): "A procedure for estimating gestural scores from natural speech", in INTERSPEECH-2010, 30-33.