8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


The Effect of an Intermediate Articulatory Layer on the Performance of a Segmental HMM

Martin J. Russell (1), Philip J.B. Jackson (2)

(1) University of Birmingham, U.K.
(2) University of Surrey, U.K.

We present a novel multi-level HMM in which an intermediate 'articulatory' representation is included between the state and surface-acoustic levels. A potential difficulty with such a model is that the advantages gained by introducing an articulatory layer might be offset by limitations due to an insufficiently rich articulatory representation, or by compromises made for mathematical or computational expediency. This paper describes a simple model in which speech dynamics are modelled as linear trajectories in a formant-based 'articulatory' layer, and the articulatory-to-acoustic mappings are linear. Phone classification results for TIMIT are presented for monophone and triphone systems with a phone-level syntax. The results demonstrate that, provided that the intermediate representation is sufficiently rich, or a sufficiently large number of phone-class-dependent articulatory-to-acoustic mappings are employed, classification performance is not compromised.
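The two-stage structure described in the abstract (a linear trajectory in the articulatory layer, followed by a linear articulatory-to-acoustic mapping) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the midpoint-and-slope parameterisation of the trajectory, the bias term in the mapping, and all numeric values are assumptions chosen only for illustration.

```python
# Illustrative sketch only: names, parameterisation and values are hypothetical.

def linear_trajectory(midpoint, slope, duration):
    """Return one articulatory vector per frame, lying on a straight line.

    The line passes through `midpoint` at the centre of the segment and
    changes by `slope` per frame (assumed parameterisation).
    """
    centre = (duration - 1) / 2.0
    return [[c + m * (t - centre) for c, m in zip(midpoint, slope)]
            for t in range(duration)]


def linear_map(W, bias, x):
    """Apply the linear mapping y = W x + bias (the bias term is an assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(W, bias)]


def acoustic_means(midpoint, slope, duration, W, bias):
    """Map every point of the articulatory trajectory into acoustic space."""
    return [linear_map(W, bias, x)
            for x in linear_trajectory(midpoint, slope, duration)]


# Hypothetical two-formant segment lasting three frames.
trajectory = linear_trajectory([500.0, 1500.0], [10.0, -20.0], 3)

# Hypothetical 3x2 mapping from the 2-D articulatory layer to a 3-D acoustic space.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 1.0]
means = acoustic_means([500.0, 1500.0], [10.0, -20.0], 3, W, bias)
```

In a phone-class-dependent setup of the kind the abstract mentions, one would presumably hold a separate `W` and `bias` per phone class and select the pair according to the class of the segment being scored.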

Bibliographic reference.  Russell, Martin J. / Jackson, Philip J.B. (2003): "The effect of an intermediate articulatory layer on the performance of a segmental HMM", In EUROSPEECH-2003, 2737-2740.