EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

A Speech Model of Acoustic Inventories Based on Asynchronous Interpolation

Alexander B. Kain, Jan P.H. van Santen

Oregon Health & Science University, USA

We propose a speech model that describes the acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) it yields very compact representations, and thus high compression ratios; (ii) re-synthesized speech is free of concatenation errors; (iii) the degree of articulation can be controlled explicitly; and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a combination of several types of features, each computed by non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weights are modified to change duration and articulation effort.
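As a rough illustration of the interpolation idea described above, the following minimal sketch blends two basis vectors per feature stream, with each stream following its own transition weight trajectory. The abstract does not specify the weight function or the feature types, so the sigmoid form, the stream names, and all function and parameter names here are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def transition_weights(n_frames, midpoint, slope):
    """Hypothetical non-linear weight trajectory: a sigmoid over normalized time.

    midpoint (0..1) sets where the transition is centered; slope sets how
    abrupt it is. Both are illustrative parameters, not taken from the paper.
    """
    t = np.linspace(0.0, 1.0, n_frames)
    return 1.0 / (1.0 + np.exp(-slope * (t - midpoint)))

def interpolate_stream(left_basis, right_basis, weights):
    """Blend two basis vectors frame by frame: (1 - w) * left + w * right."""
    w = weights[:, None]
    return (1.0 - w) * left_basis[None, :] + w * right_basis[None, :]

def synthesize_diphone(left, right, timing, n_frames):
    """Build one trajectory per feature stream for a single diphone.

    left, right: dict mapping stream name -> basis vector (1-D array)
    timing:      dict mapping stream name -> (midpoint, slope)
    Streams are "asynchronous" in the sense that each one uses its own
    weight trajectory rather than sharing a single one.
    """
    return {
        name: interpolate_stream(
            left[name], right[name],
            transition_weights(n_frames, *timing[name]),
        )
        for name in left
    }

# Illustrative values only: two one-dimensional streams with different timing.
left = {"f1": np.array([300.0]), "f2": np.array([2300.0])}
right = {"f1": np.array([700.0]), "f2": np.array([1200.0])}
timing = {"f1": (0.4, 12.0), "f2": (0.6, 12.0)}
trajectories = synthesize_diphone(left, right, timing, n_frames=50)
```

In this sketch, changing `n_frames` stretches or compresses the unit in time, and changing `slope` makes each transition more or less abrupt. This is one simple way to picture how modifying transition weights could alter duration and articulation effort.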

Bibliographic reference. Kain, Alexander B. / Santen, Jan P.H. van (2003): "A speech model of acoustic inventories based on asynchronous interpolation", in EUROSPEECH-2003, 329-332.