EUROSPEECH 2003 - INTERSPEECH 2003
We propose a speech model that describes the acoustic inventories of concatenative synthesizers. The model has the following characteristics: (i) its representations are very compact, so high compression ratios are possible, (ii) re-synthesized speech is free of concatenation errors, (iii) the degree of articulation can be controlled explicitly, and (iv) voice transformation is feasible with relatively few additional recordings of a target speaker. The model represents a speech unit as a synthesis of several types of features, each of which is computed by non-linear, asynchronous interpolation of neighboring basis vectors associated with known phonemic identities. During analysis, basis vectors and transition weights are estimated under a strict diphone assumption using a dynamic time warping approach. During synthesis, the estimated transition weights are modified to change duration and articulation effort.
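The idea of asynchronous interpolation can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid weight shape, the feature names `feat_a` and `feat_b`, and the `(midpoint, slope)` parameters are all assumptions chosen for illustration. Each feature type moves from the left basis vector to the right one along its own non-linear weight trajectory, so different features cross over at different times.

```python
import math

def sigmoid_weight(t, midpoint, slope):
    # Hypothetical transition weight in (0, 1); the abstract does not
    # specify the functional form, a sigmoid is assumed here.
    return 1.0 / (1.0 + math.exp(-slope * (t - midpoint)))

def interpolate_unit(left, right, params, n_frames):
    """Interpolate every feature type between two basis vectors.

    left, right: dict mapping feature name -> basis vector (list of floats)
    params:      dict mapping feature name -> (midpoint, slope), one pair
                 per feature type, which is what makes the result asynchronous
    n_frames:    number of output frames; changing it changes duration
    """
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 0.0  # normalized time
        frame = {}
        for name, (midpoint, slope) in params.items():
            w = sigmoid_weight(t, midpoint, slope)
            frame[name] = [(1.0 - w) * a + w * b
                           for a, b in zip(left[name], right[name])]
        frames.append(frame)
    return frames

# Illustrative values only, not measured data.
left = {"feat_a": [0.0], "feat_b": [10.0]}
right = {"feat_a": [4.0], "feat_b": [20.0]}
params = {"feat_a": (0.5, 10.0), "feat_b": (0.75, 10.0)}

frames = interpolate_unit(left, right, params, n_frames=5)
longer = interpolate_unit(left, right, params, n_frames=9)  # same unit, longer duration
```

In this sketch, duration is changed by sampling the same weight trajectories at more or fewer frames. Scaling `slope` or shifting `midpoint` would be one assumed way to mimic a change in articulation effort; the abstract does not state how the weights are modified.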
Bibliographic reference. Kain, Alexander B. / Santen, Jan P.H. van (2003): "A speech model of acoustic inventories based on asynchronous interpolation", In EUROSPEECH-2003, 329-332.