INTERSPEECH 2006 - ICSLP
The automatic assignment of anchoring points is proposed to define the correspondence between the time-frequency representations of speech samples for applications such as speech morphing and speech texture mapping. The correspondence is modeled as a set of segmental bilinear functions, whose model parameters are called anchoring points. Although this correspondence significantly affects the quality of manipulated speech, such as morphed and texture-mapped speech, anchoring points have so far been aligned manually on time-frequency representations.
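The abstract does not give the exact form of the segmental bilinear functions, so the following is only a minimal sketch of one plausible reading. It assumes that each segment is bounded by two consecutive anchor frames in time and two adjacent anchor frequencies, that the source and target cells share normalized coordinates (λ, μ) in which both are bilinear, and that 0 Hz and a maximum frequency `f_max` act as fixed anchors. `AnchorFrame` and `map_point` are hypothetical names, not part of STRAIGHT or the authors' code.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnchorFrame:
    """Hypothetical anchor frame: one anchoring time on each sample plus the
    matching anchor frequencies (Hz), ascending and strictly inside (0, f_max)."""
    t_src: float
    t_tgt: float
    f_src: List[float]
    f_tgt: List[float]


def map_point(frames: List[AnchorFrame], t: float, f: float,
              f_max: float) -> Tuple[float, float]:
    """Map a source point (t, f) to the target time-frequency plane.

    Assumes frames are sorted by t_src (at least two of them) and that
    0 Hz and f_max are fixed anchors on both planes.
    """
    times = [fr.t_src for fr in frames]
    if not times[0] <= t <= times[-1] or not 0.0 <= f <= f_max:
        raise ValueError("point outside the anchored region")

    # Time segment [k, k+1] and normalized time coordinate lam in [0, 1].
    k = min(bisect_right(times, t) - 1, len(frames) - 2)
    a, b = frames[k], frames[k + 1]
    lam = (t - a.t_src) / (b.t_src - a.t_src)
    t_out = a.t_tgt + lam * (b.t_tgt - a.t_tgt)

    # Anchor frequencies at this instant: linear in lam, with fixed ends.
    def at(fa: List[float], fb: List[float]) -> List[float]:
        inner = [(1 - lam) * x + lam * y for x, y in zip(fa, fb)]
        return [0.0] + inner + [f_max]

    src = at(a.f_src, b.f_src)
    tgt = at(a.f_tgt, b.f_tgt)

    # Frequency cell [i, i+1] and normalized frequency coordinate mu in [0, 1].
    i = min(bisect_right(src, f) - 1, len(src) - 2)
    mu = (f - src[i]) / (src[i + 1] - src[i])
    f_out = (1 - mu) * tgt[i] + mu * tgt[i + 1]
    return t_out, f_out
```

Because λ and μ are shared, every cell of the source plane maps onto the corresponding target cell, and anchoring points map exactly onto each other. For morphing with a ratio r, one could then place a point at (1 − r) times the source coordinates plus r times the mapped coordinates; that interpolation step is likewise an assumption here.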
Anchoring points should be placed at auditorily important locations. When a spectrogram is used as the time-frequency representation, auditorily important locations are given by the formant frequencies around vowel transitions. The central idea of the proposed method is to prepare, in advance, vowel template spectra with pre-assigned anchoring points and to deform one of the templates to match the input speech spectrum. The anchoring points on the input spectrum are then copied from the pre-assigned anchoring points of the deformed template.
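The abstract does not describe how the templates are deformed. As a stand-in, the sketch below deforms each hypothetical template with a uniform frequency-scaling factor α (a simple warp chosen for illustration, not the paper's method), selects the template and α with the smallest squared log-spectral distance to the input, and carries the template's anchor frequencies through the same warp. `warp_spectrum`, `assign_anchors`, the template dictionary, and the grid of `alphas` are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical template: (log-magnitude spectrum on a uniform grid starting
# at 0 Hz, anchor frequencies in Hz pre-assigned on that spectrum).
Template = Tuple[List[float], List[float]]


def warp_spectrum(spec: Sequence[float], alpha: float) -> List[float]:
    """Return spec(f / alpha) on the same grid (linear interpolation), i.e.
    the template with its frequency axis stretched by alpha. A feature at
    f0 in the template appears at alpha * f0 in the result."""
    n = len(spec)
    out = []
    for j in range(n):
        x = j / alpha  # fractional bin index in the original template
        if x >= n - 1:
            out.append(spec[-1])  # hold the last value beyond the grid
        else:
            i = int(x)
            w = x - i
            out.append((1 - w) * spec[i] + w * spec[i + 1])
    return out


def assign_anchors(input_spec: Sequence[float],
                   templates: Dict[str, Template],
                   alphas: Sequence[float]) -> Tuple[str, float, List[float]]:
    """Pick the best-matching deformed template and copy its anchors.

    Returns (template name, warp factor, anchor frequencies on the input).
    """
    best = None
    for name, (spec, anchors) in templates.items():
        for alpha in alphas:
            warped = warp_spectrum(spec, alpha)
            # Squared distance between log spectra (assumed to be in log units).
            d = sum((x - y) ** 2 for x, y in zip(warped, input_spec))
            if best is None or d < best[0]:
                best = (d, name, alpha, anchors)
    _, name, alpha, anchors = best
    # Copy the pre-assigned anchors through the same frequency warp.
    return name, alpha, [alpha * f for f in anchors]
```

A uniform scaling keeps the search to a one-dimensional grid per template; a richer deformation (for example, a piecewise-linear warp with one parameter per formant) would fit individual formants better at the cost of a larger search.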
Experimental results suggest that morphed speech based on the proposed automatic assignment method is equivalent in naturalness to STRAIGHT synthetic speech samples.
Bibliographic reference. Takahashi, Toru / Nishi, Masashi / Irino, Toshio / Kawahara, Hideki (2006): "Automatic assignment of anchoring points on vowel templates for defining correspondence between time-frequency representations of speech samples", In INTERSPEECH-2006, paper 1737-Thu2BuP.10.