Copy synthesis of speech signals

 

Postdoctoral position

 

 

Objective

Despite recent progress in speech synthesis, it remains very difficult to modify speaker-related characteristics, because signals are synthesized by concatenating sounds uttered by a given speaker. It is thus almost impossible to modify the acoustic cues of sounds or the characteristics of the speaker.

The objective of the postdoc is to develop copy synthesis algorithms that reproduce a speech signal as faithfully as possible while making it possible to modify acoustic cues. For this reason, the work will rely on a formant synthesizer derived from the one proposed by Klatt [1]. Synthesis thus consists of filtering a sound source through a system of resonators (representing formants); the source is periodic for voiced sounds such as vowels and aperiodic (noise) for fricatives (phonemes /f, s, ʃ, v, z, ʒ/).
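As an illustration of the source-filter principle described above, the following minimal sketch implements one second-order digital resonator of the kind used in Klatt's synthesizer [1] and excites it with a periodic source. The function names, the impulse-train source and the numeric values are assumptions chosen for illustration; they are not taken from the synthesizer used by the team.

```python
import math


def resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz):
    """Coefficients of the difference equation y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(source, freq_hz, bandwidth_hz, sample_rate_hz):
    """Filter a list of samples through a single formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bandwidth_hz, sample_rate_hz)
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in source:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, sample_rate_hz):
    """Crude periodic source (assumption): one unit impulse per pitch period."""
    period = int(round(sample_rate_hz / f0_hz))
    n = int(duration_s * sample_rate_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


if __name__ == "__main__":
    # Hypothetical values: a 100 Hz source shaped by a formant at 500 Hz.
    source = impulse_train(100.0, 0.05, 16000)
    signal = resonate(source, 500.0, 60.0, 16000)
    print(len(signal), max(signal))
```

For an aperiodic source, the impulse train would be replaced by noise samples. A realistic voiced source would use a glottal flow model rather than impulses.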

Work

The work will consist of adapting the synthesizer so that it lends itself to copy synthesis as well as possible, and of developing algorithms to optimize source and formant parameters.

To copy speech with sufficient accuracy, formant and source parameters must be adjusted precisely. The LF source model proposed by Fant and Liljencrants [2] is sufficiently versatile to approximate a natural speech source. The optimization of its four parameters has been studied in several works, either when the vocal tract filter and the source are estimated jointly [3,4] or when the source signal is known [5]. The specificity of copy synthesis is that the vocal tract filter is only roughly approximated by the hypothesised formants, and that the ratio of noise in the source also has to be adjusted for each formant.

Resonators of a formant synthesizer can be organized in cascade or in parallel. Only the parallel organization is usable for copy synthesis because it enables formants to be adjusted independently [6]. The frequency, amplitude and bandwidth of each formant have to be specified. One important advantage of the parallel architecture is that, once the formant frequency is known, the bandwidth can be set to a default value so that only the amplitude has to be adjusted. The second aspect of the work will be the development of an algorithm to adjust amplitudes and frequencies. The adjustment of amplitudes must be synchronized with source periods in order to capture fast amplitude variations, and that of formant frequencies will rely on the automatic formant tracking previously developed [7]. Improvements will concern the choice of the number of formants so as to bring the copied speech closer to the original signal.
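The independence of formants in the parallel organization can be sketched as follows: each resonator filters the same source, and the branch outputs are weighted by their amplitudes and summed. This is a simplified illustration under stated assumptions; the function names and the default bandwidth of 80 Hz are hypothetical, and a complete synthesizer handles further details of branch combination that are omitted here.

```python
import math


def resonator(source, freq_hz, bandwidth_hz, sample_rate_hz):
    """Second-order resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in source:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def parallel_formant_filter(source, formants, sample_rate_hz, default_bandwidth_hz=80.0):
    """Sum of independently weighted formant branches.

    formants: list of (freq_hz, amplitude) pairs. The bandwidth is fixed to
    default_bandwidth_hz (an assumed value), so only the amplitude of each
    branch remains to be adjusted once its frequency is known.
    """
    out = [0.0] * len(source)
    for freq_hz, amplitude in formants:
        branch = resonator(source, freq_hz, default_bandwidth_hz, sample_rate_hz)
        for i, y in enumerate(branch):
            out[i] += amplitude * y
    return out


if __name__ == "__main__":
    # Hypothetical example: unit impulse through two formant branches.
    impulse = [1.0] + [0.0] * 199
    response = parallel_formant_filter(impulse, [(500.0, 1.0), (1500.0, 0.5)], 16000)
    print(response[:5])
```

Because the output is a weighted sum, changing the amplitude of one branch leaves the contribution of the others unchanged. In a cascade organization this does not hold, since each resonator filters the output of the previous one.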

The two aspects have been presented independently to simplify the presentation of the work, and they can be addressed independently only to a certain extent. The synthesis quality will clearly improve most if the interactions between these two aspects are taken into account.

The Parole team mainly works on automatic speech recognition and speech analysis. In the domain of analysis, a number of algorithms have been developed (F0 detection, formant tracking, pitch marking, copy synthesis...) and are available in the WinSnoori software, which the team has been developing for several years and which already contains a series of tools for copy synthesis.

 

Skill and profile

Good knowledge of speech analysis or signal processing is required.

 

References

Copy synthesis tools of WinSnoori  are presented here.

[1] D.H. Klatt, “Software for a cascade/parallel formant synthesizer”, J. Acoust. Soc. Amer., 67(3), p. 971-995, March 1980.

[2] G. Fant and J. Liljencrants, “A four parameter model of glottal flow”, STL-QPSR, 4, p. 1-13, 1985.

[3] M. Frölich, D. Michaelis and H.W. Strube, “SIM-simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals”, J. Acoust. Soc. Amer., 115(1), p. 337-351, 2003.

[4] D. Vincent, O. Rosec and T. Chonavel, “Estimation of LF glottal source parameters based on an ARX model”, Proc. of Interspeech, p. 333-336, Lisboa, Sep. 2005.

[5] J. Pérez and A. Bonafonte, “Automatic Voice-Source Parametrization of Natural Speech”, Proc. of Interspeech, Lisboa, Sep. 2005.

[6] W. J. Holmes, “Copy synthesis of female speech using the JSRU parallel formant synthesiser”, Proc. of European Conference on Speech Technology, p. 513-516, Paris, France, Sep. 1989.

[7] Y. Laprie, “A concurrent curve strategy for formant tracking”, Proc. of ICSLP, Jeju, Korea, Oct. 2004.

 

Contact

Interested candidates are invited to contact Yves Laprie (Yves.Laprie@loria.fr).

 

Important information

This position is advertised in the framework of the national INRIA campaign for recruiting post-docs. It is a one-year position, renewable, beginning in fall 2007. The salary is 2,320€ gross per month.

 

Selection of candidates will be a two-step process. A first selection of a candidate will be carried out internally by the PAROLE group. The selected candidate's application will then be further processed for approval and funding by an INRIA committee.

 

Candidates must hold a doctoral thesis that is less than one year old (May 2006) or that will be defended before the end of 2007. If the defence has not taken place yet, candidates must specify its tentative date and jury.

 

Important - Useful links

Presentation of INRIA postdoctoral positions

To apply (be patient, loading this link takes time…)