Copy synthesis of speech signals
Postdoctoral position
Objective
Despite recent progress in speech synthesis, it is still very difficult to
modify speaker-related characteristics, since signals are synthesized by
concatenating sounds uttered by a given speaker. It is thus almost impossible
to modify either the acoustic cues of sounds or the characteristics linked to
the speaker.
The objective of the postdoc is to develop copy synthesis algorithms that
reproduce a speech signal as faithfully as possible while making it possible
to modify acoustic cues. For this reason, the postdoctoral work will rely on a
formant synthesizer derived from the one proposed by Klatt [1]. Synthesis thus
rests on filtering a sound source through a system of resonators (representing
formants). The source is periodic for voiced sounds such as vowels, and
aperiodic (noise) for the frication of fricative phonemes (/f, s, ʃ, v, z, ʒ/).
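As an illustration of the source-filter principle described above, here is a
minimal sketch of a two-pole digital resonator of the kind used in Klatt-style
formant synthesizers, driven either by a periodic or by a noise source. The
function names, the impulse-train source and the uniform noise are
assumptions made for this example; they are not taken from the team's
synthesizer.

```python
import math
import random


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] (unit gain at 0 Hz)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(source, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in source:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, fs_hz, n_samples):
    """Crude periodic (voiced) source: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def noise(n_samples, seed=0):
    """Crude aperiodic (unvoiced) source: uniform white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


fs = 16000
voiced = resonate(impulse_train(100.0, fs, 1600), 700.0, 80.0, fs)
unvoiced = resonate(noise(1600), 4500.0, 300.0, fs)
```

A real synthesizer would replace the impulse train by a glottal flow model
and combine several resonators; this sketch only shows one resonator shaping
each kind of source.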
Work
The work will consist of adapting the synthesizer so that it lends itself to
copy synthesis as well as possible, and of developing algorithms to optimize
source and formant parameters.
In order to copy speech with sufficient accuracy, formant and source
parameters must be adjusted precisely. The LF source model proposed by Fant
and Liljencrants [2] is sufficiently versatile to approximate a natural speech
source. The optimization of its four parameters has been the subject of a
number of works, either when the vocal tract filter and the source are
estimated jointly [3,4] or when the source signal is known [5]. The
specificity of copy synthesis is that the vocal tract filter is only roughly
approximated by the hypothesised formants, and that the ratio of noise in the
source also has to be adjusted for each of the formants.
The resonators of a formant synthesizer can be organized in cascade or in
parallel. Only the second solution is usable for copy synthesis because it
enables formants to be adjusted independently [6]. The frequency, amplitude
and bandwidth of each formant have to be specified. One important advantage of
the parallel architecture is that, once the formant frequency is known, only
the amplitude needs to be adjusted, the bandwidth being set to a default
value. The second aspect of the work will be the development of an algorithm
to adjust amplitudes and frequencies. The adjustment of amplitudes must be
synchronized with source periods in order to capture fast variations of
amplitude, and that of formant frequencies will rely on the automatic formant
tracking previously developed [7]. Improvements will concern the choice of the
number of formants, so as to bring the copied speech closer to the original
signal.
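The parallel organization described above can be sketched as follows: each
formant has its own resonator fed by the same source, and the branch outputs
are summed after scaling by a per-formant amplitude. This is a simplified
illustration under stated assumptions, not the team's implementation: the
default bandwidth value, the function names and the example formant values
are hypothetical, and refinements found in actual parallel synthesizers (such
as alternating branch signs) are left out.

```python
import math

DEFAULT_BW_HZ = 80.0  # assumed default bandwidth, for illustration only


def resonator(source, freq_hz, bw_hz, fs_hz):
    """Two-pole resonator: y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for x in source:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def parallel_synthesis(source, formants, fs_hz, bw_hz=DEFAULT_BW_HZ):
    """Sum of resonator branches; formants is a list of (frequency_hz, amplitude)."""
    out = [0.0] * len(source)
    for freq_hz, amp in formants:
        branch = resonator(source, freq_hz, bw_hz, fs_hz)
        for n, value in enumerate(branch):
            out[n] += amp * value
    return out


fs = 16000
# One unit impulse per 10 ms period stands in for the voiced source.
source = [1.0 if n % 160 == 0 else 0.0 for n in range(1600)]
vowel = parallel_synthesis(source, [(700.0, 1.0), (1200.0, 0.5), (2600.0, 0.25)], fs)
```

Because the branches are simply summed, changing the amplitude of one formant
rescales only that branch and leaves the others untouched. In a
period-synchronous scheme, the amplitudes would be re-estimated for each
source period rather than held constant as they are here.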
The two aspects have been presented separately to simplify the presentation of
the work, and they can be addressed independently only to a certain extent.
The synthesis quality will clearly improve all the more if the interactions
between these two aspects are taken into account.
The Parole team mainly works on automatic speech recognition and speech
analysis. In the domain of analysis, a number of algorithms have been
developed (F0 detection, formant tracking, pitch marking, copy synthesis...)
and are available in the WinSnoori software, which already contains a series
of tools for copy synthesis and has been developed by the team for several
years.
Skills and profile
A good knowledge of speech analysis or signal processing is required.
References
Copy synthesis tools of WinSnoori are presented here.
[1] D.H. Klatt, “Software for a cascade/parallel formant synthesizer”,
J. Acoust. Soc. Amer., 67(3), p. 971-995, March 1980.
[2] G. Fant and J. Liljencrants, “A four parameter model of glottal flow”,
STL-QPSR, 4, p. 1-13, 1985.
[3] M. Fröhlich, D. Michaelis and H.W. Strube, “SIM-simultaneous inverse
filtering and matching of a glottal flow model for acoustic speech signals”,
J. Acoust. Soc. Amer., 115(1), p. 337-351, 2003.
[4] D. Vincent, O. Rosec and T. Chonavel, “Estimation of LF glottal source
parameters based on an ARX model”, Proc. of Interspeech, p. 333-336, Lisboa,
Sep. 2005.
[5] J. Pérez and A. Bonafonte, “Automatic Voice-Source Parametrization of
Natural Speech”, Proc. of Interspeech, Lisboa, Sep. 2005.
[6] W.J. Holmes, “Copy synthesis of female speech using the JSRU parallel
formant synthesiser”, Proc. of European Conference on Speech Technology,
p. 513-516, Paris, France, Sep. 1989.
[7] Y. Laprie, “A concurrent curve strategy for formant tracking”,
Proc. of ICSLP,
Contact
Interested candidates are invited to contact Yves Laprie
(Yves.Laprie@loria.fr).
Important information
This position is advertised in the framework of the national INRIA campaign
for recruiting post-docs. It is a one-year position, renewable, beginning in
fall 2007. The salary is 2,320€ gross per month.
Selection of candidates will be a two-step process. A first selection of a
candidate will be carried out internally by the PAROLE group. The selected
candidate's application will then be further processed for approval and
funding by an INRIA committee.
Candidates must hold a doctoral thesis less than one year old (May 2006) or
one to be defended before the end of 2007. If the defence has not taken place
yet, candidates must specify the tentative date and jury for the defence.
Important - Useful links
Presentation of INRIA postdoctoral positions
To apply (be patient, loading this link takes time…)