Contents

1 . Message from the board

 Dear members,

 

The board

Back to Top

2 . Editorial

 Dear Members, 

You will find hereunder the May issue of ISCApad. Some of you have complained about the size of recent issues. Our aim is to provide the most exhaustive information possible. I fully agree that this information is available on the web, but in ISCApad it is gathered in a single place. We are still working on a more compact version pointing to the information available on our website. We remain convinced that information pushed to readers is more widely read. Thanks to all of you who devote a few minutes to sending your opinion about ISCApad: positive and negative comments are equally encouraging. I am only disappointed by indifference.

Chris Wellekens

Institut Eurecom

Sophia Antipolis France

Back to Top

3 . ISCA News

3-1 . ISCA Scientific Achievement Medalist 2008

ISCA Scientific Achievement Medal for 2008

It is with great pleasure that I announce the ISCA Medalist for 2008: Hiroya Fujisaki. Prof. Fujisaki has contributed to the speech research community in so many areas, including speech analysis, synthesis and prosody, that it is a hard task to summarize his long list of achievements. He is also the founder of the ICSLP series of conferences, which, now fully integrated as one of ISCA's yearly conferences, celebrates its 10th anniversary this year.


Back to Top

3-2 . ISCA Fellows

ISCA Fellows, Call for Nominations

In 2007, ISCA began its Fellow Program to recognize and honor outstanding members who have made significant contributions to the field of speech science and technology.  To qualify for this distinction, a candidate must have been an ISCA member for five years or more, with a minimum of ten years' experience in the field.  Nominations may be made by any ISCA member (see Nomination Form).  The nomination must be accompanied by references from three current ISCA Fellows (or, during the first three years of the program, from ISCA Board members). A Fellow may be recognized for his/her outstanding technical contributions and/or continued significant service to ISCA.  The candidate's technical contribution should be summarized in the nomination in terms of publications, patents, projects, prototypes and their impact on the community.

Fellows will be selected by a Fellow Selection Committee of nine members who each serve three-year terms.  In the first year of the program, the Committee will be formed by ISCA Board members.  Over the next three years, one third of the members of the Selection Committee will be replaced by ISCA Fellows until the Committee consists entirely of ISCA Fellows.  Members of the Committee will be chosen by the ISCA Board.
 
The committee will hold a virtual meeting during June to evaluate the current year's nominations.
 
Nominations should be submitted on the form provided at http://www.isca-speech.org/fellows.html, before May 23rd, 2008.

Back to Top

4 . SIGs' activities

  • A list of Speech Interest Groups can be found on our website.

     

Back to Top

4-1 . SLaTE

The International Speech Communication Association Special Interest Group (ISCA SIG) on

Speech and Language Technology in Education

 

The special interest group was created in mid-September 2006 at the Interspeech 2006 conference in Pittsburgh. Information about the SIG can be found on its official website.

 

The next SLaTE ITRW will be held in England in 2009; early information about this meeting will be posted on the SIG website.

 

OUR STATEMENT OF PURPOSE

The purpose of the International Speech Communication Association (ISCA) Special Interest Group on Speech and Language Technology in Education (SLaTE) shall be to promote interest in the use of speech and natural language processing for education; to provide members of ISCA with a special interest in speech and language technology in education with a means of exchanging news of recent research developments and other matters of interest in Speech and Language Technology in Education; to sponsor meetings and workshops on that subject that appear to be timely and worthwhile, operating within the framework of ISCA's by-laws for SIGs; and to provide and make available resources relevant to speech and language technology in education, including text and speech corpora, analysis tools, analysis and generation software, research papers and generated data.

 

Activities

  SLaTE Workshops

A SLaTE ITRW workshop was held October 1-3, 2007, in Farmington, Pennsylvania.

Proceedings of this ITRW are available from ISCA.

 

OTHER Workshops AND RELATED MEETINGS

 

We hark back to the first meeting of researchers interested in this area, organized by our colleagues at KTH and held in Marholmen, Sweden, in 1998: http://www.speech.kth.se/still/.

 

 

Another meeting of interest in our field was held in Venice in 2004. It was organized by Rodolfo Delmonte.  http://www.isca-speech.org/archive/icall2004/index.html

 

A very interesting session was held at Interspeech 2006, organized by Patti Price and Abeer Alwan. The papers were reviewed by four panelists, whose slides are available on the SIG website.

Back to Top

4-2 . "YOUNG RESEARCHERS" INVITATION TO THE JOURNÉES D'ÉTUDES SUR LA PAROLE (JEP) 2008

As part of its policy of international openness, and continuing the initiative launched at the JEPs of 2004 in Morocco and 2006 in Dinard,

the AFCP invites students and young researchers of the spoken communication community affiliated with laboratories outside France
to take part in the JEP-TALN 2008 conference (Avignon, June 9-13, 2008, http://www.lia.univ-avignon.fr/jep-taln08/).

This support will cover the travel, accommodation and registration costs of a few (4-5) young researchers coming from abroad.

How to apply:
Candidates should send to ferrane@irit.fr AND Irina.Illina@loria.fr * BEFORE APRIL 26, 2008 * an application file (see attachment) including:
•    a short CV presenting the candidate's scientific activities and university education,
•    a paragraph explaining the candidate's motivation and highlighting the expected benefits of participating in JEP-TALN 2008,
•    an estimate of travel costs (see below).
For students, the application must be accompanied by a letter of recommendation from their research supervisor.

Remarks and schedule:
- Acceptance decisions will be announced by *May 5, 2008*
- Submission and acceptance of a scientific contribution to the JEPs is not a selection criterion for this invitation
- Priority will be given to candidates from countries under-represented at the JEPs
- For your travel cost estimate: the nearest airports are Aéroport Avignon Caumont (www.avignon.aeroport.fr/), Aéroport Marseille-Provence (www.marseille.aeroport.fr) and Aéroports de Paris (www.aeroportsdeparis.fr); the nearest railway stations are Avignon TGV and Avignon Centre (see www.voyages-sncf.com for train fares).
 
Back to Top

5 . Future ISCA Conferences and workshops (ITRW)

5-1 . INTERSPEECH 2008

INTERSPEECH 2008 incorporating SST 08 

September 22-26, 2008

Brisbane Convention & Exhibition Centre

Brisbane, Australia

http://www.interspeech2008.org/

 

Interspeech is the world's largest and most comprehensive conference on Speech Science and Speech Technology. We invite original papers in any related area, including (but not limited to):

  • Human Speech Production, Perception and Communication
  • Speech and Language Technology
  • Spoken Language Systems
  • Applications, Resources, Standardisation and Evaluation

In addition, a number of Special Sessions on selected topics have been organised, and we invite submissions for these as well (see the website for a complete list).

Interspeech 2008 has two submission formats: full 4-page papers and short 1-page papers. Prospective authors are invited to submit papers in either format via the conference website by 7 April 2008.

Important Dates
Paper Submission: Monday, 7 April 2008, 3pm GMT
Notification of Acceptance/Rejection: Monday, 16 June 2008, 3pm GMT
Early Registration Deadline: Monday, 7 July 2008, 3pm GMT
Tutorial Day: Monday, 22 September 2008
Main Conference: 23-26 September 2008

For more information please visit the website http://www.interspeech2008.org

Chairman: Denis Burnham, MARCS, University of Western Sydney.

Back to Top

5-2 . INTERSPEECH 2009

Brighton, UK,
Conference Website
Chairman: Prof. Roger Moore, University of Sheffield.

Back to Top

5-3 . INTERSPEECH 2010

Chiba, Japan
Conference Website
ISCA is pleased to announce that INTERSPEECH 2010 will take place in Makuhari-Messe, Chiba, Japan, September 26-30, 2010. The event will be chaired by Keikichi Hirose (Univ. Tokyo), and will have as a theme "Towards Spoken Language Processing for All - Regardless of Age, Health Conditions, Native Languages, Environment, etc."

 

Back to Top

5-4 . ITRW on Speech analysis and processing for knowledge discovery

June 4 - 6, 2008
Aalborg, Denmark
Workshop website   http://www.es.aau.dk/ITRW/ 

 

Humans are very efficient at capturing information and messages in speech, and they often perform this task effortlessly even when the signal is degraded by noise, reverberation and channel effects. In contrast, when a speech signal is processed by conventional spectral analysis methods, significant cues and useful information in speech are usually not taken proper advantage of, resulting in sub-optimal performance in many speech systems. There exists, however, a vast literature on speech production and perception mechanisms and their impacts on acoustic phonetics that could be more effectively utilized in modern speech systems. A re-examination of these knowledge sources is needed. On the other hand, recent advances in speech modelling and processing and the availability of a huge collection of multilingual speech data have provided an unprecedented opportunity for acoustic phoneticians to revise and strengthen their knowledge and develop new theories. Such a collaborative effort between science and technology is beneficial to the speech community and it is likely to lead to a paradigm shift for designing next-generation speech algorithms and systems. This, however, calls for a focussed attention to be devoted to analysis and processing techniques aiming at a more effective extraction of information and knowledge in speech.
Objectives:
The objective of this workshop is to discuss innovative approaches to the analysis of speech signals that can bring out the subtle and unique characteristics of speech and speaker. This will also help in discovering speech cues useful for significantly improving the performance of speech systems. Several attempts have been made in the past to explore speech analysis methods that can bridge the gap between human and machine processing of speech. In particular, the time-varying interactions between the excitation and vocal tract systems during production seem to elude exploitation. Explored methods include all-pole and pole-zero modelling based on temporal weighting of the prediction errors, interpreting the zeros of speech spectra, analysis of phase in the time and transform domains, nonlinear (neural network) models for information extraction and integration, etc. Such studies may also bring out finer details of speech signals, which may have implications for determining the acoustic-phonetic cues needed for developing robust speech systems.
The Workshop:
• will open with a full-morning tutorial giving an overview of the present state of research on the workshop topic
• will be organised as a single series of oral and poster presentations
• gives each oral presentation 30 minutes to allow ample time for discussion
• is an ideal forum for speech scientists to discuss the perspectives that will further future research collaborations.
Potential Topic areas:
• Parametric and nonparametric models
• New all-pole and pole-zero spectral modelling
• Temporal modelling
• Non-spectral processing (group delay, etc.)
• Integration of spectral and temporal processing
• Biologically-inspired speech analysis and processing
• Interactions between excitation and vocal tract systems
• Characterization and representation of acoustic phonetic attributes
• Attribute-based speaker and spoken language characterization
• Analysis and processing for detecting acoustic phonetic attributes
• Language-independent aspects of acoustic phonetic attribute detection
• Detection of language-specific acoustic phonetic attributes
• Acoustic-to-linguistic and acoustic phonetic mapping
• Mapping from the acoustic signal to articulator configurations
• Merging of synchronous and asynchronous information
• Other related topics
Registration
Fees for early and late registration for ISCA and non-ISCA members will be made available on the website during September 2007.
Venue:
The workshop will take place at Aalborg University, Department of Electronic Systems, Denmark. See the workshop website for further and latest information.
Accommodation:
There are a large number of hotels in Aalborg most of them close to the city centre. The list of hotels, their web sites and telephone numbers are given on the workshop website. Here you will also find information about transportation between the city centre and the university campus.
How to reach Aalborg:
Aalborg Airport is a half-hour flight from Copenhagen's international airport, and there are many daily flight connections between Copenhagen and Aalborg. Flying to Copenhagen with Scandinavian Airlines System (SAS) or one of the Star Alliance companies enables you to include the Copenhagen-Aalborg leg in the same ticket, thereby reducing the total transportation cost. There is also an hourly train connection between the two cities; the train ride takes approximately five hours.
Organising Committee:
Paul Dalsgaard, B. Yegnanarayana, Chin-Hui Lee, Paavo Alku, Rolf Carlson, Torbjørn Svendsen

http://www.es.aau.dk/ITRW/
 


Back to Top

5-5 . ITRW on experimental linguistics

August 2008, Athens, Greece
Website
Prof. Antonis Botinis


Back to Top

5-6 . International Conference on Auditory-Visual Speech Processing AVSP 2008

Dates: 26-29 September 2008
Location: Moreton Island, Queensland, Australia
Website: http://express.hid.ri.cmu.edu/AVSP2008/Main.html

AVSP 2008 will be held as an ISCA Tutorial and Research Workshop at
Tangalooma Wild Dolphin Resort on Moreton Island, 26-29 September
2008. AVSP 2008 is a satellite event of Interspeech 2008, which takes
place in Brisbane on 22-26 September 2008. Tangalooma is located a
short distance from Brisbane, so attendance at AVSP 2008 can easily
be combined with participation in Interspeech 2008.

Auditory-visual speech production and perception by human and machine is
an interdisciplinary and cross-linguistic field which has attracted
speech scientists, cognitive psychologists, phoneticians, computational
engineers, and researchers in language learning studies. Since the
inaugural workshop in Bonas in 1995, Auditory-Visual Speech Processing
workshops have been organised on a regular basis (see an overview at the
avisa website). In line with previous meetings, this conference will
consist of a mixture of regular presentations (both posters and oral),
and lectures by invited speakers.

Topics include but are not limited to:
- Machine recognition
- Human and machine models of integration
- Multimodal processing of spoken events
- Cross-linguistic studies
- Developmental studies
- Gesture and expression animation
- Modelling of facial gestures
- Speech synthesis
- Prosody
- Neurophysiology and neuro-psychology of audition and vision
- Scene analysis

Paper submission:
Details of the paper submission procedure will be available on the
website in a few weeks' time.

Chairs:
Simon Lucey
Roland Goecke
Patrick Lucey

 

Back to Top

5-7 . Christian Benoit workshop on Speech and Face to Face Communication

NEW Deadline for sending one-page abstracts = JUNE 9TH


Ten years after our colleague Christian Benoît passed away, the mark he left on the international community is still very vivid. There will soon be several occasions to honour his memory: during the next Interspeech conference (Christian was for a long time secretary of ESCA, the future ISCA; the association is a French association of the type described in the 1901 law, and its official headquarters are still in Grenoble), as well as during the next AVSP workshop (a workshop of which he was one of the creators). The Christian Benoît Association was created in 1999 and regularly awards young researchers the "Christian Benoît prize" to promote their research (the 4th prize was awarded to the phonetician Susanne Fuchs in 2007). The Christian Benoît Association (http://www.icp.inpg.fr/ICP/_communication.fr.html#prixcb), along with ICP, now the Speech and Cognition Department of Gipsa-lab (http://www.gipsa-lab.inpg.fr), is organizing a workshop/summer school in Christian Benoît's memory, in line with his innovative and enthusiastic research style, aiming to explore the topic of "Speech and Face to Face Communication" from a pluridisciplinary perspective: neuroscience, cognitive psychology, phonetics, linguistics and computer modelling. The workshop "Speech and Face to Face Communication" will be organized around 11 invited lectures. All researchers in the field are invited to participate through a call for papers, and students are encouraged to attend the workshop in large numbers and present their work.

Website: http://www.icp.inpg.fr/~dohen/face2face/

Deadline for sending one-page abstracts: June 9th (see Call for Papers
<http://ww.icp.inpg.fr/%7Edohen/face2face/CallForPapers.html>)

You can subscribe to the Christian Benoît Association by sending 15
euros (active member; 45 euros or more, benefactor) to Pascal Perrier,
secretary of the association: Pascal.Perrier@gipsa-lab.inpg.fr

Back to Top

6 . Books, databases and software

6-1 . Books

La production de la parole
Author: Alain Marchal, Universite d'Aix en Provence, France
Publisher: Hermes Lavoisier
Year: 2007

Speech enhancement-Theory and Practice
Author: Philipos C. Loizou, University of Texas, Dallas, USA
Publisher: CRC Press
Year:2007

Speech and Language Engineering
Editor: Martin Rajman
Publisher: EPFL Press, distributed by CRC Press
Year: 2007

Human Communication Disorders/ Speech therapy
The titles in this interesting series are listed on the Wiley website.

Incurses em torno do ritmo da fala
Author: Plinio A. Barbosa
Publisher: Pontes Editores (city: Campinas)
Year: 2006 (released 11/24/2006)
(In Portuguese, abstract attached.) Website

Speech Quality of VoIP: Assessment and Prediction
Author: Alexander Raake
Publisher: John Wiley & Sons, UK-Chichester, September 2006
Website

Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
Author: Pierre-Yves Oudeyer
Publisher:Oxford University Press
Website

Speech Recognition Over Digital Channels
Authors: Antonio M. Peinado and Jose C. Segura
Publisher: Wiley, July 2006
Website

Multilingual Speech Processing
Editors: Tanja Schultz and Katrin Kirchhoff ,
Elsevier Academic Press, April 2006
Website

Reconnaissance automatique de la parole: Du signal a l'interpretation
Authors: Jean-Paul Haton
Christophe Cerisara
Dominique Fohr
Yves Laprie
Kamel Smaili
392 Pages     Publisher: Dunod

 

Automatic Speech Recognition on Mobile Devices and over Communication Networks
Editors: Zheng-Hua Tan and Børge Lindberg
Publisher: Springer, London, March 2008
Website <http://asr.es.aau.dk/>
 
About this book
The remarkable advances in computing and networking have sparked an 
enormous interest in deploying automatic speech recognition on mobile 
devices and over communication networks. This trend is accelerating.
This book brings together leading academic researchers and industrial 
practitioners to address the issues in this emerging realm and presents 
the reader with a comprehensive introduction to the subject of speech 
recognition in devices and networks. It covers network, distributed and 
embedded speech recognition systems, which are expected to co-exist in 
the future. It offers a wide-ranging, unified approach to the topic and 
its latest development, also covering the most up-to-date standards and 
several off-the-shelf systems.
 
Latent Semantic Mapping: Principles & Applications
Author: Jerome R. Bellegarda, Apple Inc., USA
Publisher: Morgan & Claypool
Series: Synthesis Lectures on Speech and Audio Processing
Year: 2007
Website: http://www.morganclaypool.com/toc/sap/1/1
 

The Application of Hidden Markov Models in Speech Recognition
By Mark Gales and Steve Young (University of Cambridge)
http://dx.doi.org/10.1561/2000000004
 
in Foundations and Trends in Signal Processing (FnTSIG)
www.nowpublishers.com/SIG 
 
 
Proceedings of the IEEE
 
Special Issue on ADVANCES IN MULTIMEDIA INFORMATION RETRIEVAL
 
Volume 96, Number 4, April 2008
 
Guest Editors:
 
Alan Hanjalic, Delft University of Technology, Netherlands
Rainer Lienhart, University of Augsburg, Germany
Wei-Ying Ma, Microsoft Research Asia, China
John R. Smith, IBM Research, USA
 
Through carefully selected, invited papers written by leading authors and research teams, the April 2008 issue of Proceedings of the IEEE (v.96, no.4) highlights successes of multimedia information retrieval research, critically analyzes the achievements made so far and assesses the applicability of multimedia information retrieval results in real-life scenarios. The issue provides insights into the current possibilities for building automated and semi-automated methods as well as algorithms for segmenting, abstracting, indexing, representing, browsing, searching and retrieving multimedia content in various contexts. Additionally, future challenges that are likely to drive the research in the multimedia information retrieval field for years to come are also discussed.
 
To learn more, please visit the corresponding IEEE Xplore site.
Back to Top

6-2 . LDC News

 
In this month's newsletter, the Linguistic Data Consortium (LDC) would like to provide information on the ACL Anthology and announce the availability of two new publications.



ACL Anthology's New Home

The ACL Anthology is a digital archive of 12,500 research papers in computational linguistics, stretching back to 1965.  All papers are available for free download.  Steven Bird established the anthology in 2001, while he was associate director at the LDC.  The initial digitization of 50,000 pages of articles was made possible through the generous support of institutional and individual sponsors. For the next six years, the anthology was hosted on the LDC website, and it came to play a central role in the day-to-day work of computational linguists the world over.  Today, conference proceedings are added to the Anthology at the time of each conference, providing immediate free access to the latest research findings.  In 2007, the digitization of legacy materials was completed and the anthology was migrated to the website of the Association for Computational Linguistics.  Steven passed on the editorship to Min-Yen Kan.  Ongoing activities with the anthology include citation linking and extraction of raw text.  The LDC is pleased to have contributed to the development of the anthology and wishes the current editor continued success in providing this valuable resource.  Visit the ACL website for further information on ACL conferences, membership, and publications.

New Publications

(1) An English Dictionary of the Tamil Verb (LDC2008L01) represents over twenty-five years of work led by Harold F. Schiffman, Professor Emeritus of Dravidian Linguistics and Culture at the University of Pennsylvania's Department of South Asia Studies. It contains translations for 6,597 English verbs and defines 9,716 Tamil verbs. This release presents the dictionary in two formats: Adobe PDF and XML. The PDF format displays the dictionary in a human-readable form and is suitable for printing. The XML version is a purely electronic form intended mainly for application development and the creation of searchable electronic databases.

In the electronic XML version each entry contains the following: the English entry or head word; the Tamil equivalent (in Tamil script and transliteration); the verb class and transitivity specification; the spoken Tamil pronunciation (audio files in mp3 format); the English definition(s); additional Tamil entries (if applicable); example sentences or phrases in Literary Tamil, Spoken Tamil (with a corresponding audio file in .mp3 format) and an English translation; and Tamil synonyms or near-synonyms, where appropriate. It is expected that the dictionary will be useful for Tamil learners, scholars and others interested in the Tamil language.
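For readers who want a concrete picture of how such entries might be consumed programmatically, here is a minimal Python sketch that parses one hypothetical entry. The element names (entry, headword, tamil, synonym, etc.) are illustrative assumptions only; the actual schema of the LDC release may differ.

import xml.etree.ElementTree as ET

# Hypothetical entry mirroring the fields listed above; the real
# schema of LDC2008L01 is not reproduced here.
SAMPLE = """
<entry>
  <headword>leap</headword>
  <tamil script="Taml" translit="kuti">...</tamil>
  <verbclass transitivity="intransitive">6</verbclass>
  <pronunciation audio="kuti.mp3"/>
  <definition>to jump, to leap</definition>
  <example>
    <literary>...</literary>
    <spoken audio="ex001.mp3">...</spoken>
    <english>The cat leapt onto the wall.</english>
  </example>
  <synonym>pounce</synonym>
</entry>
"""

entry = ET.fromstring(SAMPLE)
print(entry.findtext("headword"))    # English head word
print(entry.findtext("definition"))  # English definition(s)
# The synonym field is what lets a search for 'pounce' reach this verb.
for syn in entry.findall("synonym"):
    print("synonym:", syn.text)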

An English Dictionary of the Tamil Verb seeks to meet needs not currently addressed by existing English-Tamil dictionaries. The main goal of this dictionary is to get an English-knowing user to a Tamil verb, irrespective of whether he or she begins with an English verb or some other item, such as an adjective; this is because what may be a verb in Tamil may in fact not be a verb in English, and vice versa. Since the number of English entries is limited (slightly less than 10,000) there may not be main entries for certain low-frequency items like 'pounce' but this item does appear as a synonym for 'jump, leap', and some other verbs, so searching for 'pounce' will get the user to a Tamil verb via the synonym field. The main goal is therefore to specifically concentrate on supplying the kinds of information lacking in all previous attempts to capture the equivalencies between English and Tamil. An English Dictionary of the Tamil Verb is distributed on one DVD-ROM.

2008 Subscription Members will automatically receive two copies of this corpus. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$300.

*

(2) GALE Phase 1 Chinese Blog Parallel Text (LDC2008T06) was prepared by the LDC and consists of 313K characters (277 files) of Chinese blog text and its English translation, selected from eight sources. This release was used as training data in Phase 1 of the DARPA-funded GALE program.

The task of preparing this corpus involved four stages of work: data scouting, data harvesting, formatting, and data selection.

Data scouting involved manually searching the web for suitable blog text. Data scouts were assigned particular topics and genres along with a production target in order to focus their web search. Formal annotation guidelines and a customized annotation toolkit helped data scouts to manage the search process and to track progress.

Data scouts logged their decisions about potential text of interest (sites, threads and posts) to a database. A nightly process queried the annotation database and harvested all designated URLs. Whenever possible, the entire site was downloaded, not just the individual thread or post located by the data scout.

Once the text was downloaded, its format was standardized so that the data could be more easily integrated into downstream annotation processes. Typically a new script was required for each new domain name that was identified. After scripts were run, an optional manual process corrected any remaining formatting problems.

The selected documents were then reviewed for content suitability using a semi-automatic process. A statistical approach was used to rank a document's relevance to a set of already-selected documents labeled as "good." An annotator then reviewed the list of relevance-ranked documents and selected those which were suitable for a particular annotation task or for annotation in general.
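The announcement does not specify which statistical method was used for this relevance ranking. Purely as an illustration of the general idea (scoring candidates by similarity to a seed set of documents labeled "good"), here is a small Python sketch using TF-IDF cosine similarity; the function name and the choice of scikit-learn and TF-IDF are my own assumptions, not LDC's documented procedure.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(good_docs, candidates):
    """Rank candidate documents by similarity to the 'good' seed set."""
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(good_docs + candidates)
    good = matrix[:len(good_docs)]
    cand = matrix[len(good_docs):]
    # Profile (centroid) of the already-selected "good" documents.
    centroid = np.asarray(good.mean(axis=0))
    scores = cosine_similarity(cand, centroid).ravel()
    # Highest-scoring candidates first, for an annotator to review.
    order = np.argsort(scores)[::-1]
    return [(candidates[i], float(scores[i])) for i in order]

ranked = rank_candidates(
    ["blog post about street food in Beijing",
     "blog entry about travel and local food"],
    ["a post about favorite restaurants",
     "a corporate press release on quarterly earnings"],
)
for text, score in ranked:
    print(f"{score:.3f}  {text}")

An annotator would then review such a relevance-ranked list and keep the documents suitable for annotation, as described above.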

Manual sentence unit/segment (SU) annotation was also performed on a subset of files, following LDC's Quick Rich Transcription specification. Three types of end-of-sentence SU were identified: statement SU, question SU, and incomplete SU.

After files were selected, they were reformatted into a human-readable translation format and assigned to professional translators for careful translation. Translators followed LDC's GALE Translation guidelines, which describe the makeup of the translation team, the source data format, the translation data format, best practices for translating certain linguistic features (such as names and speech disfluencies), and the quality control procedures applied to completed translations.  GALE Phase 1 Chinese Blog Parallel Text is distributed via web download.

2008 Subscription Members will automatically receive two copies of this corpus on disc. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1500.


 
Ilya Ahtaridis
Membership Coordinator

--------------------------------------------------------------------
Linguistic Data Consortium          Phone: (215) 573-1275
University of Pennsylvania          Fax: (215) 573-2175
3600 Market St., Suite 810
Philadelphia, PA 19104 USA          http://www.ldc.upenn.edu

Back to Top

6-3 . Question Answering on speech transcripts (QAst)

  • The QAst organizers are pleased to announce the release of the development dataset for
    the CLEF-QA 2008 track "Question Answering on Speech Transcripts" (QAst).
    We take this opportunity to launch a first call for participation in
    this evaluation exercise.

    QAst is a CLEF-QA track that aims at providing an evaluation framework
    for QA technology on speech transcripts, both manual and automatic.
    A detailed description of this track is available at:
    http://www.lsi.upc.edu/~qast

    This is the second evaluation of the QAst track.
    Last year (QAst 2007), factual questions were generated for two
    distinct corpora (in English only). This year, in addition to
    factual questions, some definition questions are included, and
    five corpora covering three different languages are used (3
    corpora in English, 1 in Spanish and 1 in French).

    Important dates:

    # 15 June 2008: evaluation set released
    # 30 June 2008: submission deadline

    The track is organized jointly by the Technical University of
    Catalonia (UPC), the Evaluations and Language resources Distribution
    Agency (ELDA) and the Laboratoire d'Informatique pour la Mécanique
    et les Sciences de l'Ingénieur (LIMSI).

    If you are interested in participating please send an email to Jordi
    Turmo (turmo_AT_lsi.upc.edu) with "QAst" in the subject line.


Back to Top

6-4 . ELRA - Language Resources Catalogue - Update

 
ELRA is happy to announce that one new Speech Resource, produced within
the Technolangue programme, is now available in its catalogue.
 
ELRA-S0272 MEDIA speech database for French
The MEDIA speech database for French was produced by ELDA within the French national project MEDIA (Automatic evaluation of man-machine dialogue systems), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). It contains 1,258 transcribed dialogues from 250 adult speakers. The corpus was constructed with a 'Wizard of Oz' (WoZ) system, which consists of simulating a natural language man-machine dialogue. The scenario was built in the domain of tourism and hotel reservation.
The semantic annotation of the corpus is available in this catalogue, referenced as ELRA-E0024 (MEDIA Evaluation Package).
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1057

For more information on the catalogue, please contact Valérie Mapelli: mapelli@elda.org

Visit our on-line catalogue: http://catalog.elra.info
 
Back to Top

7 . Job openings

Back to Top

7-1 . AT&T Labs Research: Research Staff Positions - Florham Park, NJ

AT&T Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit, and strong communication and software skills. Areas of particular interest are:

  • Large-vocabulary automatic speech recognition
  • Acoustic and language modeling
  • Robust speech recognition
  • Signal processing
  • Speaker recognition
  • Speech data mining
  • Natural language understanding and dialog
  • Text and web mining
  • Voice and multimodal search

AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).

 


Back to Top

7-2 . Summer Intern positions at Motorola, Schaumburg, Illinois, USA

Motorola Labs - Center for Human Interaction Research (CHIR) located in Schaumburg Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).

CHIR's mission:

Our research lab develops technologies that make access to rich communication, media and information services effortless, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and continually evolving knowledge about the user.

Intern profiles:

1) Acoustic environment/event detection and classification.

The successful candidate will be a PhD student near the end of his/her PhD studies who is skilled in signal processing and/or pattern recognition and knows Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.

2) Speaker adaptation for applications on speech recognition and spoken document retrieval.

The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a complete understanding of and hands-on experience in automatic speech recognition research, proficiency in a Linux/Unix working environment and C/C++ programming, and a strong GPA. A strong background in speaker adaptation is highly preferred.

3) Development of voice search-based web applications on a smartphone

We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and will demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and a strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) are required. Development experience on Windows CE/Mobile is desired.

4) Integrated Voice Search Technology For Mobile Devices.

The candidate should be proficient in information retrieval, pattern recognition and speech recognition, and should be able to program in C++ and scripting languages such as Python or Perl in a Linux environment. He/she should also have knowledge of information retrieval or search engines.

We offer competitive compensation, fun-to-work environment and Chicago-style pizza.

If you are interested, please send your resume to:

Dusan Macho, CHIR-Motorola Labs

Email:  dusan.macho@motorola.com

Tel: +1-847-576-6762

 


Back to Top

7-3 . Nuance: Software engineer speech dialog tools

In order to strengthen our Embedded ASR Research team, we are looking for a:

SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

OVERVIEW:

- You will work in Nuance's Embedded ASR R&D team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.

- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

- You will work in an international company and cooperate with people on various locations including in Europe, America and Asia. You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.

- You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.

- In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.

QUALIFICATIONS:

- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.

- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

- You have experience with Python or other scripting languages.

- GUI programming experience is a strong asset.

The following skills are a plus:

- Understanding of communication protocols

- Understanding of databases

- Understanding of computational agents and related frameworks (such as OAA).

- A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics and machine learning, especially as related to natural language processing, dialogue, and representation of information

- You can work both as a team player and as goal-oriented independent software engineer.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a plus.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

Deanna Roe                  Deanna.roe@nuance.com

Please make sure your application documents your excellent software engineering skills.

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 

Back to Top

7-4 . Nuance: Speech scientist London UK

  • Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

    To strengthen our International Professional Services team, based in London, we are currently looking for a

     

     

                                Speech Scientist, London, UK

    Nuance Professional Services (PS) has designed, developed, and optimized thousands of speech systems across dozens of industries, including directory search, call center automation, applications in telecom, finance, airline, healthcare, and other verticals; applications for video games, mobile dictation, enhanced search services, SMS, and in-car navigation.  Nuance PS applications have automated approximately 7 billion phone conversations for some of the world's most respected companies, including British Airways, Vodafone, Amtrak, Bank of America, BellCanada, Citigroup, General Electric, NTT and Verizon.

    The PS organization consists of energetic, motivated, and friendly individuals.  The Speech Scientists in PS are among the best and brightest, with PhDs from universities such as Cambridge (UK), MIT, McGill, Harvard, Penn, CMU, and Georgia Tech, and having worked at research labs such as Bell Labs, Motorola Labs, and ATR (Japan), culminating in over 300 years of combined Speech Science experience and covering well over 20 languages.

    Come and join Nuance PS and work on the latest technology from one of the prominent speech recognition technology providers, and make a difference in the way the world communicates.

    Job Overview

    As a Speech Scientist in the Professional Services group, you will work on automated speech recognition applications, covering a broad range of activities in all project phases, including the design, development, and optimization of the system.  You will:

    • Work across application development teams to ensure best possible recognition performance in deployed systems
    • Identify recognition challenges and assess accuracy feasibility during the design phase,
    • Design, develop, and test VoiceXML grammars and create JSPs, Java, and ECMAscript grammars for dynamic contexts
    • Optimize accuracy of applications by analyzing performance and tuning statistical language models, pronunciations, and acoustic models, including identifying areas for improvement by running the recognizer offline
    • Contribute to the generation and presentation of client-facing reports
    • Act as technical lead on more intensive client projects
    • Develop methodologies, scripts, procedures that improve efficiency and quality
    • Develop tools and enhance algorithms that facilitate deployment and tuning of recognition components
    • Act as subject matter domain expert for specific knowledge domains
    • Provide input into the design of future product releases

         Required Skills

    • MS or PhD in Computer Science, Engineering, Computational Linguistics, Physics, Mathematics, or related field (or equivalent)
    • Strong analytical and problem solving skills and ability to troubleshoot issues
    • Good judgment and quick-thinking
    • Strong programming skills, preferably Perl or Python
    • Excellent written and verbal communications skills
    • Ability to scope work taking technical, business and time-frame constraints into consideration
    • Works well in a team and in a fast-paced environment

    Beneficial Skills

    • Strong programming skills in either Perl, Python, Java, C/C++, or Matlab
    • Speech recognition knowledge
    • Strong pattern recognition, linguistics, signal processing, or acoustics knowledge
    • Statistical data analysis
    • Experience with XML, VoiceXML, and Wiki
    • Ability to mentor or supervise others
    • Additional language skills, e.g. French, Dutch, German, Spanish

     


Back to Top

7-5 . Nuance: Research engineer speech engine

In order to strengthen our Embedded ASR Research team, we are looking for a:

 RESEARCH ENGINEER SPEECH ENGINE

As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

 OVERVIEW:

- You will work in Nuance's Embedded ASR R&D team, developing, improving and maintaining core ASR engine algorithms for our customers in the Automotive and Mobile sector.

- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

- You will work in an international company and cooperate with people on various locations including in Europe, America and Asia. You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on the developing, improving and maintaining core ASR engine algorithms for cutting edge speech and natural language understanding technologies for automotive and mobile devices.

- You will work on the design and development of more efficient, flexible ASR search algorithms with high focus on low memory and processor requirements.

QUALIFICATIONS:

- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field. PhD is a plus.

- A background in (computational) linguistics, speech processing, ASR search, confidence values, grammars, statistics and machine learning, especially as related to natural language processing.

- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

The following skills are a plus:

- You have experience with Python or other scripting languages.

- Broad knowledge about architectures of embedded platforms and processors.

- Understanding of databases

- You can work both as a team player and as goal-oriented independent software engineer.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a plus.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

Deanna Roe                  Deanna.roe@nuance.com

Please make sure your application documents your excellent software engineering skills.

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 

Back to Top

7-6 . Nuance RESEARCH ENGINEER SPEECH DIALOG SYSTEMS:

In order to strengthen our Embedded ASR Research team, we are looking for a:

    RESEARCH ENGINEER SPEECH DIALOGUE SYSTEMS

As part of our team, you will be creating speech technologies for embedded applications varying from simple command and control tasks up to natural language speech dialogues on mobile and automotive platforms.

OVERVIEW:

-You will work in Nuance's Embedded ASR research and production team, creating technology, tools and runtime software to enable our customers to develop embedded speech applications. In our team of speech and language experts, you will work on natural language dialogue systems that define the state of the art.

- You will work at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the picturesque city of Ghent, in the Flanders region of Belgium. Ghent has one of the most spectacular historic town centers of Europe and is known for its unique vibrant yet cozy charm, and is home to a large international university.

- You will work in an international company and cooperate with people on various locations including in Europe, America, and Asia.  You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on the development of cutting edge natural language dialogue and speech recognition technologies for automotive embedded systems and mobile devices.

- You will design, implement, evaluate, optimize, and test new algorithms and tools for our speech recognition systems, both for research prototypes and deployed products, including all aspects of dialogue systems design, such as architecture, natural language understanding, dialogue modeling, statistical framework, and so forth.

- You will help the engine process multi-lingual natural and spontaneous speech in various noise conditions, given the challenging memory and processing power constraints of the embedded world.

QUALIFICATIONS:

- You have a university degree in computer science, (computational) linguistics, engineering, mathematics, physics, or a related field. A graduate degree is an asset.

-You have strong software and programming skills, especially in C/C++, ideally for embedded applications. Knowledge of Python or other scripting languages is a plus.

- You have experience in one or more of the following fields:

     dialogue systems

     applied (computational) linguistics

     natural language understanding

     language generation

     search engines

     speech recognition

     grammars and parsing techniques.

     statistics and machine learning techniques

     XML processing

-You are a team player, willing to take initiative and assume responsibility for your tasks, and are goal-oriented.

-You can work in a multi-national team and communicate effectively with people of different cultures.

-You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

-You are fluent in English and you can write high quality documentation.

-Knowledge of other languages is a strong asset.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

 

Deanna Roe                  Deanna.roe@nuance.com

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 

Back to Top

7-7 . Research Position in Speech Processing at Nagoya Institute of Technology,Japan

Nagoya Institute of Technology is seeking a researcher for a post-doctoral position in a new European Commission-funded project EMIME ("Efficient multilingual interaction in mobile environment") involving Nagoya Institute of Technology and five other European partners, starting in March 2008 (see the project summary below).

The earliest starting date of the position is March 2008. The initial duration of the contract will be one year, with a possibility of prolongation (on a year-by-year basis, for a maximum of three years). The position provides opportunities to collaborate with other researchers in a variety of national and international projects. The competitive salary is calculated according to qualifications, based on NIT scales.

The candidate should have a strong background in speech signal processing and some experience with speech synthesis and recognition. Desired skills include familiarity with the latest tools, including HTK, HTS, and Festival, at the source code level.

For more information, please contact Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/).

 

About us

Nagoya Institute of Technology (NIT), founded in 1905, is situated in the world-class manufacturing area of central Japan (about one hour and 40 minutes from Tokyo, and 36 minutes from Kyoto, by Shinkansen). NIT is a highest-level educational institution of technology and is one of the leaders of such institutions in Japan. EMIME will be carried out at the Speech Processing Laboratory (SPL) in the Department of Computer Science and Engineering of NIT. SPL is known for its outstanding, continuous contribution to the development of high-performance, high-quality open-source software: the HMM-based Speech Synthesis System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary continuous speech recognition engine "Julius" (http://julius.sourceforge.jp/), and the Speech Signal Processing Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is involved in numerous national and international collaborative projects. SPL also has close partnerships with many industrial companies, in order to transfer its research into commercial applications, including Toyota, Nissan, Panasonic, Brother Inc., Funai, Asahi-Kasei, and ATR.

Project summary of EMIME

The EMIME project will help to overcome the language barrier by developing a mobile device that performs personalized speech-to-speech translation, such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. Personalization of systems for cross-lingual spoken communication is an important, but little explored, topic. It is essential for providing more natural interaction and making the computing device a less obtrusive element when assisting human-human interactions.

We will build on recent developments in speech synthesis using hidden Markov models, which is the same technology used for automatic speech recognition. Using a common statistical modeling framework for automatic speech recognition and speech synthesis will enable the use of common techniques for adaptation and multilinguality.

Significant progress will be made towards a unified approach for speech recognition and speech synthesis: this is a very powerful concept, and will open up many new areas of research. In this project, we will explore the use of speaker adaptation across languages so that, by performing automatic speech recognition, we can learn the characteristics of an individual speaker, and then use those characteristics when producing output speech in another language.

Our objectives are to:

1. Personalize speech processing systems by learning individual characteristics of a user's speech and reproducing them in synthesized speech.

2. Introduce a cross-lingual capability such that personal characteristics can be reproduced in a second language not spoken by the user.

3. Develop and better understand the mathematical and theoretical relationship between speech recognition and synthesis.

4. Eliminate the need for human intervention in the process of cross-lingual personalization.

5. Evaluate our research against state-of-the-art techniques and in a practical mobile application.

 


Back to Top

7-8 . C/C++ Programmer Munich, Germany

Digital publishing AG is one of Europe's leading producers of interactive software for foreign language training. In our e-learning courses we want to place the emphasis on speaking and spoken language understanding. In order to strengthen our Research & Development Team in Munich, Germany, we are looking for experienced C or C++ programmers with at least 3 years' experience in the design and coding of sophisticated software systems under Windows.
We offer
- a creative working atmosphere in an international team of software engineers, linguists and editors working on challenging research projects in speech recognition and speech dialogue systems
- participation in all phases of a product life cycle, as we are interested in the fast transfer of research results into products
- the possibility to participate in international scientific conferences
- a permanent job in the center of Munich
- excellent possibilities for development within our fast-growing company
- flexible working times, competitive compensation and arguably the best espresso in Munich
We expect
- several years of practical experience in software development in C or C++ in a commercial or academic environment
- experience with parallel algorithms and thread programming
- experience with object-oriented design of software systems
- good knowledge of English or German
Desirable:
- experience with optimization of algorithms
- experience in statistical speech or language processing, preferably speech recognition, speech synthesis, speech dialogue systems or chatbots
- experience with Delphi or Turbo Pascal
Interested? We look forward to receiving your application (preferably by e-mail):
digital publishing AG  
Freddy Ertl  f.ertl@digitalpublishing.de  
Tumblinger Straße 32  
D-80337 München Germany 

 

Back to Top

7-9 . Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
  • You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
  • You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.

 

Requirements:

  • Solid background in speech recognition, natural language processing, machine learning and information extraction.
  • 2+ years of experience participating in software development projects
  • Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
  • Excellent analytical and problem-solving skills
  • Ability to integrate and communicate well in small R&D teams
  • Master's degree in CS or related engineering fields
  • Experience in a healthcare-related field a plus

 

In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart, passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 

Back to Top

7-10 . Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • Plan and perform research and development tasks to continuously improve a state-of-the-art speech understanding system
  • Take a leading role in identifying solutions to challenging technical problems
  • Contribute original ideas and turn them into product-grade software implementations
  • Collaborate with other members of the R&D team to identify, analyze and resolve technical issues

 

Requirements:

  • Solid research & development background with 3+ years of experience in speech recognition research, covering at least two of the following topics: speech processing, acoustic modeling, language modeling, decoding, LVCSR, natural language processing/understanding, speaker verification/identification, audio mining
  • Working knowledge of Machine Learning, Information Extraction and Natural Language Processing algorithms
  • 3+ years of experience participating in large-scale software development projects using C++ and Java.
  • Excellent analytical, problem-solving and communication skills
  • PhD with focus on speech recognition or Master's degree with 3+ years industry experience working on automatic speech recognition
  • Experience and/or education in medical informatics a plus
  • Working experience in a healthcare related field a plus

 


In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart, passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 

Back to Top

7-11 . Postdoc position at LORIA, Nancy, France

Building an articulatory model from ultrasound, EMA and MRI data

 

Postdoctoral position

 

 

Research project

An articulatory model comprises both the visible and the internal mobile articulators involved in speech articulation (the lower jaw, tongue, lips and velum) as well as the fixed walls (the palate and the rear wall of the pharynx). An articulatory model is dynamic, since the articulators deform during speech production. Such a model has potential applications in language learning, by providing visual feedback on the learner's articulation, among many others.

Building an articulatory model is difficult because the different articulators have to be detected from specific image modalities: the lips are acquired through video, and the tongue shape is acquired through ultrasound imaging at a high frame rate, but these 2D images are very noisy. Finally, 3D images of all articulators can be obtained with MRI, but only for sustained sounds (such as vowels) due to the long acquisition time of MRI images.

The subject of this post-doc is to construct a dynamic 3D model of the entire vocal tract by merging the 3D information available in the MRI acquisitions and temporal 2D information provided by the contours of the tongue visible on the ultrasound images or X-ray images.

We are working on the construction of an articulatory model within the European project ASPI (http://aspi.loria.fr/ ).

We have already built an acquisition system which allows us to obtain synchronized data from ultrasound, MRI, video and EMA modalities.

Only a few complete articulatory models are currently available in the world and a real challenge in the field is to design set-ups and easy-to-use methods for automatically building the model of any speaker from 3D and 2D images. Indeed, the existence of more articulatory models would open new directions of research about speaker variability and speech production.

 

Objectives

The aim of this work is to build a deformable model of the vocal tract from static 3D MRI images and dynamic 2D sequences. Previous work has been conducted on modelling the vocal tract, and especially the tongue (M. Stone [1], O. Engwall [2]). Unfortunately, substantial human interaction is required to extract tongue contours from the images. In addition, only one image modality is often considered in these works, which reduces the reliability of the resulting model.

The aim of this work is to provide automatic methods for segmenting features in the images as well as methods for building a parametric model of the 3D vocal tract with these specific aims:

  • The segmentation process is to be guided by prior knowledge of the vocal tract. In particular, shape, topological and regularity constraints must be considered.
  • A parametric model of the vocal tract has to be defined (classical models are linear and built from a principal component analysis; a minimal sketch follows this list). Special emphasis must be put on the problem of matching the various features between the images.
  • Besides classical geometric constraints, both the building and the assessment of the model will be guided by acoustic distances, in order to check the match between the sound synthesized from the model and the sound realized by the human speaker.
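
As a concrete illustration of the "classical" linear model mentioned in the second point, here is a minimal numpy sketch in which synthetic tongue contours stand in for segmented ultrasound/MRI data and the PCA weights play the role of articulatory parameters; the contour representation, data and number of modes are assumptions for the example only, not the project's design.

import numpy as np

rng = np.random.default_rng(3)
n_frames, n_points = 200, 30
base = np.stack([np.linspace(0, 1, n_points),
                 np.sin(np.linspace(0, np.pi, n_points))], axis=1)
contours = base[None] + 0.05 * rng.normal(size=(n_frames, n_points, 2))

X = contours.reshape(n_frames, -1)                # flatten each 2D contour
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]                               # keep 3 articulatory modes

def synthesize(params):
    """Reconstruct a contour from 3 articulatory parameters."""
    return (mean + params @ components).reshape(n_points, 2)

print(synthesize(np.array([0.1, -0.2, 0.05])).shape)   # (30, 2)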

 

Skill and profile

The recruited person must have a solid background in computer vision and in applied mathematics. Information and demonstrations on the research topics addressed by the Magrit team are available at http://magrit.loria.fr/

 

References

[1] M. Stone: Modeling tongue surface contours from Cine-MRI images. Journal of Speech, Language, and Hearing Research, 2001.

[2] P. Badin, G. Bailly, L. Reveret: Three-dimensional linear articulatory modeling of tongue, lips and face, based on MRI and video images. Journal of Phonetics, 2002, vol. 30, pp. 533-553.

 

Contact

Interested candidates are invited to contact Marie-Odile Berger, berger@loria.fr, +33 3 54 95 85 01

 

Important information

This position is advertised in the framework of the national INRIA campaign for recruiting post-docs. It is a one-year position, renewable, beginning fall 2008. The salary is 2,320€ gross per month.

 

Selection of candidates will be a two-step process. A first selection will be carried out internally by the Magrit group. The selected candidate's application will then be further processed for approval and funding by an INRIA committee.

 

Candidates must have obtained their doctorate less than one year ago (after May 2007) or be due to defend it before the end of 2008. If the defence has not yet taken place, candidates must specify its tentative date and the jury.

 

Important - Useful links

Presentation of INRIA postdoctoral positions

To apply (be patient, loading this link takes time...)

 

Back to Top

7-12 . Internships at Motorola Labs Schaumburg

Motorola Labs - Center for Human Interaction Research (CHIR) 
located in Schaumburg Illinois, USA, 
is offering summer intern positions in 2008 (12 weeks each). 
 
CHIR's mission
 
Our research lab develops technologies that make access to rich communication, media and 
information services effortless, based on natural, intelligent interaction. Our research 
aims at systems that adapt automatically and proactively to changing environments, device 
capabilities and continually evolving knowledge about the user.
 
Intern profiles
 
1) Acoustic environment/event detection and classification. 
The successful candidate will be a PhD student near the end of his/her PhD study, skilled 
in signal processing and/or pattern recognition, and familiar with Linux and C/C++ programming. 
Candidates with knowledge of acoustic environment/event classification are preferred. 
 
2) Speaker adaptation for applications on speech recognition and spoken document retrieval
The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough 
understanding of and hands-on experience in automatic speech recognition research, proficiency 
in a Linux/Unix working environment and C/C++ programming, and a strong GPA. A strong background 
in speaker adaptation is highly preferred.
 
3) Development of voice search-based web applications on a smartphone 
We are looking for an intern candidate to help create an "experience" prototype based on our 
voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and 
rich interaction with web resources. This intern project is oriented more towards software engineering 
than research. We target an intern with a master's degree and a strong software engineering background. 
Mastery of C++ and experience with web programming (AJAX and web services) are required. 
Development experience on Windows CE/Mobile desired.
 
4) Integrated Voice Search Technology For Mobile Devices
The candidate should be proficient in information retrieval, pattern recognition and speech recognition, 
and should program in C++ and scripting languages such as Python or Perl in a Linux environment. 
He/she should also have knowledge of information retrieval or search engines.
 
We offer competitive compensation, a fun work environment and Chicago-style pizza.
 
If you are interested, please send your resume to:
 
Dusan Macho, CHIR-Motorola Labs
Email: dusan [dot] macho [at] motorola [dot] com
Tel: +1-847-576-6762

 

Back to Top

7-13 . Masters in Human Language Technology

*** Studentships available for 2008/9 *** 

                   One-Year Masters Course in  HUMAN LANGUAGE TECHNOLOGY 
                                         Department of Computer Science                
                                           The University of Sheffield - UK  
The Sheffield MSc in Human Language Technology (HLT) has been carefully tailored 
to meet the demand for graduates with the highly-specialised multi-disciplinary skills 
that are required in HLT, both as practitioners in the development of HLT applications 
and as researchers into the advanced capabilities required for next-generation HLT 
systems.  The course provides a balanced programme of instruction across a range 
of relevant disciplines including speech technology, natural language processing and 
dialogue systems.  The programme is taught in a research-led environment.  
This means that you will study the most advanced theories and techniques in the field, 
and have the opportunity to use state-of-the-art software tools.  You will also have 
opportunities to engage in research-level activity through in-depth exploration of 
chosen topics and through your dissertation.  As well as readying yourself for 
employment in the HLT industry, this course is also an excellent introduction to the 
substantial research opportunities for doctoral-level study in HLT.  
***  A number of studentships are available, on a competitive basis, to suitably 
qualified applicants.  These awards pay a stipend in addition to the course fees.  
***  Further details of the course and information on how to apply are available from 
the Department of Computer Science, The University of Sheffield.


Back to Top

7-14 . PhD positions at Supelec, Paris

 

 

Training Generative Bayesian Networks with Missing Data

 
Learning generative model parameters with missing data: application to user modelling for spoken dialogue systems optimization.
  

Description:

Probabilistic models such as Bayesian Networks (BN) are widely used for reasoning under uncertainty in many domains. A BN is a graphical model that captures statistical properties of a data set in a parametric and compact representation. This representation can then be used to perform probabilistic inference about the domain from which the data were drawn. As with any Bayesian method, Bayesian networks allow a priori knowledge to be taken into account so as to enhance the performance of the model or speed up the parameter learning process. They are part of a wider class of models called generative models, because they also make it possible to generate new data with statistical properties similar to those of the training data. The purpose of this thesis is to develop new training algorithms so as to learn BN parameters from incomplete datasets, that is, datasets where some data are missing. Since the resulting models will be used to expand the training data set with statistically consistent samples, this may influence the parameter learning process.
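
To make the missing-data setting concrete, here is a minimal sketch (not the thesis method) of EM parameter learning for a toy naive-Bayes-style network G -> (A1, A2, A3): the hidden binary parent G, standing in for the user's goal, is never observed, while three binary children, standing in for observed dialogue acts, always are. All names and data are illustrative.

import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_pG = 0.7
true_pA = np.array([[0.1, 0.8],      # P(A_k = 1 | G = 0), P(A_k = 1 | G = 1)
                    [0.3, 0.9],
                    [0.2, 0.7]])
G = (rng.random(n) < true_pG).astype(int)
A = (rng.random((n, 3)) < true_pA[:, G].T).astype(int)   # observed children

# EM: G is latent, so its value is replaced by a posterior in the counts.
pG, pA = 0.5, np.full((3, 2), 0.5) + 0.1 * rng.random((3, 2))
for _ in range(200):
    # E-step: P(G=1 | A) per record, from the product of child likelihoods.
    lik = np.ones((n, 2))
    for g in (0, 1):
        lik[:, g] = np.prod(np.where(A == 1, pA[:, g], 1 - pA[:, g]), axis=1)
    post = pG * lik[:, 1] / (pG * lik[:, 1] + (1 - pG) * lik[:, 0])
    # M-step: expected-count re-estimation of all CPTs.
    pG = post.mean()
    for k in range(3):
        pA[k, 1] = (post * A[:, k]).sum() / post.sum()
        pA[k, 0] = ((1 - post) * A[:, k]).sum() / (1 - post).sum()

print(round(pG, 2))          # close to 0.7 (up to label swapping)
print(pA.round(2))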

Application:

This thesis is proposed in the framework of a European project (CLASSiC) aiming at automatically optimising human-machine spoken interactions. Current learning methods applied to such a task require a large amount of spoken dialogue data that is not easy to gather and, above all, to annotate. It is even more difficult if the spoken dialogue system is still in the design process. A widely adopted solution is to expand the existing datasets using probabilistic generative models that produce new samples of dialogues. Yet the training sets are most often annotated from recorded or transcribed dialogues, without additional information coming from the users. Their actual goal when using the system is often missing and difficult to infer from transcriptions. Moreover, none of the current solutions has been shown to generate realistic dialogues in terms of goal consistency, for instance. The objective will be to train models that treat the user's goal as missing data, so as to generate realistic dialogues.

Context:

The PhD student will participate in a European project (CLASSiC) funded by the FP7 ICT programme of the European Commission. The CLASSiC consortium includes Supélec (French engineering school), the universities of Edinburgh, Cambridge and Geneva, as well as France Télécom (French telecom operator). The selected candidate will be hosted on the Metz campus of Supélec and will join the IMS research group.

 Profile

The candidate should hold a Master's or Engineering degree in computer science or signal processing, with knowledge of machine learning and good C++ programming skills. Fluency in English is required; French would be a plus.

Contact: Olivier Pietquin (olivier.pietquin@supelec.fr)

 

 

Bayesian Methods for Generalization in Reinforcement Learning

   
Bayesian methods for generalization and direct policy search in reinforcement learning: application to spoken dialogue systems optimization.
  

Description:

Reinforcement Learning (RL) is an on-line machine learning paradigm that aims at finding optimal policies to control complex stochastic dynamical systems. RL is typically a good candidate to replace heuristically-driven control policies because of its ability to learn continuously from experience so as to maximize a utility function. It has proven its efficiency at finding optimal control policies in the case of discrete systems (discrete state and action spaces, as well as discrete time). Yet most real-world problems are continuous or hybrid in states and actions, or their state space is big enough to be approximated by a continuous space. Designing realistic reinforcement learning algorithms for handling such problems is still an open research problem. Policy generalization by means of supervised learning is promising. Yet the optimal policy, or any related function, cannot be known accurately while learning, and standard off-line regression is therefore not suitable, since new information is gathered while interacting with the system. A critical issue is thus to build a generalization method, suitable for policy evaluation, that is able to update its parameters on-line from uncertain observations. In addition, uncertainty should be managed carefully, and thus estimated all along the learning process, so as to avoid generating hazardous policies while optimally exploring the policy space. Bayesian filtering is proposed as a possible framework to tackle this problem because of its inherent adequacy to learning under uncertainty. In particular, it is proposed to make use of Bayesian filters to search directly in the policy space.
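
As a toy illustration of maintaining and updating a belief over policy parameters from noisy returns, in the spirit of the direct policy search described above (and not the algorithm to be developed in the thesis), consider this particle-based sketch on a one-parameter problem; the problem, weighting scheme and constants are all assumptions for the example.

import numpy as np

rng = np.random.default_rng(2)
theta_star = 1.3                                  # unknown optimal parameter

def rollout(theta):
    """Noisy return of the policy parameterized by theta."""
    return -(theta - theta_star) ** 2 + 0.1 * rng.normal()

particles = rng.normal(0.0, 2.0, size=500)        # prior belief over theta
for _ in range(50):
    returns = np.array([rollout(t) for t in particles])
    w = np.exp((returns - returns.max()) / 0.1)   # soft-max weighting
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    particles = particles[idx] + 0.05 * rng.normal(size=len(particles))

print(particles.mean(), particles.std())          # mass concentrates near 1.3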

Application:

This thesis is proposed in the framework of a European project (CLASSiC) aiming at automatically optimising human-machine spoken interactions. Current learning methods applied to such a task require a large amount of spoken dialogue data that is not easy to gather and, above all, to annotate. It is even more difficult if the spoken dialogue system is still in the design process. Generalizing policies to handle interactions that cannot be found in the collected database is therefore necessary. In addition, call centres are used by millions of people every year. New information will therefore become available after the system has been released and should be used to enhance its performance. This is why on-line learning is crucial.

Context:

The PhD student will participate in a European project (CLASSiC) funded by the FP7 ICT programme of the European Commission. The CLASSiC consortium includes Supélec (French engineering school), the universities of Edinburgh, Cambridge and Geneva, as well as France Télécom (French telecom operator). The selected candidate will be hosted on the Metz campus of Supélec and will join the IMS research group.

 Profile

The candidate should hold a Master's or Engineering degree in computer science or signal processing, with knowledge of machine learning and good C++ programming skills. Fluency in English is required; French would be a plus.

Contact:

Hervé Frezza-Buet (herve.frezza-buet@supelec.fr)  

 


Back to Top

7-15 . Speech Faculty Position at CMU, Pittsburgh, Pennsylvania

Carnegie Mellon University: Language Technologies Institute
Speech Faculty Position

The Language Technologies Institute (LTI), a department in the School of Computer Science at Carnegie Mellon University, invites applications for a tenure-track or research-track faculty position, starting on or around August 2008. We are particularly interested in candidates at the Assistant Professor level, on the tenure track or research track, specializing in the area of speech recognition. Applicants should have a Ph.D. in Computer Science or a closely related subject.

Preference will be given to applicants with a strong focus on new aspects of speech recognition such as finite-state models, active learning, discriminative training, and adaptation techniques.

The LTI offers two existing speech recognition engines, JANUS and SPHINX, which are integrated into a wide range of speech applications including speech-to-speech translation and spoken dialog systems.

The LTI is the largest department of its kind, with more than 20 faculty and 100 graduate students covering all areas of language technologies, including speech, translation, natural language processing, information retrieval, text mining, dialog, and aspects of computational biology. The LTI is part of Carnegie Mellon's School of Computer Science, which has hundreds of faculty and students in a wide variety of areas, from theoretical computer science and machine learning to robotics, language technologies, and human-computer interaction.

Please follow the instructions for faculty applications to the School of Computer Science, explicitly mentioning LTI, at: http://www.cs.cmu.edu/~scsdean/FacultyPage/scshiringad08.html, and also notify the head of the LTI search committee by email, Alan W Black (awb@cs.cmu.edu) or Tanja Schultz (tanja@cs.cmu.edu), so that we will be looking out for your application.  Electronic submissions are greatly preferred, but if you wish to apply on paper, please send two copies of your application materials to the School of Computer Science at the address below:
               
1. Language Technologies Faculty Search Committee
   School of Computer Science
   Carnegie Mellon University
   5000 Forbes Avenue
   Pittsburgh, PA 15213-3891

               
Each application should include a curriculum vitae, a statement of research and teaching interests, copies of 1-3 representative papers, and the names and email addresses of three or more individuals whom you have asked to provide letters of reference. Applicants should arrange for reference letters to be sent directly to the Faculty Search Committee (hard copy or email), to arrive before March 31, 2008. Letters will not be requested directly by the Search Committee. All applications should indicate citizenship and, in the case of non-US citizens, describe current visa status.
               
Applications and reference letters may be submitted via email (word or .pdf format) to lti-faculty-search@cs.cmu.edu
 


Back to Top

7-16 . Opened positions at Microsoft: Danish Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Danish language, for a 3-month contract, to work on speech technology related development projects. The successful candidate should meet the following requirements:

·         Be a native or near-native Danish speaker

·         Have a university degree in Linguistics or related field (preferably in Danish Linguistics)

·         Have an advanced level of English

·         Have some experience in working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Willing to start immediately (April 1, 2008)

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions. 

Back to Top

7-17 . Opened positions at Microsoft: Swedish Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Swedish language, for a 1-month contract, to work on speech technology related development projects. The successful candidate should meet the following requirements:

·         Be a native or near-native Swedish speaker

·         Have a university degree in Linguistics or related field (preferably in Swedish Linguistics)

·         Have an advanced level of English

·         Have some experience in working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Willing to start in May 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions. 

Back to Top

7-18 . Opened positions at Microsoft: Dutch Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Dutch language, for a 1-month contract, to work on speech technology related development projects. The successful candidate should meet the following requirements:

·         Be a native or near-native Dutch speaker

·         Have a university degree in Linguistics or related field (preferably in Dutch Linguistics)

·         Have an advanced level of English

·         Have some experience in working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Willing to start in May 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions. 

Back to Top

7-19 . PhD position at Orange Lab

* Position : PhD, 3 years
* Research Area : speech synthesis, prosody modelling
* Location : Orange Labs, Lannion, France
* Start date: immediate.
 
* Summary:
The emergence of corpus-based technologies has allowed major improvements in 
Text-to-Speech (TTS) during the last decade. Such systems can produce 
very natural synthetic sentences, almost indistinguishable from natural 
speech. Synthetic prompts can now replace human recordings in some 
commercial applications, like IVR services. However, their use remains 
delicate due to the lack of prosody control (intonation, rhythm...). The 
aim of the project is to provide the user with a support tool for easily 
specifying the prosody of the synthesized speech.
 
The work will focus on characterising the essential prosodic elements needed 
for expressive speech synthesis, possibly restricted to a specific 
application domain. The chosen typology will have to match the prosody 
of the TTS corpora as accurately as possible, through a relevant set of 
prosodic primitives. The robustness of the typology is critical for 
automatic annotation of the databases.
The work will also address ergonomics (how to offer the user a 
convenient way to specify prosody) and will be closely related to the 
signal production techniques (signal processing and/or unit selection).
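
As one concrete example of a prosodic primitive that such annotation work manipulates, the sketch below estimates an F0 contour with frame-wise autocorrelation; it is a generic textbook method, assumed here purely for illustration, and not Orange Labs' toolchain. The frame length and voicing threshold are arbitrary choices.

import numpy as np

def f0_track(signal, sr, frame=0.04, fmin=60.0, fmax=400.0):
    step = int(frame * sr)
    lo, hi = int(sr / fmax), int(sr / fmin)   # admissible pitch-period lags
    f0 = []
    for i in range(0, len(signal) - step, step):
        x = signal[i:i + step] - signal[i:i + step].mean()
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        lag = lo + np.argmax(ac[lo:hi])
        f0.append(sr / lag if ac[lag] > 0.3 * ac[0] else 0.0)  # 0 = unvoiced
    return np.array(f0)

sr = 16000
t = np.arange(sr) / sr
demo = np.sin(2 * np.pi * 120 * t)            # a 120 Hz "voiced" tone
print(f0_track(demo, sr)[:5])                 # approx. 120 Hz per frame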
 
 
* Research Lab:
The PhD will be hosted in the Speech Synthesis team at Orange Labs. 
Orange Labs develops a state-of-the-art corpus-based speech synthesizer 
(demonstrator available at http://tts.elibel.tm.fr).
 
 
* Requirements:
The candidate holds a (research) Master's degree in Computer Science or Electrical 
Engineering, has a strong interest in doing research, excellent writing skills 
in French or English, and good programming skills. Knowledge of speech 
processing or automatic classification is a plus.
 
 
* Contacts:
For more information please contact:
- Cedric Boidin, cedric.boidin@orange-ftgroup.com, +33 2 96 05 33 53
- Thierry Moudenc, thierry.moudenc@orange-ftgroup.com, +33 2 96 05 16 59
 
Back to Top

7-20 . Social speech scientist at Wright Patterson AFB, Ohio, USA

Title:  Social Scientist, DR-0101-II
 
Salary:   The base salary range for the position is $56,948 to
$89,423.  Salary will be supplemented by an additional amount related to
the cost of living in Dayton, Ohio.  
 
The Collaborative Interfaces Branch of the Air Force Research
Laboratory, Human Effectiveness Directorate, located at Wright-Patterson
AFB, OH (just outside Dayton, OH) is seeking to hire a social scientist,
(DR-0101-II).  The selectee will contribute to all phases of basic
research, applied research, and prototype development projects involving
the application of linguistic and computer science principles to the
technical areas of computational linguistics and natural language
processing with application to speech-to-speech translation, machine
translation of foreign languages, information retrieval, named entity
detection, topic detection, text categorization, text processing, speech
recognition, speech synthesis, and speech processing.  The selectee will
determine how best to accomplish the research objectives, develop and
evaluate alternatives, design and conduct experiments, analyze and
interpret results, publish papers and technical reports, deliver
presentations of research results, monitor in-house and contractual
work efforts, and meet with customers to determine technology needs.
The selectee will develop patents and licensing strategies for
technologies developed in the research program, where appropriate.
 
All applicants must be United States citizens.  To be considered
qualified, applicants must meet the basic requirements for Social
Scientist positions: a degree in a behavioral or social science, or in
a related discipline appropriate to the position; OR a combination of
education and experience that provided the applicant with knowledge of
one or more of the behavioral or social sciences equivalent to a major
in the field; OR four years of appropriate experience demonstrating
that the applicant has acquired such knowledge.  More information on
these basic requirements can be found at:
http://www.opm.gov/qualifications/SEC-IV/A/GS-PROF.asp . 
 
To apply for this position, please go to:
http://jobsearch.usajobs.gov/getjob.asp?JobID=70353344&AVSDM=2008%2D04%2D03+00%3A03%3A01&Logo=0&sort=rv&jbf571=2&FedEmp=Y&vw=d&brd=3876&ss=0&FedPub=Y&rad=10&zip=45433
 
For more information on this position, please contact David Crawford at
(937) 255-1788 or via e-mail at david.crawford3@wpafb.af.mil .  All
application packages must be received or postmarked by the close of this
announcement: 22 May 2008. 
 
 
Back to Top

7-21 . Professor position at PHELMA, Grenoble INP, France

A full professor position (Professeur des universités, CNU section 61) at the PHELMA 
school of Grenoble INP is open for competitive recruitment, starting autumn 2008. The 
teaching and research profiles are described in the attached job description.
   The research profile was defined by the "Parole et Cognition" (Speech and 
Cognition) department of GIPSA-Lab. The department's "Machines Parlantes, Agents 
Conversationnels & Interaction Face-à-face" (Talking Machines, Conversational Agents 
& Face-to-Face Interaction) team is the primary target of the integration project, 
although the project may also involve other teams. Descriptions of the research themes 
of GIPSA-Lab, of the department and of its teams, as well as the relevant contacts, are 
available at http://www.gipsa-lab.inpg.fr. Please contact the head of the department 
for any further information.
 
   Gérard BAILLY, deputy director of GIPSA-Lab
 
Back to Top

7-22 . POSTDOCTORAL FELLOWSHIP OPENING AT ICSI Berkeley

POSTDOCTORAL FELLOWSHIP OPENING AT ICSI

The International Computer Science Institute (ICSI) invites applications for a Postdoctoral Fellow position in spoken language
processing. The Fellow will be working with Dilek Hakkani-Tur, along with PhD students and international colleagues, in the area of information distillation. Some experience with machine learning for text categorization is required, along with strong capabilities in speech and language processing in general.

ICSI is an independent not-for-profit Institute located a few blocks from the Berkeley campus of the University of California. It is
closely affiliated with the University, and particularly with the Electrical Engineering and Computer Science (EECS) Department. See
http://www.icsi.berkeley.edu to learn more about ICSI.

The ICSI Speech Group has been a source of novel approaches to speech and language processing since 1988. It is primarily known for its work in speech recognition, although it has housed major projects in speaker recognition,
metadata extraction, and language understanding in the last few years. The effort in information distillation will draw upon lessons learned in our previous work for language understanding.

Applications should include a cover letter, vita, and the names of at least 3 references (with both postal and email addresses). Applications should be sent by email to dilek@icsi.berkeley.edu

ICSI is an Affirmative Action/Equal Opportunity Employer. Applications  from women and minorities are especially encouraged. Hiring is contingent on eligibility to work in the United States.

Back to Top

7-23 . PhD positions at GIPSA (formerly ICP) Grenoble France

Laboratory: GIPSA-lab, Speech & Cognition Dept.
Address : ENSIEG, Domaine Universitaire - BP46, 38402 Saint Martin d'Hères
Thesis supervisor: Pierre Badin
e-mail address: Pierre.Badin@gipsa-lab.inpg.fr
Co- supervisor(s): Gérard Bailly
Title: Control of talking heads by multimodal inversion – Application to language learning
and rehabilitation
Context and problem :
Speech production necessitates fairly precise control of the various orofacial articulators (jaw, lips, tongue, velum, cheeks, etc.). Regulating these gestures implies that the speaker has fairly precise feedback about his/her vocal production. Auditory feedback is essential, and its degradation can lead to degradation, if not total loss, of speech production capabilities. The perception of the acoustic consequences of articulatory gestures can be degraded in different ways: either peripherally, through the degradation, if not the complete loss, of this feedback (deaf and hearing-impaired people, implanted or not), or more centrally, through the loss of sensitivity to phonological contrasts due to phonological deafness (contrasts not exploited in the mother language: e.g. Japanese speakers have extreme difficulty producing the /l/ vs. /r/ contrast, which is not exploited in their mother language).
The stake of this doctoral work is to explore speakers' abilities to exploit virtual multisensory
feedback that complements, if not substitutes for, the failing auditory feedback. The virtual
feedback designed and studied in this framework will be provided by a 2D or 3D talking head that reproduces, in an augmented-reality mode (in real time or offline), the articulation of a sound for which only the acoustic and/or visual signal is available.
The thesis challenge is to design and assess a robust system that can estimate the articulation from its sensory consequences, and in particular one that deals with the normalisation problem (establishing the correspondence between the audiovisual spaces of the talking head and of the speaker), and then to quantify the benefit that a hearing-impaired person or a second-language learner can gain from a restored sensory-motor feedback loop.
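
To fix ideas, here is a deliberately simple sketch of the core inversion step: learning a mapping from acoustic features to articulatory parameters from paired training data, so that the talking head can later be driven from audio alone. Ridge regression and the synthetic data are stand-ins, assumed for illustration only, for the richer statistical mappings and real articulatory corpora such a thesis would use.

import numpy as np

rng = np.random.default_rng(6)
A_true = rng.normal(size=(8, 20))             # hidden acoustic->artic map
acoustic = rng.normal(size=(500, 20))         # e.g. MFCC-like frames
artic = acoustic @ A_true.T + 0.1 * rng.normal(size=(500, 8))

lam = 1e-2                                    # ridge regularizer
W = np.linalg.solve(acoustic.T @ acoustic + lam * np.eye(20),
                    acoustic.T @ artic)

test = rng.normal(size=(1, 20))               # a new audio frame
print((test @ W).shape)                       # (1, 8) articulatory params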
 
---------------------------------------------------------------------------------------------------------------------------------
Multimodality for face-to-face interaction between an embodied conversational agent and a human
partner: experimental software platform
Thesis financed by a research grant from Rhône-Alpes region - 1750€ gross/month
Selected in 2008 by the research Cluster ISLE (http://www.grenoble-universites.fr/isle)
The research work aims at developing multimodal systems enabling an
embodied conversational agent and a human partner to engage into a
situated face-to-face conversation notably involving objects of the
environment. These interactive multimodal systems involve numerous
software sensors and actuators such as recognizing/synthesizing speech,
facial expressions, gaze or gestures of the interlocutors. The environment
in which this interaction occurs should also be analyzed so as to
maintain or attract attention towards objects of interest in the dialog.
Perception-action loops of these multimodal systems should finally take into account the mutual
conditioning of the cognitive states of the interlocutors as well as the psychophysical, linguistic and social
dimensions of these multimodal turns.
In this context, and given the complexity of the signal and information processing involved, the
objective of this work is first to design and implement a wizard-of-Oz software platform for
exploring the design space, in which parts of the interactive system are simulated by a human
accomplice while other parts are handled by automatic behavior. The first objective of the work is
to study the impact of this faked versus automatic behavior on the interactions in terms of
cognitive load, subject satisfaction and task performance. The final objective is of course to
progressively substitute an autonomous, context-sensitive and context-aware interactive system for
human intelligence and comprehension of the scene.
The software platform should guarantee real-time processing of perceived and generated multimodal
events and should provide the wizard-of-Oz operator with adequate, intuitive tools for controlling
the simulated part of the system's behavior.
This thesis will be conducted in the framework of the OpenInterface european project (FP6-IST-35182 on
multimodal interaction) and the ANR project Amorces (human-robot collaboration for manipulating
objects).
Expected results
Experimental:
• Prototype of the Wizard-of-Oz platform
• Recordings of multimodal conversations between an embodied conversational agent and a human
partner using the prototype
Theoretical :
• Taxonomy of Wizard-of-Oz platforms
• Design of real-time Wizard-of-Oz platforms
• Highly modular software model of multimodal systems
• Multi-layered model of face-to-face conversation
Keywords
Interaction model, multimodality, multimodal dialog, interaction engineering, software architecture,
Wizard-of-Oz platform
Thesis proposed by
Gérard BAILLY, GIPSA-Lab, MPACIF team Gerard.Bailly@gipsa-lab.inpg.fr
Laurence NIGAY, LIG, IIHM team Laurence.Nigay@imag.fr
Doctoral program: EEATS GRENOBLE – FRANCE http://www.edeeats.inpg.fr/
Back to Top

7-24 . PhD in speech signal processing at Infineon Sophia Antipolis


Open position: PhD in speech signal processing

 

 

Title: Solutions for non-linear acoustic echo.

 

Background:

Acoustic echo is an annoying disturbance due to the sound feedback between the loudspeaker and the microphone of a terminal. Acoustic echo cancellers and residual echo reduction are widely used to reduce the echo signal. The performance of existing echo reduction systems relies strongly on the assumption that the echo path between transducers is linear. However, today's competitive consumer audio market may favour sacrificing linear performance for the integration of low-cost analogue components. The assumption of linearity then no longer holds, due to the nonlinear distortions introduced by the loudspeakers and the small housing in which the transducers are placed.

 

Task:

The PhD student will conduct research in the field of non-linear systems applied to acoustic echo reduction. The first task foreseen is the proper modelling of mobile phone transducers presenting non-linearities, in order to better understand the environment in which echo reduction operates. Using this model as a basis, a study of the performance of linear systems will give a good understanding of the problems caused by non-linearity. In a further step, the PhD student will develop and test non-linear algorithms for echo cancellation in a non-linear environment.
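
The following minimal numpy sketch illustrates the starting point of such a study: a standard linear NLMS echo canceller run against a Hammerstein-style nonlinear echo path, where the unmodelled cubic loudspeaker distortion limits the achievable echo reduction. The filter length, step size and distortion coefficient are arbitrary illustrative choices, not Infineon's setup.

import numpy as np

rng = np.random.default_rng(4)
N, L = 20000, 64
x = rng.normal(size=N)                               # far-end signal
h = rng.normal(size=L) * np.exp(-np.arange(L) / 8)   # echo path
x_nl = x + 0.3 * x ** 3                              # loudspeaker distortion
d = np.convolve(x_nl, h)[:N] + 0.01 * rng.normal(size=N)  # mic signal

w = np.zeros(L)                                      # NLMS on the *linear* x
mu, eps, err = 0.5, 1e-6, np.zeros(N)
for n in range(L, N):
    xv = x[n - L + 1:n + 1][::-1]
    e = d[n] - w @ xv
    w += mu * e * xv / (xv @ xv + eps)
    err[n] = e

# ERLE over the last quarter: limited by the unmodelled nonlinearity.
erle = 10 * np.log10(np.mean(d[-N//4:] ** 2) / np.mean(err[-N//4:] ** 2))
print(f"ERLE of a linear NLMS against a nonlinear path: {erle:.1f} dB")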

About the Company:

The Sophia-Antipolis site is one of the main Infineon Technologies research and development centers worldwide. Located in the high-tech valley of Sophia-Antipolis, near Nice in the south of France, it hosts a team of 140 experienced research and development engineers specialized in Mobile Solutions, Embedded SRAM, and Design-Flow Software. The PhD will take place within the Mobile Solutions group, which is responsible for specifying and designing baseband integrated circuits for cellular phones. The team is specialized in innovative algorithm development, especially in audio, system specification and validation, circuit design and embedded software. Its work makes a significant contribution to the Infineon Technologies wireless chipset portfolio.

Required skills:

-        Master's degree

-        Strong background in signal processing.

-        Background in speech signal or non-linear system processing is a plus.

-        Programming: Matlab, C.

-        Knowledge in C-fixed point / DSP implementation is a plus.

-        Language: English

Length of the PhD: 3 years

Place: Infineon Technologies France, Sophia-Antipolis

Contact:

Christophe Beaugeant

Phone: +33 (0)4 92 38 36 30

E-mail : christophe.beaugeant@infineon.com

Back to Top

7-25 . PhD position at Institut Eurecom Sophia Antipolis France

Institut Eurécom, Sophia Antipolis, France
 
Doctoral Position
 
Title: Speaker Diarisation for Internet-based Content Processing
 
Department: Multimedia Communications
URL: http://www.eurecom.fr/research/
Start Date: Immediate vacancy
Duration: 3 years
 
Description: Also known as the “who spoke when?” task, speaker diarization aims to detect the 
number of speakers within an audio document and to identify when each speaker is active. Speaker
diarization is an important problem with applications in speaker indexing, document retrieval,
rich transcription, speech and speaker recognition/biometrics and video conferencing, among
others. Research to date has focused on narrow application domains, namely telephone
speech, broadcast news and meeting recordings. In line with recent shifts in the field, this
research project will explore exciting new applications of speaker diarization in the area of
Internet-based content processing, especially user-generated content. The diversity of such
content presents a number of new challenges. Some areas in which the candidate will be
expected to work involve speech enhancement / noise compensation, beam-forming, speech
activity detection, channel compensation and statistical speaker modelling. The successful
candidate will have the opportunity for international travel and to become involved in
national and European projects and internationally competitive speaker diarization trials.
This position offers a unique opportunity to develop broad knowledge in cutting edge speech
and audio processing research.
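 
As a flavour of the kind of building block involved, here is a minimal sketch of Bayesian Information Criterion (BIC) change detection, a classic component of many diarization systems; the synthetic Gaussian "frames" stand in for MFCC features, and the sketch is an illustration, not Eurécom's system.
 
import numpy as np

def logdet_cov(X):
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return np.linalg.slogdet(cov)[1]

def delta_bic(X, Y, lam=1.0):
    """Positive value: modelling X and Y separately beats one Gaussian,
    i.e. a speaker change is hypothesized at the boundary."""
    Z = np.vstack([X, Y])
    n, d = Z.shape
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(Z)
            - 0.5 * len(X) * logdet_cov(X)
            - 0.5 * len(Y) * logdet_cov(Y)
            - penalty)

rng = np.random.default_rng(5)
spk_a = rng.normal(0.0, 1.0, size=(300, 12))      # stand-ins for MFCC frames
spk_b = rng.normal(1.5, 0.7, size=(300, 12))
print(delta_bic(spk_a, spk_b) > 0)                # True: change detected
print(delta_bic(spk_a[:150], spk_a[150:]) > 0)    # False: same speaker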
 
Requirements: The successful candidate will have a Master’s degree in engineering, mathematics,
computing, physics or a related relevant discipline. You will have strong mathematical,
programming and communication skills and be highly motivated to undertake challenging
research. Good English language speaking and writing skills are essential.
 
Applications: Please send to the address below (i) a one page statement of research interests and
motivation, (ii) your CV and (iii) three letters of reference (2 academic, 1 personal).
Contact: Nicholas Evans
Postal Address: 2229 Route des Crêtes BP 193, F-06904 Sophia Antipolis cedex, France
Email address: nicholas.evans@eurecom.fr
Web address: http://www.eurecom.fr/main/institute/job.en.htm
Phone: +33/0 4 93 00 81 14
Fax: +33/0 4 93 00 82 00
 
Institut Eurécom is located in Sophia Antipolis, a vibrant science park on the French Riviera. It 
is in close proximity with a large number of research units of leading multi-national corporations 
in the telecommunications, semiconductor and biotechnology sectors, as well as other outstanding 
research and teaching institutions. A freethinking, multinational population and the unique 
geographic location provide a quality of life without equal.
 
Institut Eurécom, 2229 Route des Crêtes BP 193, F-06904 Sophia Antipolis cedex, France
www.eurecom.fr
 
Back to Top

8 . Journals

8-1 . Papers accepted for FUTURE PUBLICATION in Speech Communication

 
 
Full text available at http://www.sciencedirect.com/ for Speech Communication subscribers and subscribing institutions. Titles and abstracts of all volumes, and even selected full papers, are freely accessible to all by clicking on Articles in Press and then Selected Papers.
 
 
Back to Top

8-2 . Special Issue on Non-Linear and Non-Conventional Speech Processing-Speech Communication

Speech Communication

Call for Papers: Special Issue on Non-Linear and Non-Conventional Speech Processing

Editors: Mohamed CHETOUANI, UPMC

Marcos FAUNDEZ-ZANUY, EUPMt (UPC)

Bruno GAS, UPMC

Jean Luc ZARADER, UPMC

Amir HUSSAIN, Stirling

Kuldip PALIWAL, Griffith University

The field of speech processing has shown very fast development in the past twenty years, thanks both to technological progress and to the convergence of research into a few mainstream approaches. However, some specificities of the speech signal are still not well addressed by the current models. New models and processing techniques need to be investigated in order to foster and/or accompany future progress, even if they do not immediately match the level of performance and understanding of the current state-of-the-art approaches.

An ISCA-ITRW Workshop on "Non-Linear Speech Processing" was held in May 2007, the purpose of which was to present and discuss novel ideas, work and results related to alternative speech processing techniques departing from the mainstream approaches:  http://www.congres.upmc.fr/nolisp2007

We are now soliciting journal papers, not only from workshop participants but also from other researchers, for a special issue of Speech Communication on "Non-Linear and Non-Conventional Speech Processing".

Submissions are invited on the following broad topic areas:

I. Non-Linear Approximation and Estimation

II. Non-Linear Oscillators and Predictors

III. Higher-Order Statistics

IV. Independent Component Analysis

V. Nearest Neighbours

VI. Neural Networks

VII. Decision Trees

VIII. Non-Parametric Models

IX. Dynamics of Non-Linear Systems

X. Fractal Methods

XI. Chaos Modelling

XII. Non-Linear Differential Equations

All fields of speech processing are targeted by the special issue, namely:

1. Speech Production 

2. Speech Analysis and Modelling

3. Speech Coding 

4. Speech Synthesis 

5. Speech Recognition 

6. Speaker Identification / Verification 

7. Speech Enhancement / Separation 

8. Speech Perception

 

Back to Top

8-3 . Journal of Multimedia User Interfaces

The development of multimodal user interfaces relies on systemic research involving signal processing, pattern analysis, machine intelligence and human-computer interaction. This journal responds to the need for a common forum bringing together these research communities. Topics of interest include, but are not restricted to:

  • Fusion & Fission,
  • Plasticity of Multimodal interfaces,
  • Medical applications,
  • Edutainment applications,
  • New modalities and modalities conversion,
  • Usability,
  • Multimodality for biometry and security,
  • Multimodal conversational systems.

The journal is open to three types of contributions:

  • Articles: containing original contributions accessible to the whole research community of Multimodal Interfaces. Contributions containing verifiable results and/or open-source demonstrators are strongly encouraged.
  • Tutorials: disseminating established results across disciplines related to multimodal user interfaces.
  • Letters: presenting practical achievements / prototypes and new technology components.

JMUI is a Springer-Verlag publication from 2008.

 

The submission procedure and the publication schedule are described at:

www.jmui.org

The page of the journal at springer is:

http://www.springer.com/east/home?SGWID=5-102-70-173760003-0&changeHeader=true

More information:

Imre Váradi (varadi@tele.ucl.ac.be)

 

Back to Top

8-4 . CURRENT RESEARCH IN PHONOLOGY AND PHONETICS: INTERFACES WITH NATURAL LANGUAGE PROCESSING

  • A SPECIAL ISSUE OF THE JOURNAL TAL (Traitement Automatique des Langues)

    Guest Editors: Bernard Laks and Noël Nguyen


    There are long-established connections between research on the sound shape of language and natural language processing (NLP), for which one of the main driving forces has been the design of automatic speech synthesis and recognition systems. Over the last few years, these connections have been made yet stronger, under the influence of several factors. A first line of convergence relates to the shared collection and exploitation of the considerable resources that are now available to us in the domain of spoken language. These resources have come to play a major role both for phonologists and phoneticians, who endeavor to subject their theoretical hypotheses to empirical tests using large speech corpora, and for NLP specialists, whose interest in spoken language is increasing. While these resources were first based on audio recordings of read speech, they have been progressively extended to bi- or multimodal data and to spontaneous speech in conversational interaction. Such changes are raising theoretical and methodological issues that both phonologists/phoneticians and NLP specialists have begun to address.

    Research on spoken language has thus led to the generalized utilization of a large set of tools and methods for automatic data processing and analysis: grapheme-to-phoneme converters, text-to-speech aligners, automatic segmentation of the speech signal into units of various sizes (from acoustic events to conversational turns), morpho-syntactic tagging, etc. Large-scale corpus studies in phonology and phonetics make an ever increasing use of tools that were originally developed by NLP researchers, and which range from electronic dictionaries to full-fledged automatic speech recognition systems. NLP researchers and phonologists/phoneticians also have jointly contributed to developing multi-level speech annotation systems from articulatory/acoustic events to the pragmatic level via prosody and syntax.

    In this scientific context, which very much fosters the establishment of cross-disciplinary bridges around spoken language, the knowledge and resources accumulated by phonologists and phoneticians are now being put to use by NLP researchers, whether this is to build up lexical databases from speech corpora, to develop automatic speech recognition systems able to deal with regional variations in the sound pattern of a language, or to design talking-face synthesis systems in man-machine communication.

    LIST OF TOPICS

    The goal of this special issue will be to offer an overview of the interfaces that are being developed between phonology, phonetics, and NLP. Contributions are therefore invited on the following topics:

    . Joint contributions of speech databases to NLP and phonology/phonetics

    . Automatic procedures for the large-scale processing of multi-modal databases

    . Multi-level annotation systems

    . Research in phonology/phonetics and speech and language technologies: synthesis, automatic recognition

    . Text-to-speech systems

    . NLP and modelisation in phonology/phonetics

    Papers may be submitted in English (for non-native speakers of French only) or French and will relate to studies conducted on French, English, or other languages. They must conform to the TAL guidelines for authors available at http://www.atala.org/rubrique.php3?id_rubrique=1.

    DEADLINES

    . 11 February 2008: Reception of contributions
    . 11 April 2008: Notification of pre-selection / rejection
    . 11 May 2008: Reception of pre-selected articles
    . 16 June 2008: Notification of final acceptance
    . 30 June 2008: Reception of accepted articles' final versions

    This special issue of Traitement Automatique des Langues will appear in autumn 2008.

    THE JOURNAL

    TAL (Traitement Automatique des Langues / Natural Language Processing, http://www.atala.org/rubrique.php3?id_rubrique=1) is a forty-year-old international journal published by ATALA (French Association for Natural Language Processing) with the support of CNRS (French National Center for Scientific Research). It has moved to an electronic mode of publication, with printing on demand; this in no way affects its reviewing and selection process.

    SCIENTIFIC COMMITTEE

    . Martine Adda-Decker, LIMSI, Orsay
    . Roxane Bertrand, LPL, CNRS & Université de Provence
    . Philippe Blache, LPL, CNRS & Université de Provence
    . Cédric Gendrot, LPP, CNRS & Université de Paris III
    . John Goldsmith, University of Chicago
    . Guillaume Gravier, Irisa, CNRS/INRIA & Université de Rennes I
    . Jonathan Harrington, IPS, University of Munich
    . Bernard Laks, MoDyCo, CNRS & Université de Paris X
    . Lori Lamel, LIMSI, Orsay
    . Noël Nguyen, LPL, CNRS & Université de Provence
    . François Pellegrino, DDL, CNRS & Université de Lyon II
    . François Poiré, University of Western Ontario
    . Yvan Rose, Memorial University of Newfoundland
    . Tobias Scheer, BCL, CNRS & Université de Nice
    . Atanas Tchobanov, MoDyCo, CNRS & Université de Paris X
    . Jacqueline Vaissière, LPP, CNRS & Université de Paris III
    . Nathalie Vallée, DPC-GIPSA, CNRS & Université de Grenoble III

Back to Top

8-5 . IEEE Signal Processing Magazine: Special Issue on Digital Forensics

Guest Editors:
Edward Delp, Purdue University (ace@ecn.purdue.edu)
Nasir Memon, Polytechnic University (memon@poly.edu)
Min Wu, University of Maryland, (minwu@eng.umd.edu)

We find ourselves today in a "digital world" where most information 
is created, captured, transmitted, stored, and processed in digital 
form. Although representing information in digital form has many 
compelling technical and economic advantages, it has led to new 
issues and significant challenges when performing forensic analysis 
of digital evidence.  There has been a slowly growing body of 
scientific techniques for recovering evidence from digital data.  
These techniques have come to be loosely grouped under the umbrella 
of "Digital Forensics." Digital Forensics can be defined as "The 
collection of scientific techniques for the preservation, collection, 
validation, identification, analysis, interpretation, documentation 
and presentation of digital evidence derived from digital sources for 
the purpose of facilitating or furthering the reconstruction of 
events, usually of a criminal nature."

This call for papers invites tutorial articles covering all aspects 
of digital forensics, with an emphasis on forensic methodologies and 
techniques that employ signal processing and information-theoretic 
analysis. Focused tutorial and survey contributions are solicited 
on topics including, but not limited to, the following:

 . Computer Forensics - File system and memory analysis. File carving.
 . Media source identification - camera, printer, scanner, microphone
identification.
 . Differentiating synthetic and sensor media, for example camera vs.
computer graphics images.
 . Detecting and localizing media tampering and processing.
 . Voiceprint analysis and speaker identification for forensics.
 . Speech transcription for forensics. Analysis of deceptive speech.
 . Acoustic processing for forensic analysis - e.g. acoustical gunshot
analysis, accident reconstruction.
 . Forensic musicology and copyright infringement detection.
 . Enhancement and recognition techniques from surveillance video/images.
Image matching techniques for automatic visual evidence
extraction/recognition.
 . Steganalysis - Detection of hidden data in images, audio, video. 
Steganalysis techniques for natural language steganography. Detection of covert
channels. (See the illustrative sketch after this list.)
 . Data Mining techniques for large scale forensics.
 . Privacy and social issues related to forensics.
 . Anti-forensics. Robustness of media forensics methods against counter
measures.
 . Case studies and trend reports.
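
As a small aside on the steganalysis topic above, the following is a minimal, hypothetical sketch of the classic chi-square test for sequential LSB embedding (after Westfeld and Pfitzmann); the synthetic data and the whole setup are illustrative assumptions, not material from this call.

```python
# Sketch of the chi-square test for sequential LSB steganography.
# Assumes pixel data is available as raw 8-bit bytes; real detectors
# decode an image and run the test block-wise along the embedding path.
import random
from collections import Counter

def chi_square_lsb(pixels: bytes) -> float:
    """Chi-square statistic over pairs of values (2k, 2k+1).

    Sequential LSB embedding tends to equalize the frequencies within
    each pair, so a *small* statistic hints at hidden data.
    """
    hist = Counter(pixels)
    chi2 = 0.0
    for k in range(128):
        expected = (hist[2 * k] + hist[2 * k + 1]) / 2.0
        if expected > 0:
            chi2 += (hist[2 * k] - expected) ** 2 / expected
    return chi2

# Toy usage on synthetic "pixel" data.
cover = bytes(int(random.gauss(128, 30)) % 256 for _ in range(10000))
stego = bytes((v & ~1) | random.getrandbits(1) for v in cover)  # random LSBs
print(chi_square_lsb(cover), chi_square_lsb(stego))  # stego value is smaller
```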

White paper submission: Prospective authors should submit white 
papers to the web-based submission system at 
http://www.ee.columbia.edu/spm/ according to the timetable given below. 
White papers, limited to 3 single-column double-spaced pages, should 
summarize the motivation, the significance of the topic, a brief 
history, and an outline of the content. In all cases, prospective 
contributors should make sure to emphasize the signal processing 
aspects of their submission.

Schedule
 . White Paper Due: April 7, 2008
 . Notification of White paper Review Results: April 30, 2008
 . Full Paper Submission: July 15, 2008
 . Acceptance Notification: October 15, 2008
 . Final Manuscript Due: November 15, 2008
 . Publication Date: March 2009.


Back to Top

8-6 . Special Issue on Integration of Context and Content for Multimedia Management


                IEEE Transactions on Multimedia            
 Special Issue on Integration of Context and Content for Multimedia Management
=====================================================================

Guest Editors:

Alan Hanjalic, Delft University of Technology, The Netherlands
Alejandro Jaimes, IDIAP Research Institute, Switzerland
Jiebo Luo, Kodak Research Laboratories, USA
Qi Tian, University of Texas at San Antonio, USA

---------------------------------------------------
URL: http://www.cs.utsa.edu/~qitian/cfp-TMM-SI.htm
---------------------------------------------------
Important dates:

Manuscript Submission Deadline:       April 1, 2008
Notification of Acceptance/Rejection: July 1, 2008
Final Manuscript Due to IEEE:         September 1, 2008
Expected Publication Date:            January 2009

---------------------
Submission Procedure
---------------------
Submissions should follow the guidelines set out by the IEEE Transactions on Multimedia.
Prospective authors should submit high-quality, original manuscripts that have neither
appeared in, nor are under consideration by, any other journal.

-------
Summary
-------
Lower-cost hardware and a growing communications infrastructure (e.g., the Web, cell
phones) have led to an explosion in the availability of ubiquitous devices to produce,
store, view and exchange multimedia (images, videos, music, text). Almost everyone is
a producer and a consumer of multimedia in a world in which, for the first time,
a tremendous amount of contextual information is being automatically recorded by the
various devices we use (e.g., cell ID for mobile phone location, GPS integrated in
a digital camera, camera parameters, time information, and the identity of the producer).

In recent years, researchers have started making progress in effectively integrating
context and content for multimedia mining and management. Integration of content and
context is crucial to human-human communication and human understanding of multimedia:
without context it is difficult for a human to recognize various objects, and we
become easily confused if the audio-visual signals we perceive are mismatched. For the
same reasons, integration of content and context is likely to enable  (semi)automatic
content analysis and indexing methods to become more powerful in managing multimedia
data. It can help narrow part of the semantic and sensory gap that is difficult or
even impossible to bridge using approaches that do not explicitly consider context for
(semi)automatic content-based analysis and indexing.
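
As a toy illustration of the content-context integration discussed above (not drawn from this call), the sketch below fuses a content-based classifier score with a context prior derived from capture metadata; the class, function, and fusion weight are invented assumptions.

```python
# Hypothetical late-fusion sketch: combine a content-based score with
# a context prior (e.g., from GPS or capture time). All names and the
# fusion weight are illustrative assumptions, not from this call.
import math
from dataclasses import dataclass

@dataclass
class Evidence:
    content_score: float   # e.g., P(label | pixels) from a visual classifier
    context_prior: float   # e.g., P(label | GPS cell, time of day)

def fuse(e: Evidence, alpha: float = 0.7) -> float:
    """Log-linear late fusion; alpha weights content against context."""
    eps = 1e-9  # guard against log(0)
    log_p = (alpha * math.log(e.content_score + eps)
             + (1 - alpha) * math.log(e.context_prior + eps))
    return math.exp(log_p)

# A photo whose pixels only weakly suggest "beach" but whose GPS cell
# is coastal: the context prior raises the fused confidence.
print(fuse(Evidence(content_score=0.4, context_prior=0.9)))  # ~0.51
```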

The goal of this special issue is to collect cutting-edge research work in integrating
content and context to make multimedia content management more effective. The special
issue will unravel the problems generally underlying these integration efforts,
elaborate on the true potential of contextual information to enrich the content
management tools and algorithms, discuss the dilemma of generic versus narrow-scope
solutions that may result from "too much" contextual information, and provide us with
vision and insight from leading experts and practitioners on how to best approach the
integration of context and content. The special issue will also present the state of
the art in context and content-based models, algorithms, and applications for
multimedia management.

-----
Scope
-----

The scope of this special issue is to cover all aspects of context and content for
multimedia management.

Topics of interest include (but are not limited to):
- Contextual metadata extraction
- Models for temporal context, spatial context, imaging context (e.g., camera
  metadata), social and cultural context, and so on
- Web context for online multimedia annotation, browsing, sharing and reuse
- Context tagging systems, e.g., geotagging, voice annotation
- Context-aware inference algorithms
- Context-aware multi-modal fusion systems (text, document, image, video,
  metadata, etc.)
- Models for combining contextual and content information
- Context-aware interfaces
- Context-aware collaboration
- Social networks in multimedia indexing
- Novel methods to support and enhance social interaction, including
  innovative ideas integrating context in social, affective computing, and
  experience capture
- Applications in security, biometrics, medicine, education, personal
  media management, and the arts, among others
- Context-aware mobile media technology and applications
- Context for browsing and navigating large media collections
- Tools for culture-specific content creation, management, and analysis

------------
Organization
------------
In addition to the standard open call for papers, we will also invite a limited number
of papers, written by prominent authors and authorities in the field
covered by this Special Issue. While the papers collected through the open call are
expected to sample the research efforts currently invested within the community on
effectively combining contextual and content information for optimal analysis,
indexing and retrieval of multimedia data, the invited papers will be selected to
highlight the main problems and approaches generally underlying these efforts.

All papers will be reviewed by at least 3 independent reviewers. Invited papers will
be solicited first through white papers to ensure the quality and relevance to the
special issue. The accepted invited papers will be reviewed by the guest editors and
are expected to account for about one quarter of the papers in the special issue.

---------
Contacts
---------
Please address all correspondence regarding this special issue to the Guest Editors
Dr. Alan Hanjalic (A.Hanjalic@ewi.tudelft.nl), Dr. Alejandro Jaimes
(alex.jaimes@idiap.ch), Dr. Jiebo Luo (jiebo.luo@kodak.com), and Dr. Qi Tian
(qitian@cs.utsa.edu).
-------------------------------------------------------------------------------------

Guest Editors:
Alan Hanjalic, Alejandro Jaimes, Jiebo Luo, and Qi Tian


Back to Top

8-7 . CfP Speech Communication: Special Issue On Spoken Language Technology for Education

  • *CALL FOR PAPERS*

    Special Issue of Speech Communication

    on

    *Spoken Language Technology for Education*

     

    *Guest-editors:*

    Maxine Eskenazi, Associate Teaching Professor, Carnegie Mellon University

    Abeer Alwan, Professor, University of California at Los Angeles

    Helmer Strik, Assistant Professor, University of Nijmegen

     

    Language technologies have evolved to the stage where they are reliable
    enough, if their strong and weak points are properly dealt with, to be
    used for education. The creation of an application for education
    presents several challenges: making the language technology sufficiently
    reliable (and thus advancing our knowledge in the language
    technologies), creating an application that actually enables students to
    learn, and engaging the student. Papers in this special issue should
    deal with several of these issues. Although language learning is the
    primary target of research at present, papers on the use of language
    technologies for other educational applications are encouraged. The scope
    of acceptable topics includes, but is not limited to:

     

    - Use of speech technology for education

    - Use of spoken language dialogue for education

    - Applications using speech and natural language processing for education

    - Intelligent tutoring systems using speech and natural language

    - Pedagogical issues in using speech and natural language technologies
    for education

    - Assessment of tutoring software

    - Assessment of student performance

     

    *Tentative schedule for paper submissions, review, and revision:*

    Deadline for submissions: June 1, 2008.

    Deadline for decisions and feedback from reviewers and editors: August
    31, 2008.

    Deadline for revisions of papers: November 30, 2008.

     

    *Submission instructions:*

    Authors should consult the "Guide for Authors", available online, at
    http://www.elsevier.com/locate/specom for information about the
    preparation of their manuscripts. Authors, please submit your paper via
    _http://ees.elsevier.com/specom_, choosing *Spoken Language Tech.* as
    the Article Type, and Dr. Gauvain as the handling E-i-C.

Back to Top

8-8 . CfP Special Issue on Processing Morphologically Rich Languages IEEE Trans ASL

Call for Papers for a Special Issue on 
                 Processing Morphologically Rich Languages 
          IEEE Transactions on Audio, Speech and Language Processing
 
This is a call for papers for a special issue on Processing Morphologically 
Rich Languages, to be published in early 2009 in the IEEE Transactions on 
Audio, Speech and Language Processing. 
 
Morphologically-rich languages like Arabic, Turkish, Finnish, Korean, etc., 
present significant challenges for speech processing, natural language 
processing (NLP), as well as speech and text translation. These languages are 
characterized by highly productive morphological processes (inflection, 
agglutination, compounding) that may produce a very large number of word 
forms for a given root form.  Modeling each form as a separate word leads 
to a number of problems for speech and NLP applications, including: 1) large 
vocabulary growth, 2) poor language model (LM) probability estimation, 
3) higher out-of-vocabulary (OOV) rate, 4) inflection gap for machine 
translation: multiple different forms of the same underlying baseform 
are often treated as unrelated items, with negative effects on word alignment 
and translation accuracy.
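
To make the vocabulary-growth and OOV problems concrete, here is a toy sketch (not from this call) in which a naive suffix stripper stands in for learned morphological analyzers such as Morfessor; the words and suffixes are invented, Turkish-flavoured examples.

```python
# Toy illustration: surface word forms vs. morph-like units. The naive
# suffix stripper below stands in for learned analyzers; words and
# suffixes are invented examples.
SUFFIXES = ("lar", "ler", "da", "de", "im")

def segment(word: str) -> list[str]:
    """Greedily strip known suffixes from the right."""
    morphs = []
    stripped = True
    while stripped:
        stripped = False
        for suf in SUFFIXES:
            if word.endswith(suf) and len(word) > len(suf) + 1:
                morphs.insert(0, "+" + suf)
                word = word[: -len(suf)]
                stripped = True
    return [word] + morphs

train = ["ev", "evde", "evler", "evlerde", "evim", "okul", "okullar", "okullarda"]
test = ["evlerim", "okulda"]  # unseen as full forms

word_vocab = set(train)
morph_vocab = {m for w in train for m in segment(w)}
print(len(word_vocab), len(morph_vocab))  # 8 word types vs. 7 morph units

oov_words = [w for w in test if w not in word_vocab]
oov_morphs = [m for w in test for m in segment(w) if m not in morph_vocab]
print(oov_words)   # both test forms are OOV as words
print(oov_morphs)  # ...but fully covered by the morph inventory
```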
 
Large-scale speech and language processing systems require advanced modeling 
techniques to address these problems. Morphology also plays an important 
role in addressing specific issues of “under-studied” languages such as data 
sparsity, coverage and robust modeling. We invite papers describing 
previously unpublished work in the following broad areas: using morphology 
for speech recognition and understanding, speech and text translation, 
speech synthesis, information extraction and retrieval, as well as 
summarization. Specific topics of interest include:
- methods addressing data sparseness issue for morphologically rich 
  languages with application to speech recognition, text and speech 
  translation, information extraction and retrieval, speech   
  synthesis, and summarization
- automatic decomposition of complex word forms into smaller units 
- methods for optimizing the selection of units at different levels of 
  processing
- pronunciation modeling for morphologically-rich languages
- language modeling for morphologically-rich languages
- morphologically-rich languages in speech synthesis
- novel probability estimation techniques that avoid data sparseness 
  problems
- creating data resources and annotation tools for morphologically-rich 
  languages
 
Submission procedure:  Prospective authors should prepare manuscripts 
according to the information available at 
http://www.signalprocessingsociety.org/periodicals/journals/taslp-author-information/. 
Note that all rules will apply with regard to submission lengths, 
mandatory overlength page charges, and color charges. Manuscripts should 
be submitted electronically through the online IEEE manuscript submission 
system at http://sps-ieee.manuscriptcentral.com/. When selecting a 
manuscript type, authors must click on "Special Issue of TASLP on 
Processing Morphologically Rich Languages". 
 
Important Dates:
Submission deadline:  August 1, 2008               
Notification of acceptance: December 31, 2008
Final manuscript due:  January 15, 2009    
Tentative publication date: March 2009
 
Editors
Ruhi Sarikaya (IBM T.J. Watson Research Center) sarikaya@us.ibm.com
Katrin Kirchhoff (University of Washington) katrin@ee.washington.edu
Tanja Schultz (University of Karlsruhe) tanja@ira.uka.de
Dilek Hakkani-Tur (ICSI) dilek@icsi.berkeley.edu
Back to Top

9 . Forthcoming events supported (but not organized) by ISCA

9-1 . SIGDIAL 2008 9th SIGdial Workshop on Discourse and Dialogue

SIGDIAL 2008 9th SIGdial Workshop on Discourse and Dialogue
COLUMBUS, OHIO; June 19-20 2008 (with ACL/HLT 2008)
     
http://www.sigdial.org/workshops/workshop9



Continuing with a series of successful workshops in Antwerp, Sydney,
Lisbon, Boston, Sapporo, Philadelphia, Aalborg, and Hong Kong, this
workshop spans the ACL and ISCA SIGdial interest area of discourse and
dialogue. This series provides a regular forum for the presentation of
research in this area to both the larger SIGdial community as well as
researchers outside this community. The workshop is organized by
SIGdial, which is sponsored jointly by ACL and ISCA. SIGdial 2008 will
be a workshop of ACL/HLT 2008.


TOPICS OF INTEREST

We welcome formal, corpus-based, implementation or analytical work on
discourse and dialogue including but not restricted to the following
three themes:

1. Discourse Processing and Dialogue Systems

Discourse semantic and pragmatic issues in NLP applications such as
text summarization, question answering, information retrieval
including topics like:

- Discourse structure, temporal structure, information structure
- Discourse markers, cues and particles and their use
- (Co-)Reference and anaphora resolution, metonymy and bridging
  resolution
- Subjectivity, opinions and semantic orientation

Spoken, multi-modal, and text/web based dialogue systems including
topics such as:

- Dialogue management models;
- Speech and gesture, text and graphics integration;
- Strategies for preventing, detecting or handling miscommunication
  (repair and correction types, clarification and under-specificity,
  grounding and feedback strategies);
- Utilizing prosodic information for understanding and for
  disambiguation;


2. Corpora, Tools and Methodology

Corpus-based work on discourse and spoken, text-based and multi-modal
dialogue including its support, in particular:

- Annotation tools and coding schemes;
- Data resources for discourse and dialogue studies;
- Corpus-based techniques and analysis (including machine learning);
- Evaluation of systems and components, including methodology, metrics
  and case studies;


3. Pragmatic and/or Semantic Modeling

The pragmatics and/or semantics of discourse and dialogue (i.e. beyond
a single sentence) including the following issues:

- The semantics/pragmatics of dialogue acts (including those which are
  less studied in the semantics/pragmatics framework);
- Models of discourse/dialogue structure and their relation to
  referential and relational structure;
- Prosody in discourse and dialogue;
- Models of presupposition and accommodation; operational models of
  conversational implicature.


SUBMISSIONS

The program committee welcomes the submission of long papers for full
plenary presentation as well as short papers and demonstrations. Short
papers and demo descriptions will be featured in short plenary
presentations, followed by posters and demonstrations.

- Long papers must be no longer than 8 pages, including title,
  examples, references, etc. In addition to this, two additional pages
  are allowed as an appendix which may include extended example
  discourses or dialogues, algorithms, graphical representations, etc.
- Short papers and demo descriptions should aim to be 4 pages or less
  (including title, examples, references, etc.).

Please use the official ACL style files:
   
http://www.ling.ohio-state.edu/~djh/acl08/stylefiles.html

Submission/Reviewing will be managed by the EasyChair system. Link to
follow.

Papers that have been or will be submitted to other meetings or
publications must provide this information (see submission
format). SIGdial 2008 cannot accept for publication or presentation
work that will be (or has been) published elsewhere. Any questions
regarding submissions can be sent to the co-Chairs.

Authors are encouraged to make illustrative materials available, on
the web or otherwise. For example, excerpts of recorded conversations,
recordings of human-computer dialogues, interfaces to working systems,
etc.


IMPORTANT DATES

Submission        Mar 14 2008
Notification      Apr 27 2008
Camera-Ready   May 16 2008
Workshop          June 19-20 2008


WEBSITES

Workshop website:
http://www.sigdial.org/workshops/workshop9
Submission link: To be announced
SIGdial organization website:
http://www.sigdial.org/
CO-LOCATION ACL/HLT 2008 website:
http://www.acl2008.org/


CONTACT

For any questions, please contact the co-Chairs at:
Beth Ann Hockey 
bahockey@ucsc.edu
David Schlangen 
das@ling.uni-potsdam.de



Back to Top

9-2 . LIPS 2008 Visual Speech Synthesis Challenge

LIPS 2008 is the first visual speech synthesis challenge. It will be
held as a special session at INTERSPEECH 2008 in Brisbane, Australia
(http://www.interspeech2008.org). The aim of this challenge is to
stimulate discussion about subjective quality assessment of synthesised
visual speech with a view to developing standardised evaluation procedures.

In association with this challenge a training corpus of audiovisual
speech and accompanying phoneme labels and timings will be provided to
all entrants, who should then train their systems using this data. (As
this is the first year the challenge will run and to promote wider
participation, proposed entrants are free to use a pre-trained model.)

Prior to the session a set of test sentences (provided as audio, video
and phonetic labels) must be synthesised on-site in a supervised room. A
series of double-blind subjective tests will then be conducted to
compare each competing system against all others. The overall winner
will be announced and presented with their prize at the closing ceremony
of the conference.
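
By way of illustration only (the challenge defines its own evaluation protocol), the sketch below shows one simple way to turn double-blind pairwise preferences into an overall ranking via win rates; all system names and judgements are invented.

```python
# Invented sketch: ranking systems from double-blind pairwise
# preferences by win rate. Not the challenge's official scoring.
from collections import defaultdict

# (system_a, system_b, winner) judgements from blinded listeners
judgements = [
    ("sysA", "sysB", "sysA"), ("sysA", "sysC", "sysC"),
    ("sysB", "sysC", "sysC"), ("sysA", "sysB", "sysA"),
]

wins = defaultdict(int)
comparisons = defaultdict(int)
for a, b, winner in judgements:
    comparisons[a] += 1
    comparisons[b] += 1
    wins[winner] += 1

ranking = sorted(comparisons, key=lambda s: wins[s] / comparisons[s], reverse=True)
print([(s, wins[s], comparisons[s]) for s in ranking])
```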

All entrants will submit a 4/6 (TBC) page paper describing their system
to INTERSPEECH, indicating that the paper is addressed to the LIPS special
session. A special edition of the Eurasip Journal on Speech, Audio and Music
Processing in conjunction with the challenge is also scheduled.

To receive updated information as it becomes available, you can join the
mailing list by visiting
https://mail.icp.inpg.fr/mailman/listinfo/lips_challenge. Further
details will be mailed to you in due course.

Please invite colleagues to join, and circulate this email widely among your
academic and industrial partners. Besides broad participation from
research groups in audiovisual speech synthesis and talking faces, we
particularly welcome participation from the computer game industry.

Please confirm your willingness to participate in the challenge, submit
a paper describing your work and join us in Brisbane by sending an email
to sascha.fagel@tu-berlin.de, b.theobald@uea.ac.uk,
gerard.bailly@gipsa-lab.inpg.fr

Organising Committee

Sascha Fagel, University of Technology, Berlin - Germany
Barry-John Theobald, University of East Anglia, Norwich - UK
Gerard Bailly, GIPSA-Lab, Dpt. Speech & Cognition, Grenoble - France

Back to Top

9-3 . Human-Machine Comparisons of consonant recognition in quiet and noise

  • Consonant Challenge:
       Human-machine comparisons of consonant recognition in quiet and noise

                       Interspeech, 22-26 September 2008
                             Brisbane, Australia

    * Update:
    All information concerning the native listener experiments and baseline
    recognisers, including their results, can now be found and downloaded
    from the Consonant Challenge website:
    http://www.odettes.dds.nl/challenge_IS08/

    * Deadline for submissions:
    The deadline for full paper submissions (4 pages) is April 7th, 2008.
    Paper submission is done exclusively via the Interspeech 2008 conference
    website, where submission guidelines are available. Participants of this
    Challenge are asked to indicate the correct Special Session during
    submission. More information on the Interspeech conference can be found
    here: http://www.interspeech2008.org/

    * Topic of the Consonant Challenge:
    Listeners outperform automatic speech recognition systems at every level
    of speech recognition, including the very basic level of consonant
    recognition. What is not clear is where the human advantage originates.
    Does the fault lie in the acoustic representations of speech or in the
    recogniser architecture, or in a lack of compatibility between the two?
    There have been relatively few studies comparing human and automatic
    speech recognition on the same task, and, of these, overall
    identification performance is the dominant metric. However, there are
    many insights which might be gained by carrying out a far more detailed
    comparison.
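
    As a rough, invented illustration of what such a detailed comparison can add over overall identification performance, the sketch below contrasts overall accuracy with per-consonant scores and counts confusions shared between listeners and a recogniser; the response data are made up, not taken from the challenge materials.

```python
# Invented illustration: comparing human and ASR consonant responses
# beyond overall accuracy. Data are made up; the challenge provides
# real listener and recogniser results.
from collections import Counter

CONSONANTS = ["p", "b", "t", "d"]
# (stimulus, human response, machine response) triples
trials = [
    ("p", "p", "p"), ("p", "p", "b"), ("b", "b", "b"), ("b", "b", "p"),
    ("t", "t", "t"), ("t", "d", "d"), ("d", "d", "d"), ("d", "d", "t"),
]

def accuracy(responses):
    return sum(s == r for s, r in responses) / len(responses)

human = [(s, h) for s, h, _ in trials]
machine = [(s, m) for s, _, m in trials]
print("overall:", accuracy(human), accuracy(machine))

# Per-consonant accuracy exposes where the two diverge.
for c in CONSONANTS:
    h = accuracy([(s, r) for s, r in human if s == c])
    m = accuracy([(s, r) for s, r in machine if s == c])
    print(c, h, m)

# Do they make the *same kinds* of errors? Count shared confusions.
h_conf = Counter((s, r) for s, r in human if s != r)
m_conf = Counter((s, r) for s, r in machine if s != r)
print("shared confusion tokens:", sum((h_conf & m_conf).values()))
```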

    The purpose of this Special Session is to make focused human-computer
    comparisons on a task involving consonant identification in noise, with
    all participants using the same training and test data. Training and
    test data and native listener and baseline recogniser results will be
    provided by the organisers, but participants are encouraged to also
    contribute listener responses.

    * Call for papers:
    Contributions are sought in (but not limited to) the following areas:

    - Psychological models of human consonant recognition
    - Comparisons of front-end ASR representations
    - Comparisons of back-end recognisers
    - Exemplar vs statistical recognition strategies
    - Native/Non-native listener/model comparisons

    * Organisers:
    Odette Scharenborg (Radboud University Nijmegen, The Netherlands -- 
    O.Scharenborg@let.ru.nl)
    Martin Cooke (University of Sheffield, UK -- M.Cooke@dcs.shef.ac.uk)