Contents

1 . Editorial

Dear Members,

The Board has taken an important decision: INTERSPEECH 2011 will take place in Florence, Italy. I am sure that it will be successful and will attract many of you to this wonderful city, the cradle of the European Renaissance.

Meanwhile, life goes on. You have your trip to Brisbane to prepare, but do not forget all the appealing workshops listed below.

I still receive interesting job offers: I draw the attention of our young members to the possibilities of thesis funding and postdoc positions.

We are still working to improve ISCApad with the efficient help of Laurence Liu, a student of Helen Meng's from Hong Kong.

Please pay attention to our ISCA News section: the association needs your help.

Prof. em. Chris Wellekens

Institut Eurecom

France 

 

Back to Top

2 . ISCA News

2-1 . ISCA Scientific Achievement Medalist 2008

ISCA Scientific Achievement Medal for 2008

It is with great pleasure that I announce the ISCA Medalist for 2008: Hiroya Fujisaki. Prof. Fujisaki has contributed to the speech research community in so many areas, including speech analysis, synthesis and prosody, that it is very hard for me to summarize his long list of achievements. He is also the founder of the ICSLP series of conferences which, now fully integrated as one of ISCA's yearly conferences, will have its 10th anniversary this year.

Back to Top

2-2 . INTERSPEECH 2011 in Florence

ISCA announces with great pleasure that the venue for
Interspeech 2011 will be FLORENCE.

Back to Top

2-3 . Help ISCA serve you better

The ISCA board is always interested in improving its activities and the membership services it provides. To help us with this, could you please send us your ideas, comments, suggestions and impressions? We would be most grateful if you could take a moment to complete the form on the ISCA website (http://www.isca-speech.org/index.php) and send us your feedback.

Your message will be sent to the ISCA secretariat: secretariat@isca-speech.org

Please enter ideas/comments/suggestions/impressions you may have on any new (or old) activities and membership services.

Please note: you can send us your comments anonymously, if you so wish.

Eva Hajicova - Membership Services 

 Emmanuelle Foxonet - ISCA Secretariat  for the ISCA board 

Back to Top

3 . SIG's activities

3-1 . SLaTE

The International Speech Communication Association Special Interest Group (ISCA SIG) on

Speech and Language Technology in Education

The special interest group was created in mid-September 2006 at the Interspeech 2006 conference in Pittsburgh. Information about the SIG can be found on its official website.

The next SLaTE ITRW will be in 2009 in England.

OUR STATEMENT OF PURPOSE

The purpose of the International Speech Communication Association (ISCA) Special Interest Group on Speech and Language Technology in Education (SLaTE) shall be to promote interest in the use of speech and natural language processing for education; to provide members of ISCA with a special interest in speech and language technology in education with a means of exchanging news of recent research developments and other matters of interest in Speech and Language Technology in Education; to sponsor meetings and workshops on that subject that appear to be timely and worthwhile, operating within the framework of ISCA's by-laws for SIGs; and to provide and make available resources relevant to speech and language technology in education, including text and speech corpora, analysis tools, analysis and generation software, research papers and generated data.

Back to Top

4 . Future ISCA Conferences and Workshops (ITRW)

4-1 . INTERSPEECH 2008

INTERSPEECH 2008 incorporating SST 08

September 22-26, 2008

Brisbane Convention & Exhibition Centre

Brisbane, Australia

http://www.interspeech2008.org/

 

Interspeech is the world's largest and most comprehensive conference on Speech Science and Speech Technology. We invite original papers in any related area, including (but not limited to):

  • Human Speech Production, Perception and Communication;
  • Speech and Language Technology;
  • Spoken Language Systems; and
  • Applications, Resources, Standardisation and Evaluation.

In addition, a number of Special Sessions on selected topics have been organised, and we invite you to submit to these as well (see the website for a complete list).

Interspeech 2008 has two types of submission formats: Full 4-page Papers and Short 1-page Papers. Prospective authors are invited to submit papers in either format via the conference website by 7 April 2008.

     

    Important Dates 

    Paper Submission: Monday, 7 April 2008, 3pm GMT 

    Notification of Acceptance/Rejection: Monday, 16 June 2008, 3pm GMT 

    Early Registration Deadline: Monday, 7 July 2008, 3pm GMT 

    Tutorial Day: Monday, 22 September 2008 

    Main conference: 23-26 September 2008 

     For more information please visit the website http://www.interspeech2008.org

     

    Chairman: Denis Burnham, MARCS, University of Western Sydney.

Back to Top

4-2 . INTERSPEECH 2009

Brighton, UK
Conference Website
Chairman: Prof. Roger Moore, University of Sheffield.

Back to Top

4-3 . INTERSPEECH 2010

Chiba, Japan
Conference Website
ISCA is pleased to announce that INTERSPEECH 2010 will take place in Makuhari-Messe, Chiba, Japan, September 26-30, 2010. The event will be chaired by Keikichi Hirose (Univ. Tokyo), and will have as a theme "Towards Spoken Language Processing for All - Regardless of Age, Health Conditions, Native Languages, Environment, etc."

Back to Top

4-4 . ITRW on experimental linguistics

August 2008, Athens, Greece
Website
Prof. Antonis Botinis


Back to Top

4-5 . International Conference on Auditory-Visual Speech Processing AVSP 2008

Dates: 26-29 September 2008

Location: Moreton Island, Queensland, Australia
Website: http://express.hid.ri.cmu.edu/AVSP2008/Main.html

AVSP 2008 will be held as an ISCA Tutorial and Research Workshop at
Tangalooma Wild Dolphin Resort on Moreton Island from 26 to 29
September 2008. AVSP 2008 is a satellite conference to Interspeech 2008,
which is being held in Brisbane from 22 to 26 September 2008. Tangalooma
is a short distance from Brisbane, so attendance at AVSP 2008
can easily be combined with participation in Interspeech 2008.

Auditory-visual speech production and perception by human and machine is
an interdisciplinary and cross-linguistic field which has attracted
speech scientists, cognitive psychologists, phoneticians, computational
engineers, and researchers in language learning studies. Since the
inaugural workshop in Bonas in 1995, Auditory-Visual Speech Processing
workshops have been organised on a regular basis (see an overview at the
avisa website). In line with previous meetings, this conference will
consist of a mixture of regular presentations (both posters and oral),
and lectures by invited speakers.

Topics include but are not limited to:
- Machine recognition
- Human and machine models of integration
- Multimodal processing of spoken events
- Cross-linguistic studies
- Developmental studies
- Gesture and expression animation
- Modelling of facial gestures
- Speech synthesis
- Prosody
- Neurophysiology and neuro-psychology of audition and vision
- Scene analysis

Paper submission:
Details of the paper submission procedure will be available on the
website in a few weeks' time.

Chairs:
Simon Lucey
Roland Goecke
Patrick Lucey


Back to Top

4-6 . Christian Benoit workshop on Speech and Face to Face Communication

NEW deadline for sending one-page abstracts: JUNE 9TH


Ten years after the death of our colleague Christian Benoît, the mark
that he left on the international community is still very vivid. There
will soon be several occasions to honour his memory: the next
Interspeech conference (Christian was for a long time secretary of ESCA,
the future ISCA; the association is a French association of the type
described in the 1901 law, and its official headquarters are still in
Grenoble) and the next AVSP workshop (of which he was one of the
creators). The Christian Benoît Association was created in 1999 and
regularly awards the "Christian Benoît prize" to young researchers to
promote their research (the 4th prize was awarded to the phonetician
Susanne Fuchs in 2007). The Christian Benoît Association
(http://www.icp.inpg.fr/ICP/_communication.fr.html#prixcb), along with
ICP, now the Speech and Cognition Department of Gipsa-lab
(http://www.gipsa-lab.inpg.fr), is organizing a workshop/summer school
in memory of Christian Benoît, in keeping with his innovative and
enthusiastic research style. It aims to explore the topic of "Speech and
Face to Face Communication" from a multidisciplinary perspective:
neuroscience, cognitive psychology, phonetics, linguistics and computer
modelling. The workshop will be organized around 11 invited talks. All
researchers in the field are invited to participate through a call for
papers, and students are strongly encouraged to attend the workshop and
present their work.

Website: http://www.icp.inpg.fr/~dohen/face2face/

Deadline for sending one-page abstracts: June 9th (see the Call for Papers:
http://www.icp.inpg.fr/~dohen/face2face/CallForPapers.html)

You can join the Christian Benoît Association by sending 15 euros
(active members) or 45 euros or more (benefactors) to Pascal Perrier,
secretary of the association: Pascal.Perrier@gipsa-lab.inpg.fr

Back to Top

4-7 . CfP Second IEEE Spoken Language Technology Workshop Goa

Call for Papers:
Second IEEE Spoken Language Technology Workshop
Goa, India
December 15-18, 2008

The Second IEEE Spoken Language Technology (SLT) workshop will be held from December 15 to December 18, 2008 in Goa, India. The goal of this workshop is to bring both the speech processing and natural language processing communities together to share and present recent advances in various areas of spoken language technology, with the expectation that such a confluence of the researchers from both communities will foster new ideas, collaborations and new research directions in this area. The SLT 2008 workshop is endorsed by both ISCA and ACL organizations and eligible participants can apply for ISCA grants (http://www.isca-speech.org/grants.html).

Spoken language technology is a vibrant research area, with the potential for significant impact on government and industrial applications especially with the diversity and challenges offered by the multilingual business climates of today's world.

The workshop solicits papers on all aspects of spoken language technology:

 o Spoken language understanding
 o Spoken document summarization
 o Machine translation for speech
 o Spoken dialog systems
 o Spoken language generation
 o Spoken document retrieval
 o Human-computer interaction (HCI)
 o Speech data mining
 o Information extraction from speech
 o Question answering from speech
 o Multimodal processing
 o Spoken language based assistive technologies
 o Spoken language systems and applications
 o Spoken language databases and standards

In addition, this year's workshop will feature three special sessions:

 1) Challenges in Asian spoken language processing with special emphasis on Indian languages
 2) Mining human-human conversations: A resource for building efficient human-machine dialogs
 3) Spoken Language on the go: Challenges and Opportunities for spoken language processing on mobile devices

Submissions for the Technical Program
-------------------------------------
The workshop program will consist of tutorials, oral and poster presentations, and panel discussions. Attendance will be limited with priority for those who will present technical papers; registration is required of at least one author for each paper. Submissions are encouraged on any of the topics listed above. The style guide, templates, and submission form will follow the IEEE ICASSP style. Three members of the Scientific Committee will review each paper. The workshop proceedings will be published on a CD-ROM.

Important Dates
---------------
Camera-ready paper submission deadline: August 8, 2008
Hotel Reservation and Workshop registration opens: August 8, 2008
Paper Acceptance / Rejection: September 15, 2008
Hotel Reservation and Early Registration closes: October 5, 2008
Workshop: December 15-18, 2008

For more information, visit the SLT 2008 website (http://slt2008.org) or contact the organizing committee at info@slt2008.org if you have any questions.

Back to Top

5 . Books, databases and software

5-1 . Books

La production de la parole
Author: Alain Marchal, Universite d'Aix en Provence, France
Publisher: Hermes Lavoisier
Year: 2007

Speech Enhancement: Theory and Practice
Author: Philipos C. Loizou, University of Texas, Dallas, USA
Publisher: CRC Press
Year: 2007

Speech and Language Engineering
Editor: Martin Rajman
Publisher: EPFL Press, distributed by CRC Press
Year: 2007

Human Communication Disorders/ Speech therapy
This interesting series is listed on the Wiley website

Incursões em torno do ritmo da fala
Author: Plinio A. Barbosa
Publisher: Pontes Editores (city: Campinas)
Year: 2006 (released 11/24/2006)
(In Portuguese, abstract attached.) Website

Speech Quality of VoIP: Assessment and Prediction
Author: Alexander Raake
Publisher: John Wiley & Sons, UK-Chichester, September 2006
Website

Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
Author: Pierre-Yves Oudeyer
Publisher: Oxford University Press
Website

Speech Recognition Over Digital Channels
Authors: Antonio M. Peinado and Jose C. Segura
Publisher: Wiley, July 2006
Website

Multilingual Speech Processing
Editors: Tanja Schultz and Katrin Kirchhoff
Publisher: Elsevier Academic Press, April 2006
Website

Reconnaissance automatique de la parole: Du signal à l'interprétation
Authors: Jean-Paul Haton, Christophe Cerisara, Dominique Fohr, Yves Laprie and Kamel Smaili
Publisher: Dunod
392 pages

 

Automatic Speech Recognition on Mobile Devices and over Communication Networks
Editors: Zheng-Hua Tan and Børge Lindberg
Publisher: Springer, London, March 2008
Website: http://asr.es.aau.dk/
 
About this book
The remarkable advances in computing and networking have sparked an 
enormous interest in deploying automatic speech recognition on mobile 
devices and over communication networks. This trend is accelerating.
This book brings together leading academic researchers and industrial 
practitioners to address the issues in this emerging realm and presents 
the reader with a comprehensive introduction to the subject of speech 
recognition in devices and networks. It covers network, distributed and 
embedded speech recognition systems, which are expected to co-exist in 
the future. It offers a wide-ranging, unified approach to the topic and 
its latest development, also covering the most up-to-date standards and 
several off-the-shelf systems.
 
Latent Semantic Mapping: Principles & Applications
Author: Jerome R. Bellegarda, Apple Inc., USA
Publisher: Morgan & Claypool
Series: Synthesis Lectures on Speech and Audio Processing
Year: 2007
Website: http://www.morganclaypool.com/toc/sap/1/1
 

The Application of Hidden Markov Models in Speech Recognition By Mark Gales and Steve Young (University of Cambridge)
http://dx.doi.org/10.1561/2000000004
 
in Foundations and Trends in Signal Processing (FnTSIG)
www.nowpublishers.com/SIG 
 
 
Proceedings of the IEEE
 
Special Issue on ADVANCES IN MULTIMEDIA INFORMATION RETRIEVAL
 
Volume 96, Number 4, April 2008
 
Guest Editors:
 
Alan Hanjalic, Delft University of Technology, Netherlands
Rainer Lienhart, University of Augsburg, Germany
Wei-Ying Ma, Microsoft Research Asia, China
John R. Smith, IBM Research, USA
 
Through carefully selected, invited papers written by leading authors and research teams, the April 2008 issue of Proceedings of the IEEE (v.96, no.4) highlights successes of multimedia information retrieval research, critically analyzes the achievements made so far and assesses the applicability of multimedia information retrieval results in real-life scenarios. The issue provides insights into the current possibilities for building automated and semi-automated methods as well as algorithms for segmenting, abstracting, indexing, representing, browsing, searching and retrieving multimedia content in various contexts. Additionally, future challenges that are likely to drive the research in the multimedia information retrieval field for years to come are also discussed.
 
 
 Computeranimierte Sprechbewegungen in realen Anwendungen
Authors: Sascha Fagel and Katja Madany
102 pages
Publisher: Berlin Institute of Technology
Year: 2008
Website http://www.ub.tu-berlin.de/index.php?id=1843

Usability of Speech Dialog Systems: Listening to the Target Audience
Editor: Thomas Hempel
Series: Signals and Communication Technology
2008, X, 175 p., 14 illus., hardcover
ISBN: 978-3-540-78342-8

Speech and Language Processing, 2nd Edition

By Daniel Jurafsky and James H. Martin

  • Published May 16, 2008 by Prentice Hall
  • Copyright 2009
  • Dimensions: 7" x 9-1/4"
  • Pages: 1024
  • Edition: 2nd
  • ISBN-10: 0-13-187321-0
  • ISBN-13: 978-0-13-187321-6

An explosion of Web-based language techniques, merging of distinct fields, availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology, at all levels and with all modern technologies, this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. KEY TOPICS: Builds each chapter around one or more worked examples demonstrating the main idea of the chapter, using the examples to illustrate the relative strengths and weaknesses of various approaches. Adds coverage of statistical sequence labeling, information extraction, question answering and summarization, advanced topics in speech recognition, and speech synthesis. Revises coverage of language modeling, formal grammars, statistical parsing, machine translation, and dialog processing. MARKET: A useful reference for professionals in any of the areas of speech and language processing.

  

 
Back to Top

5-2 . LDC News

Membership Mailbag - 'Penn' Treebanks and Recent Directions in English Treebanking

LDC2008T07
LDC2008L02
LDC2008S04


In this month's newsletter, the Linguistic Data Consortium (LDC) would like to introduce our new Membership Mailbag series of newsletter articles and announce the availability of three new publications.
 

 


 

Membership Mailbag - 'Penn' Treebanks and Recent Directions in English Treebanking
 

The LDC Membership Office responds to over 4000 emailed queries a year, and, over time, we've noticed that some questions tend to crop up with regularity.  To address the questions that you, our data users, have asked, we'd like to introduce our new Membership Mailbag series of newsletter articles.  This month we will look into the differences between the 'Penn' Treebanks and review recent directions in English treebanking.

Treebank-2 and Treebank-3 both contain 1 million words of Wall Street Journal (WSJ) text and a small sample of ATIS-3 data, annotated in the Treebank II annotation style, plus a part-of-speech tagged version of the Brown corpus. Treebank-3 is considered a superset of Treebank-2, so if you are undecided between the two, Treebank-3 is the better choice in most instances. Treebank-3 corrects known technical errors in Treebank-2 and adds Switchboard data which has been tagged and dysfluency-annotated, and a small portion of the Brown corpus which has been parsed in the Treebank II annotation style.

Note, however, that a few items found in Treebank-2 are missing from Treebank-3. Treebank-2 contains the complete parsed Brown corpus, done in the older Treebank I annotation style; Treebank-3 does not. Also, Treebank-3 does not include the tgrep software for extracting data, but tgrep and a newer version, tgrep2, are freely available online. Finally, Treebank-3 does not contain the raw Wall Street Journal (WSJ) text, but organizations can obtain this by request.

Much recent treebanking has focused on languages other than English, but English treebanking efforts did not come to an end with the release of Treebank-3. Ongoing work uses an updated Treebank II annotation style and consists of two types of annotation: straight treebanking, and treebanking in combination with another kind of annotation. Straight treebank annotation can be found in corpora such as English Chinese Translation Treebank v 1.0 and English-Arabic Treebank v 1.0. In these corpora, the Chinese or Arabic source texts have been translated into English, then POS-tagged and treebanked, thus making them suitable for machine translation work as well. Additional translation treebanks are planned for release; they will feature cleaner translation and contain substantially more data.

Corpora which combine treebanking with another type of annotation include the English Conversational Telephone Speech Treebank with Structural Metadata, to be released later this year. This treebank is annotated for structural metadata, including fillers, disfluencies and sentence/semantic units, and is also tagged for syntactic structure, which makes it possible to evaluate the impact of metadata extraction (MDE) on parsing. While these newer releases are smaller than the Penn Treebanks, the improved Treebank II annotation style has a very high rate of inter-annotator agreement. Additionally, the source texts are more varied in both domain and style than the WSJ texts that constitute the bulk of the Penn Treebank.

Got a question?  About LDC data?  Forward it to ldc@ldc.upenn.edu.  The answer may appear in a future Membership Mailbag article.


New Publications


(1) Chinese Proposition Bank 2.0 (CPB2.0) is a continuation of the Chinese Proposition Bank project, which aims to create a corpus of Chinese text annotated with information about basic semantic propositions. Chinese Proposition Bank 1.0 consists of predicate-argument annotation on 250,000 words from Chinese Treebank 5.0. Chinese Proposition Bank 2.0 adds predicate-argument annotation on 500,000 words from Chinese Treebank 6.0. The data sources include newswire from Xinhua News Agency, articles from Sinorama Magazine, news from the website of the Hong Kong Special Administrative Region and transcripts from various Chinese broadcast news programs.

This release contains the predicate-argument annotation of 81,009 verb instances (11,171 unique verbs) and 14,525 noun instances (1,421 unique nouns). The annotation of nouns is limited to nominalizations that have a corresponding verb. The general annotation guidelines and the lexical guidelines (called frame files) for each verbal and nominal predicate are included in this release.  Chinese Proposition Bank 2.0 is distributed via web download.

2008 Subscription Members will automatically receive two copies of this corpus on disc. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$850.

*

(2) Hindi WordNet was developed by researchers at the Center for Indian Language Technology, Computer Science and Engineering Department, IIT Bombay. Wordnets are systems for analyzing the different lexical and semantic relations between words. Specifically, a wordnet is a word sense network in which words are grouped into semantically equivalent units called synsets. Each synset represents a lexical concept, and synsets are linked to each other by semantic relations (between synsets) and lexical relations (between words). Similar in design to the Princeton WordNet for English, Hindi WordNet incorporates additional features to capture the complexities of Hindi. This release of Hindi WordNet consists of 56,928 unique words and 26,208 synsets.

Additional information about the development of Hindi WordNet is available at the Hindi WordNet web site.

Hindi WordNet contains nouns, verbs, adjectives and adverbs. Each entry consists of the following elements:

1. Synset: a set of synonymous words. The words in the synset are arranged according to frequency of usage.

2. Gloss: a description of the concept. It consists of two parts:

   Text definition: explains the concept denoted by the synset.

   Example sentence: illustrates the usage of the words in a sentence.

3. Position in ontology: an ontology is a hierarchical organization of concepts, or more specifically, a categorization of entities and actions. A separate ontological hierarchy exists for each syntactic category (noun, verb, adjective, adverb). Each synset is mapped to a place in the ontology.
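
The three elements of an entry can be pictured as a small record type. The following Python sketch is purely illustrative: it is not the Java API distributed with Hindi WordNet, the class and field names (SynsetEntry, Gloss, ontology_path) are hypothetical, and the sample words are English placeholders rather than data from the release.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gloss:
    """Hypothetical container for the two parts of a gloss."""
    definition: str  # text definition of the concept
    example: str     # example sentence showing usage


@dataclass
class SynsetEntry:
    """Hypothetical model of one entry: synset, gloss, ontology position."""
    words: List[str]          # synonymous words, most frequently used first
    gloss: Gloss
    category: str             # noun, verb, adjective or adverb
    ontology_path: List[str]  # nodes from the category root down to the synset

    def most_frequent(self) -> str:
        # Words are ordered by frequency of usage, so the first is the most common.
        return self.words[0]


# Placeholder data in English, invented for illustration only.
entry = SynsetEntry(
    words=["house", "home", "dwelling"],
    gloss=Gloss(
        definition="a building in which people live",
        example="They built a house near the river.",
    ),
    category="noun",
    ontology_path=["noun", "inanimate", "object", "artifact"],
)

print(entry.most_frequent())
```

Keeping a separate ontology path per entry mirrors the statement that each syntactic category has its own hierarchy; the path shown here starts at the category root by assumption.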

This release of Hindi WordNet is made available as a complete Java application along with an API to facilitate further development.  Hindi WordNet is distributed via web download. 

2008 Subscription Members will automatically receive two copies of this corpus on disc, provided that they have submitted a signed copy of the User License Agreement for Hindi WordNet (LDC2008L02).  2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$300.

*

(3) West Point Brazilian Portuguese Speech is a database of digital recordings of spoken Brazilian Portuguese designed and collected by staff and faculty of the Department of Foreign Languages (DFL) and Center for Technology Enhanced Language Learning (CTELL) to develop acoustic models for speech recognition systems. The U.S. government uses such systems to provide language learning courseware enhanced with speech recognition to government linguists and students enrolled in various government language programs.

The data in this corpus was collected in March 1999 in Brasilia, Brazil using informants from a Brazilian military academy. The corpus consists of read speech from 60 female and 68 male native and non-native speakers.  The speech was elicited from a prompt script containing 296 sentences and phrases typically used in language learning situations.

The speech was collected using four laptop computers running MS Windows. Three of the computers recorded with a 16-bit sample size at a sampling rate of 22050 Hz; the fourth recorded with an 8-bit sample size at a sampling rate of 11025 Hz. The recording script presented a visual display of the sentence to be recorded. The informant pressed a key and spoke the sentence. The recording was then played back for review, and the utterance could be re-recorded if necessary. West Point Brazilian Portuguese Speech is distributed on one DVD-ROM.
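
To give a feel for the difference between the two recording formats, the sketch below computes the raw data rate of each. It assumes uncompressed single-channel PCM audio, which the description does not state; the function name is hypothetical.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw data rate of uncompressed PCM audio, in bytes per second.

    Assumes linear PCM with no compression; mono by default (an assumption).
    """
    return sample_rate_hz * bits_per_sample * channels // 8


# The two configurations mentioned in the corpus description.
high_quality = pcm_bytes_per_second(22050, 16)  # three of the laptops
low_quality = pcm_bytes_per_second(11025, 8)    # the fourth laptop

print(high_quality, low_quality, high_quality // low_quality)
```

Under these assumptions the 16-bit, 22050 Hz recordings carry four times as many bytes per second as the 8-bit, 11025 Hz ones, which is worth keeping in mind when pooling both subsets for acoustic model training.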

2008 Subscription Members will automatically receive two copies of this corpus. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$500.

Back to Top

5-3 . Question Answering on speech transcripts (QAst)

The QAst organizers are pleased to announce the release of the development dataset for
the CLEF-QA 2008 track "Question Answering on Speech Transcripts" (QAst).
We take this opportunity to launch a first call for participation in
this evaluation exercise.

QAst is a CLEF-QA track that aims at providing an evaluation framework
for QA technology on speech transcripts, both manual and automatic.
A detailed description of this track is available at:
http://www.lsi.upc.edu/~qast

This is the second evaluation for the QAst track.
Last year (QAst 2007), factual questions were generated for two
distinct corpora (in English only). This year, in addition to
factual questions, some definition questions are generated, and five
corpora covering three different languages are used (3 corpora in
English, 1 in Spanish and 1 in French).

Important dates:

# 15 June 2008: evaluation set released
# 30 June 2008: submission deadline

The pilot track is organized jointly by the Technical University of
Catalonia (UPC), the Evaluations and Language resources Distribution
Agency (ELDA) and Laboratoire d'Informatique pour la Mécanique et les
Sciences de l'Ingénieur (LIMSI).

If you are interested in participating please send an email to Jordi
Turmo (turmo_AT_lsi.upc.edu) with "QAst" in the subject line.

Back to Top

5-4 . ELRA- Language Resources Catalogue-Update

ELRA is happy to announce that one new Speech Resource, produced within
the Technolangue programme, is now available in its catalogue.

ELRA-S0272 MEDIA speech database for French
The MEDIA speech database for French was produced by ELDA within the
French national project MEDIA (Automatic evaluation of man-machine
dialogue systems), as part of the Technolangue programme funded by the
French Ministry of Research and New Technologies (MRNT). It contains
1,258 transcribed dialogues from 250 adult speakers. The method chosen
for the corpus construction process is that of a 'Wizard of Oz' (WoZ)
system. This consists of simulating a natural language man-machine
dialogue. The scenario was built in the domain of tourism and hotel
reservation.
The semantic annotation of the corpus is available in this catalogue and
referenced ELRA-E0024 (MEDIA Evaluation Package).
For more information, see:
http://catalog.elra.info/product_info.php?products_id=1057

For more information on the catalogue, please contact Valérie Mapelli
(mapelli@elda.org).

Visit our on-line catalogue: http://catalog.elra.info
 
Back to Top

5-5 . MusicSpeech group

Music and speech have much in common (linguistic, structural, acoustic and cognitive aspects), in their production as well as in their representation and perception. The purpose of this list is to inform its users of events concerned with the study of the links between music and speech. It thus aims to connect several communities, allowing each to benefit from a stimulating interaction.

As a member of the speech or music community, you are invited to
subscribe to the musicspeech group. The group will be moderated and
maintained by IRCAM.

Group details:
* Name: musicspeech
* Home page: http://listes.ircam.fr/wws/info/musicspeech
* Email address: musicspeech@ircam.fr

Greg Beller, IRCAM,
moderator, musicspeech list

Back to Top

6 . Job openings

Back to Top

6-1 . AT&T Labs Research: Research Staff Positions - Florham Park, NJ

AT&T Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit and strong communication and software skills. Areas of particular interest are:

  • Large-vocabulary automatic speech recognition
  • Acoustic and language modeling
  • Robust speech recognition
  • Signal processing
  • Speaker recognition
  • Speech data mining
  • Natural language understanding and dialog
  • Text and web mining
  • Voice and multimodal search

AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).

 


6-2 . Summer Intern positions at Motorola, Schaumburg, Illinois, USA

Motorola Labs - Center for Human Interaction Research (CHIR), located in Schaumburg, Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).

CHIR's mission:

Our research lab develops technologies that provide effortless access to rich communication, media and information services, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and continually evolving knowledge about the user.

Intern profiles:

1) Acoustic environment/event detection and classification.

The successful candidate will be a PhD student near the end of his/her PhD studies who is skilled in signal processing and/or pattern recognition and knows Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.

2) Speaker adaptation for applications on speech recognition and spoken document retrieval.

The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a complete understanding of and hands-on experience in research related to automatic speech recognition. Proficiency in a Linux/Unix working environment and C/C++ programming. Strong GPA. A strong background in speaker adaptation is highly preferred.

3) Development of voice search-based web applications on a smartphone

We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) is required. Development experience on Windows CE/Mobile desired.

4) Integrated Voice Search Technology For Mobile Devices.

The candidate should be proficient in information retrieval, pattern recognition and speech recognition. The candidate should program in C++ and scripting languages such as Python or Perl in a Linux environment. Also, he/she should have knowledge of information retrieval or search engines.

We offer competitive compensation, a fun work environment and Chicago-style pizza.

If you are interested, please send your resume to:

Dusan Macho, CHIR-Motorola Labs

Email: dusan.macho@motorola.com

Tel: +1-847-576-6762


6-3 . Nuance: Software engineer speech dialog tools

In order to strengthen our Embedded ASR Research team, we are looking for a:

SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

OVERVIEW:

- You will work in Nuance's Embedded ASR R&D team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.

- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

- You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.

- You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.

- In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.

QUALIFICATIONS:

- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.

- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

- You have experience with Python or other scripting languages.

- GUI programming experience is a strong asset.

The following skills are a plus:

- Understanding of communication protocols

- Understanding of databases

- Understanding of computational agents and related frameworks (such as OAA).

- A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics and machine learning, especially as related to natural language processing, dialogue, and representation of information

- You can work both as a team player and as a goal-oriented independent software engineer.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a plus.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

Deanna Roe                 Deanna.roe@nuance.com

Please make sure to document to us your excellent software engineering skills.

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 


6-4 . Nuance: Speech scientist London UK

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

To strengthen our International Professional Services team, based in London, we are currently looking for a

 

 

                            Speech Scientist, London, UK

Nuance Professional Services (PS) has designed, developed, and optimized thousands of speech systems across dozens of industries, including directory search, call center automation, applications in telecom, finance, airline, healthcare, and other verticals; applications for video games, mobile dictation, enhanced search services, SMS, and in-car navigation.  Nuance PS applications have automated approximately 7 billion phone conversations for some of the world's most respected companies, including British Airways, Vodafone, Amtrak, Bank of America, BellCanada, Citigroup, General Electric, NTT and Verizon.

The PS organization consists of energetic, motivated, and friendly individuals.  The Speech Scientists in PS are among the best and brightest, with PhDs from universities such as Cambridge (UK), MIT, McGill, Harvard, Penn, CMU, and Georgia Tech, and experience at research labs such as Bell Labs, Motorola Labs, and ATR (Japan), adding up to over 300 years of Speech Science experience and covering well over 20 languages.

Come and join Nuance PS and work on the latest technology from one of the prominent speech recognition technology providers, and make a difference in the way the world communicates.

Job Overview

As a Speech Scientist in the Professional Services group, you will work on automated speech recognition applications, covering a broad range of activities in all project phases, including the design, development, and optimization of the system.  You will:

  • Work across application development teams to ensure best possible recognition performance in deployed systems
  • Identify recognition challenges and assess accuracy feasibility during the design phase
  • Design, develop, and test VoiceXML grammars and create JSPs, Java, and ECMAScript grammars for dynamic contexts
  • Optimize accuracy of applications by analyzing performance and tuning statistical language models, pronunciations, and acoustic models, including identifying areas for improvement by running the recognizer offline
  • Contribute to the generation and presentation of client-facing reports
  • Act as technical lead on more intensive client projects
  • Develop methodologies, scripts, procedures that improve efficiency and quality
  • Develop tools and enhance algorithms that facilitate deployment and tuning of recognition components
  • Act as subject matter domain expert for specific knowledge domains
  • Provide input into the design of future product releases

     Required Skills

  • MS or PhD in Computer Science, Engineering, Computational Linguistics, Physics, Mathematics, or related field (or equivalent)
  • Strong analytical and problem solving skills and ability to troubleshoot issues
  • Good judgment and quick-thinking
  • Strong programming skills, preferably Perl or Python
  • Excellent written and verbal communication skills
  • Ability to scope work taking technical, business and time-frame constraints into consideration
  • Works well in a team and in a fast-paced environment

Beneficial Skills

  • Strong programming skills in either Perl, Python, Java, C/C++, or Matlab
  • Speech recognition knowledge
  • Strong pattern recognition, linguistics, signal processing, or acoustics knowledge
  • Statistical data analysis
  • Experience with XML, VoiceXML, and Wiki
  • Ability to mentor or supervise others
  • Additional language skills, e.g. French, Dutch, German, Spanish

 


6-5 . Nuance: Research engineer speech engine

In order to strengthen our Embedded ASR Research team, we are looking for a:

RESEARCH ENGINEER SPEECH ENGINE

As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

 OVERVIEW:

- You will work in Nuance's Embedded ASR R&D team, developing, improving and maintaining core ASR engine algorithms for our customers in the Automotive and Mobile sector.

- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

- You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on developing, improving and maintaining core ASR engine algorithms for cutting edge speech and natural language understanding technologies for automotive and mobile devices.

- You will work on the design and development of more efficient, flexible ASR search algorithms with high focus on low memory and processor requirements.

QUALIFICATIONS:

- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field. PhD is a plus.

- A background in (computational) linguistics, speech processing, ASR search, confidence values, grammars, statistics and machine learning, especially as related to natural language processing.

- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

The following skills are a plus:

- You have experience with Python or other scripting languages.

- Broad knowledge about architectures of embedded platforms and processors.

- Understanding of databases

- You can work both as a team player and as a goal-oriented independent software engineer.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a plus.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

Deanna Roe                  Deanna.roe@nuance.com

Please make sure to document to us your excellent software engineering skills.

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 


6-6 . Nuance: Research engineer speech dialog systems

In order to strengthen our Embedded ASR Research team, we are looking for a:

   RESEARCH ENGINEER SPEECH DIALOGUE SYSTEMS

As part of our team, you will be creating speech technologies for embedded applications varying from simple command and control tasks up to natural language speech dialogues on mobile and automotive platforms.

OVERVIEW:

- You will work in Nuance's Embedded ASR research and production team, creating technology, tools and runtime software to enable our customers to develop embedded speech applications. In our team of speech and language experts, you will work on natural language dialogue systems that define the state of the art.

- You will work at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the picturesque city of Ghent, in the Flanders region of Belgium. Ghent has one of the most spectacular historic town centers of Europe and is known for its unique vibrant yet cozy charm, and is home to a large international university.

- You will work in an international company and cooperate with people at various locations, including Europe, America, and Asia.  You may occasionally be asked to travel.

RESPONSIBILITIES:

- You will work on the development of cutting edge natural language dialogue and speech recognition technologies for automotive embedded systems and mobile devices.

- You will design, implement, evaluate, optimize, and test new algorithms and tools for our speech recognition systems, both for research prototypes and deployed products, including all aspects of dialogue systems design, such as architecture, natural language understanding, dialogue modeling, statistical framework, and so forth.

- You will help the engine process multi-lingual natural and spontaneous speech in various noise conditions, given the challenging memory and processing power constraints of the embedded world.

QUALIFICATIONS:

- You have a university degree in computer science, (computational) linguistics, engineering, mathematics, physics, or a related field. A graduate degree is an asset.

- You have strong software and programming skills, especially in C/C++, ideally for embedded applications. Knowledge of Python or other scripting languages is a plus.

- You have experience in one or more of the following fields:

     dialogue systems

     applied (computational) linguistics

     natural language understanding

     language generation

     search engines

     speech recognition

     grammars and parsing techniques.

     statistics and machine learning techniques

     XML processing

- You are a team player, willing to take initiative and assume responsibility for your tasks, and are goal-oriented.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a strong asset.

CONTACT:

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

 

Deanna Roe                  Deanna.roe@nuance.com

ABOUT US:

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 


6-7 . Research Position in Speech Processing at Nagoya Institute of Technology, Japan

Nagoya Institute of Technology is seeking a researcher for a post-doctoral position in a new European Commission-funded project, EMIME ("Efficient multilingual interaction in mobile environment"), involving Nagoya Institute of Technology and five European partners, starting in March 2008 (see the project summary below). The earliest starting date of the position is March 2007. The initial duration of the contract will be one year, with a possibility of prolongation (on a year-by-year basis, for a maximum of three years). The position provides opportunities to collaborate with other researchers in a variety of national and international projects. The competitive salary is calculated according to qualifications based on NIT scales.

The candidate should have a strong background in speech signal processing and some experience with speech synthesis and recognition. Desired skills include familiarity with the latest technology, including HTK, HTS, and Festival, at the source code level.

For more information, please contact Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/).

About us

Nagoya Institute of Technology (NIT), founded in 1905, is situated in the world-quality manufacturing area of Central Japan (about one hour and 40 minutes from Tokyo, and 36 minutes from Kyoto, by Shinkansen). NIT is a top-level educational institution of technology and one of the leaders among such institutions in Japan. EMIME will be carried out at the Speech Processing Laboratory (SPL) in the Department of Computer Science and Engineering of NIT. SPL is known for its outstanding, continuous contribution to developing high-performance, high-quality open-source software: the HMM-based Speech Synthesis System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary continuous speech recognition engine "Julius" (http://julius.sourceforge.jp/), and the Speech Signal Processing Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is involved in numerous national and international collaborative projects. In order to transfer its research into commercial applications, SPL also has close partnerships with many industrial companies, including Toyota, Nissan, Panasonic, Brother Inc., Funai, Asahi-Kasei and ATR.

Project summary of EMIME

The EMIME project will help to overcome the language barrier by developing a mobile device that performs personalized speech-to-speech translation, such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. Personalization of systems for cross-lingual spoken communication is an important, but little explored, topic. It is essential for providing more natural interaction and making the computing device a less obtrusive element when assisting human-human interactions.

We will build on recent developments in speech synthesis using hidden Markov models, which is the same technology used for automatic speech recognition. Using a common statistical modeling framework for automatic speech recognition and speech synthesis will enable the use of common techniques for adaptation and multilinguality.

Significant progress will be made towards a unified approach for speech recognition and speech synthesis: this is a very powerful concept, and will open up many new areas of research. In this project, we will explore the use of speaker adaptation across languages so that, by performing automatic speech recognition, we can learn the characteristics of an individual speaker, and then use those characteristics when producing output speech in another language.

Our objectives are to:

1. Personalize speech processing systems by learning individual characteristics of a user's speech and reproducing them in synthesized speech.

2. Introduce a cross-lingual capability such that personal characteristics can be reproduced in a second language not spoken by the user.

3. Develop and better understand the mathematical and theoretical relationship between speech recognition and synthesis.

4. Eliminate the need for human intervention in the process of cross-lingual personalization.

5. Evaluate our research against state-of-the-art techniques and in a practical mobile application.

 


6-8 . C/C++ Programmer Munich, Germany

Digital publishing AG is one of Europe's leading producers of interactive software for foreign language training. In our e-learning courses we want to place the emphasis on speaking and spoken language understanding. In order to strengthen our Research & Development Team in Munich, Germany, we are looking for experienced C or C++ programmers with at least 3 years' experience in the design and coding of sophisticated software systems under Windows.

We offer

- a creative working atmosphere in an international team of software engineers, linguists and editors working on challenging research projects in speech recognition and speech dialogue systems.
- participation in all phases of a product life cycle, as we are interested in the fast transfer of research results into products.
- the possibility to participate in international scientific conferences.
- a permanent job in the center of Munich.
- excellent possibilities for development within our fast growing company.
- flexible working times, competitive compensation and arguably the best espresso in Munich.

We expect

- several years of practical experience in software development in C or C++ in a commercial or academic environment.
- experience with parallel algorithms and thread programming.
- experience with object-oriented design of software systems.
- good knowledge of English or German.

Desirable is

- experience with optimization of algorithms.
- experience in statistical speech or language processing, preferably speech recognition, speech synthesis, speech dialogue systems or chatbots.
- experience with Delphi or Turbo Pascal.

Interested? We look forward to your application (preferably by e-mail):

digital publishing AG
Freddy Ertl  f.ertl@digitalpublishing.de
Tumblinger Straße 32
D-80337 München Germany

6-9 . Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
  • You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
  • You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.

Requirements:

  • Solid background in speech recognition, natural language processing, machine learning and information extraction.
  • 2+ years of experience participating in software development projects
  • Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
  • Excellent analytical and problem-solving skills
  • Integrate and communicate well in small R&D teams
  • Master's degree in CS or related engineering fields
  • Experience in a healthcare-related field a plus

 

In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 


6-10 . Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • Plan and perform research and development tasks to continuously improve a state-of-the-art speech understanding system
  • Take a leading role in identifying solutions to challenging technical problems
  • Contribute original ideas and turn them into product-grade software implementations
  • Collaborate with other members of the R&D team to identify, analyze and resolve technical issues

Requirements:

  • Solid research & development background with 3+ years of experience in speech recognition research, covering at least two of the following topics: speech processing, acoustic modeling, language modeling, decoding, LVCSR, natural language processing/understanding, speaker verification/identification, audio mining
  • Working knowledge of Machine Learning, Information Extraction and Natural Language Processing algorithms
  • 3+ years of experience participating in large-scale software development projects using C++ and Java.
  • Excellent analytical, problem-solving and communication skills
  • PhD with a focus on speech recognition, or Master's degree with 3+ years of industry experience working on automatic speech recognition
  • Experience and/or education in medical informatics a plus
  • Working experience in a healthcare related field a plus

 


In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 


6-11 . Postdoc position at LORIA, Nancy, France

Building an articulatory model from ultrasound, EMA and MRI data

Postdoctoral position

 

 

Research project

An articulatory model comprises both the visible and the internal mobile articulators involved in speech articulation (the lower jaw, tongue, lips and velum) as well as the fixed walls (the palate, the rear wall of the pharynx). An articulatory model is dynamic since the articulators deform during speech production. Such a model is of potential interest in the field of language learning, where it can provide visual feedback on the learner's articulation, and in many other applications.

Building an articulatory model is difficult because the different articulators have to be detected from specific image modalities: the lips are acquired through video, and the tongue shape is acquired through ultrasound imaging, which has a high frame rate but produces very noisy 2D images. Finally, 3D images of all articulators can be obtained with MRI, but only for sustained sounds (such as vowels) because of the long acquisition time of MRI images.

The subject of this post-doc is to construct a dynamic 3D model of the entire vocal tract by merging the 3D information available in the MRI acquisitions and temporal 2D information provided by the contours of the tongue visible on the ultrasound images or X-ray images.

We are working on the construction of an articulatory model within the European project ASPI (http://aspi.loria.fr/ ).

We have already built an acquisition system that allows us to obtain synchronized data from the ultrasound, MRI, video and EMA modalities.

Only a few complete articulatory models are currently available in the world and a real challenge in the field is to design set-ups and easy-to-use methods for automatically building the model of any speaker from 3D and 2D images. Indeed, the existence of more articulatory models would open new directions of research about speaker variability and speech production.

 

Objectives

The aim of the subject is to build a deformable model of the vocal tract from static 3D MRI images and dynamic 2D sequences. Previous work has addressed the modelling of the vocal tract, and especially of the tongue (M. Stone [1], O. Engwall [2]). Unfortunately, substantial human interaction is required to extract tongue contours from the images. In addition, these works often consider only one image modality, which reduces the reliability of the resulting model.

The aim of this work is to provide automatic methods for segmenting features in the images as well as methods for building a parametric model of the 3D vocal tract with these specific aims:

  • The segmentation process is to be guided by prior knowledge of the vocal tract. In particular, shape, topological and regularity constraints must be considered.
  • A parametric model of the vocal tract has to be defined (classical models are linear and built from a principal component analysis). Special emphasis must be put on the problem of matching the various features between the images.
  • Besides classical geometric constraints, both the building and the assessment of the model will be guided by acoustic distances, in order to check the agreement between the sound synthesized from the model and the sound produced by the human speaker.
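
To illustrate what a linear model built from a principal component analysis means in the second point, here is a minimal sketch, not the method of the project. It assumes each contour has already been reduced to a fixed-length list of coordinates, and it extracts only the first principal component, using power iteration so that no external library is needed. The function names and the toy data are hypothetical.

```python
import math
import random


def mean_shape(shapes):
    """Coordinate-wise mean of a list of equal-length contours."""
    n = len(shapes)
    return [sum(s[i] for s in shapes) / n for i in range(len(shapes[0]))]


def first_component(shapes, iterations=200, seed=0):
    """Return (mean, unit vector) for the leading principal component.

    Power iteration on the unnormalised covariance X^T X, where X holds
    the mean-centred contours as rows.
    """
    mean = mean_shape(shapes)
    centred = [[x - m for x, m in zip(s, mean)] for s in shapes]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in mean]
    for _ in range(iterations):
        # Apply X^T X to v without forming the matrix explicitly.
        proj = [sum(c * vi for c, vi in zip(row, v)) for row in centred]
        w = [sum(p * row[i] for p, row in zip(proj, centred))
             for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate case: the contours show no variation
        v = [x / norm for x in w]
    return mean, v


def encode(shape, mean, component):
    """Model parameter: projection of the centred contour on the component."""
    return sum((x - m) * c for x, m, c in zip(shape, mean, component))


def decode(weight, mean, component):
    """Linear model: contour = mean + weight * component."""
    return [m + weight * c for m, c in zip(mean, component)]


# Toy contours that vary along a single direction (made-up numbers).
toy_shapes = [[1.0 + t, 2.0, 3.0 - t, 4.0] for t in (-2, -1, 0, 1, 2)]
toy_mean, toy_component = first_component(toy_shapes)
toy_weight = encode(toy_shapes[0], toy_mean, toy_component)
toy_rebuilt = decode(toy_weight, toy_mean, toy_component)
```

A realistic model would keep several components and would first need the feature matching between images that the same point emphasizes; the sketch only shows how one parameter can drive a whole contour.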

 

Skill and profile

The recruited person must have a solid background in computer vision and in applied mathematics. Information and demonstrations on the research topics addressed by the Magrit team are available at http://magrit.loria.fr/

 

References

[1] M. Stone: Modeling tongue surface contours from Cine-MRI images, Journal of Speech, Language, and Hearing Research, 2001.

[2] P. Badin, G. Bailly, L. Reveret: Three-dimensional linear articulatory modeling of tongue, lips and face based on MRI and video images, Journal of Phonetics, 2002, vol. 30, pp. 533-553.

 

Contact

Interested candidates are invited to contact Marie-Odile Berger, berger@loria.fr, +33 3 54 95 85 01

 

Important information

This position is advertised in the framework of the national INRIA campaign for recruiting post-docs. It is a one-year position, renewable, beginning in fall 2008. The salary is 2,320€ gross per month.

 

Selection of candidates will be a two-step process. A candidate will first be selected internally by the Magrit group. The selected candidate's application will then be further processed for approval and funding by an INRIA committee.

 

Candidates must hold a doctoral thesis less than one year old (defended after May 2007) or defend it before the end of 2008. If the defence has not yet taken place, candidates must specify the tentative date and jury for the defence.

 

Important - Useful links

Presentation of INRIA postdoctoral positions

To apply (be patient, loading this link takes time...)

 


6-12 . Internships at Motorola Labs Schaumburg

Motorola Labs - Center for Human Interaction Research (CHIR) 
located in Schaumburg Illinois, USA, 
is offering summer intern positions in 2008 (12 weeks each). 
CHIR's mission
 
Our research lab develops technologies that make access to rich communication, media and 
information services effortless, based on natural, intelligent interaction. Our research 
aims at systems that adapt automatically and proactively to changing environments, device 
capabilities and continually evolving knowledge about the user.
 
Intern profiles
 
1) Acoustic environment/event detection and classification. 
The successful candidate will be a PhD student near the end of his/her PhD study who is skilled 
in signal processing and/or pattern recognition and knows Linux and C/C++ programming. 
Candidates with knowledge of acoustic environment/event classification are preferred. 
 
2) Speaker adaptation for applications on speech recognition and spoken document retrieval
The successful candidate must currently be pursuing a Ph.D. degree in EE or CS with a thorough 
understanding of, and hands-on experience in, automatic speech recognition research. Proficiency 
in a Linux/Unix working environment and C/C++ programming. Strong GPA. A strong background in speaker 
adaptation is highly preferred.
 
3) Development of voice search-based web applications on a smartphone 
We are looking for an intern candidate to help create an "experience" prototype based on our 
voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and 
rich interaction with web resources. This intern project is oriented more towards software engineering 
than research. We target an intern with a master's degree and strong software engineering background. 
Mastery of C++ and experience with web programming (AJAX and web services) are required. 
Development experience on Windows CE/Mobile desired.
 
4) Integrated Voice Search Technology For Mobile Devices
Candidate should be proficient in information retrie