ISCApad

  • Editorial
  • ISCA News
  • SIG's activities
  • Courses, Internships
  • Books, Databases, Softwares
  • Job openings
  • Journals
  • Future Conferences
  • Future Interspeech conferences
  • Future ISCA Technical and Research Workshops
  • Forthcoming events supported (but not organized) by ISCA
  • Future Speech Science and Technology Events
Number: 116 Date: 02/06/2008
Author: Chris Wellekens

Editor: Chris Wellekens

 

 Dear Members,

 In this issue, you will find many new job offers and new conference announcements. Please pay special attention to the upcoming submission deadlines.

I remind you that ISCApad is published monthly. As a consequence, it is difficult to inform members of last-minute extensions of submission deadlines.

May I ask those of you who send me job openings to specify, if possible, the expiration date of the offer or, if the offer remains valid until the position is filled, to inform me as soon as it is filled. I try to keep only current offers in the job listings.

Professor em. Chris Wellekens

Institut Eurecom France

 

 

 

 

 

 

 


ISCA News

  • GOOGLE SCHOLAR AND ISCA ARCHIVE


    Google Scholar and the ISCA Archive

    The indexing of the ISCA Archive (http://www.isca-speech.org/archive/) by the Google Scholar search engine (http://scholar.google.com/) is now thorough enough to be quite useful, so this seems like a good time to give an overview of the service.  Google Scholar is a research literature search engine that provides full-text search for ISCA papers whose full text cannot be searched with other search engines. Google Scholar's citation tracking shows what papers have cited a particular paper, which can be very useful for finding follow-up work, related work and corrections.  More details about these and other features are given below.

    The titles, author lists, and abstracts of ISCA Archive papers are all on the public web, so they can be searched by a general-purpose search engine such as Google.  However, the full texts of most ISCA papers are password protected and thus cannot be searched with a general-purpose search engine.  Google Scholar, through an arrangement with ISCA, has access to the full text of ISCA papers. Google Scholar has similar arrangements with many other publishers.  (On the other hand, general-purpose search engines index all sorts of web pages and other documents accessible through the public web, many of which will not be in the Google Scholar index.  So it's often useful to perform the same search using both Google Scholar and a general-purpose search engine.)

    Google Scholar automatically extracts citations from the full text of papers. It uses this information to provide a "Cited by" list for each paper in the Google Scholar index.  This is a list of papers that have cited that paper. Google Scholar also provides an automatically generated "Related Articles" list for each paper.  The "Cited by" and "Related Articles" lists are powerful tools for discovering relevant papers.  Furthermore, the length of a paper's "Cited by" list can be used as a convenient (although imperfect) measure of the paper's impact.  Discussions about the subtleties of using Google Scholar to measure impact can be found at http://www.harzing.com/resources.htm#/pop_gs.htm and http://blogs.nature.com/nautilus/2007/07/google_scholar_as_a_measure_of.html.

    It's possible to restrict Google Scholar searches to papers published by ISCA by using Google Scholar's Advanced Search feature and entering "ISCA" in the "Return articles published in" field.  If "ISCA" is entered in that field, and nothing is entered in the main search field, then the search results will show what ISCA papers are the most highly cited.

    It should be noted that there are many papers on ISCA-related topics which are not in the Google Scholar index.  For example, it seems many ICPhS papers are missing.  And old papers which have been scanned in from paper copies will either not have their full contents indexed, or will be indexed using imperfect OCR technology. Furthermore, as of November 2007 the indexing of the ISCA Archive by Google Scholar is still not 100% complete.  There are a few different areas which are not perfectly indexed, but the biggest planned improvement is to start using OCR for the ISCA papers which have been scanned in from paper copies.

    There may be a time lag between when a new event is added to the ISCA Archive in the future and when it appears in the Google Scholar index. This time lag may be longer than the usual lag of general-purpose search engines such as Google, because ISCA must create Google Scholar catalog data for every new event and because the Google Scholar index seems to update considerably more slowly than the Google index.

    Acknowledgements: ISCA's arrangement with Google Scholar is a project of students Rahul Chitturi, Tiago Falk, David Gelbart, Agustin Gravano, and Francis Tyers, ISCA webmaster Matt Bridger, and ISCA Archive coordinator Wolfgang Hess.  Our thanks to Google's Christian DiCarlo and Darcy Dapra, and the rest of the Google Scholar team.


SIG's activities

  • A list of Special Interest Groups can be found on our website.


Courses, Internships

  • Motorola Labs - Center for Human Interaction Research (CHIR)

    Motorola Labs - Center for Human Interaction Research (CHIR) 
    located in Schaumburg Illinois, USA, 
    is offering summer intern positions in 2008 (12 weeks each). 
     
    CHIR's mission
     
    Our research lab develops technologies that provide effortless access to rich communication, 
    media and information services, based on natural, intelligent interaction. Our research 
    aims at systems that adapt automatically and proactively to changing environments, device 
    capabilities and continually evolving knowledge about the user.
     
    Intern profiles
     
    1) Acoustic environment/event detection and classification. 
    The successful candidate will be a PhD student near the end of his/her PhD study who is skilled 
    in signal processing and/or pattern recognition and knows Linux and C/C++ programming. 
    Candidates with knowledge of acoustic environment/event classification are preferred. 
     
    2) Speaker adaptation for applications on speech recognition and spoken document retrieval
    The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough 
    understanding of and hands-on experience in automatic speech recognition research. Proficiency 
    in the Linux/Unix working environment and C/C++ programming. Strong GPA. A strong background in speaker 
    adaptation is highly preferred.
     
    3) Development of voice search-based web applications on a smartphone 
    We are looking for an intern candidate to help create an "experience" prototype based on our 
    voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and 
    rich interaction with web resources. This intern project is oriented more towards software engineering 
    than research. We target an intern with a master's degree and strong software engineering background. 
    Mastery of C++ and experience with web programming (AJAX and web services) is required. 
    Development experience on Windows CE/Mobile desired.
     
    4) Integrated Voice Search Technology For Mobile Devices
    The candidate should be proficient in information retrieval, pattern recognition and speech recognition, 
    and should program in C++ and scripting languages such as Python or Perl in a Linux environment. 
    He/she should also have knowledge of information retrieval or search engines.
     
    We offer competitive compensation, a fun working environment and Chicago-style pizza.
     
    If you are interested, please send your resume to:
     
    Dusan Macho, CHIR-Motorola Labs
    Email: dusan [dot] macho [at] motorola [dot] com
    Tel: +1-847-576-6762

  • Studentships in Human Language Technology

             *** Studentships available for 2008/9 *** 

                       One-Year Masters Course in  HUMAN LANGUAGE TECHNOLOGY 
                                             Department of Computer Science                
                                               The University of Sheffield - UK  
    The Sheffield MSc in Human Language Technology (HLT) has been carefully tailored 
    to meet the demand for graduates with the highly-specialised multi-disciplinary skills 
    that are required in HLT, both as practitioners in the development of HLT applications 
    and as researchers into the advanced capabilities required for next-generation HLT 
    systems.  The course provides a balanced programme of instruction across a range 
    of relevant disciplines including speech technology, natural language processing and 
    dialogue systems.  The programme is taught in a research-led environment.  
    This means that you will study the most advanced theories and techniques in the field, 
    and have the opportunity to use state-of-the-art software tools.  You will also have 
    opportunities to engage in research-level activity through in-depth exploration of 
    chosen topics and through your dissertation.  As well as readying yourself for 
    employment in the HLT industry, this course is also an excellent introduction to the 
    substantial research opportunities for doctoral-level study in HLT.  
    ***  A number of studentships are available, on a competitive basis, to suitably 
    qualified applicants.  These awards pay a stipend in addition to the course fees.  
    ***  For further details of the course, 
    see ... http://www.shef.ac.uk/dcs/postgrad/taught/hlt  
    For information on how to apply 
    see ... http://www.shef.ac.uk/dcs/postgrad/taught/apply.html 


Books, Databases, Softwares

  • Reviewing a book?

    The author of the book Advances in Digital Speech Transmission told me that you might be interested in reviewing her book. If so, I would be pleased to send you a free review copy. Please just reply to this email and let me know the address to which I can send the book.

    Martin, Rainer / Heute, Ulrich / Antweiler, Christiane
    Advances in Digital Speech Transmission

    1st Edition - January 2008
    99.90 Euro
    2008. 572 Pages, Hardcover
    - Practical Approach Book -
    ISBN-10: 0-470-51739-5
    ISBN-13: 978-0-470-51739-0 - John Wiley & Sons

    Best regards

    Tina Heuberger
    ----------------------------------------------------
    Public Relations Associate
    Physical Sciences and Life Sciences Books
    Wiley-Blackwell
    Wiley-VCH Verlag GmbH & Co. KGaA
    Boschstr. 12
    69469 Weinheim
    Germany
    phone +49/6201/606-412
    fax +49/6201/606-223
    mailto:theuberger@wiley-vch.de

  • Books


    La production de la parole
    Author: Alain Marchal, Universite d'Aix en Provence, France
    Publisher: Hermes Lavoisier
    Year: 2007

    Speech enhancement-Theory and Practice
    Author: Philipos C. Loizou, University of Texas, Dallas, USA
    Publisher: CRC Press
    Year: 2007

    Speech and Language Engineering
    Editor: Martin Rajman
    Publisher: EPFL Press, distributed by CRC Press
    Year: 2007

    Human Communication Disorders/ Speech therapy
    This interesting series is listed on the Wiley website

    Incursões em torno do ritmo da fala
    Author: Plinio A. Barbosa
    Publisher: Pontes Editores (city: Campinas)
    Year: 2006 (released 11/24/2006)
    (In Portuguese, abstract attached.) Website

    Speech Quality of VoIP: Assessment and Prediction
    Author: Alexander Raake
    Publisher: John Wiley & Sons, UK-Chichester, September 2006
    Website

    Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
    Author: Pierre-Yves Oudeyer
    Publisher: Oxford University Press
    Website

    Speech Recognition Over Digital Channels
    Authors: Antonio M. Peinado and Jose C. Segura
    Publisher: Wiley, July 2006
    Website

    Multilingual Speech Processing
    Editors: Tanja Schultz and Katrin Kirchhoff
    Elsevier Academic Press, April 2006
    Website

    Reconnaissance automatique de la parole: Du signal a l'interpretation
    Authors: Jean-Paul Haton
    Christophe Cerisara
    Dominique Fohr
    Yves Laprie
    Kamel Smaili
    392 Pages
    Publisher: Dunod

  • News from LDC

    -  50,000th LDC Corpus Distributed!  -

    -  LDC at the ALA Midwinter Meeting  -

    -  Survey Responses Are In!  -

    LDC2008T03
    -  ACE 2005 English SpatialML Annotations  -

    LDC2008S01
    -  CSLU: Portland Cellular Telephone Speech Version 1.3  -

    LDC2008T01
    -  Hungarian-English Parallel Text, Version 1.0  -



    50,000th LDC Corpus Distributed!

    Last year marked the LDC's 15th Anniversary Year and it proved to be an exciting one for the LDC.  We commemorated this anniversary with a Fidelity Celebration which rewarded our loyal members who continually support the consortium through membership.  Additionally, we provided our list serve readers with a glimpse into the research activities at the LDC through each of our monthly Spotlights. 

    At the very end of our anniversary year, the LDC observed another significant milestone:  the distribution of our 50,000th publication!  This corpus was licensed by Helsinki University of Technology, Adaptive Informatics Research Centre (AIRC).   AIRC's research includes basic algorithmic analysis, multimodal interfaces (speech, vision and language), bioinformatics, neuroinformatics and computational cognitive systems.  In appreciation, the LDC is offering Helsinki University of Technology a US$2000 benefit to be used towards membership or data licensing fees.

    We would like to thank both members and nonmembers for helping the LDC reach this landmark distribution. Your persistent demand for LDC data supports our mission to develop and share resources for research in human language technologies.



    LDC at the ALA Midwinter Meeting


    The LDC was delighted to attend the American Library Association's (ALA) Midwinter Conference here in Philadelphia from 11-14 January 2008 and to meet more members of our community. We demonstrated the search capabilities of the LDC Catalog and LDC Online and provided attendees with insight into our diverse publications and membership options. We would like to thank everyone who came by the LDC display at booth #239 and to invite all ALA attendees to contact us with any follow-up questions. Please read more about the Midwinter Conference on the ALA's homepage.

    Survey Responses Are In!

    The LDC is pleased to announce the results of LDC's 2007 Member Survey. We sent the survey to all those who received LDC data in 2006 and 2007 (members and nonmembers), a total of nearly 1700 recipients. The survey was customized to respondents' affiliation with the LDC (Standard, Subscription or Former Members and Non-Members) and focused on a few key issues:

    • Satisfaction levels with LDC's data, homepage and Catalog
    • Satisfaction levels with LDC Memberships (where applicable)
    • Suggestions for future data releases and publication options

    Those who responded to the survey are generally satisfied with their membership benefits and the LDC catalog and homepage. Nevertheless, some of the individuals surveyed indicated areas for improvement and we will be evaluating each response and replying to your queries within the next few weeks.

    To survey respondents: Thank you for your participation! You will be receiving a more detailed evaluation of the survey shortly along with the announcement of the lucky winner of the $500 benefit.

    New Publications

    (1)  The ACE (Automatic Content Extraction) program focuses on developing automatic content extraction technology to support automatic processing of human language in text form. The kind of information recognized and extracted from text includes entities, values, temporal expressions, relations and events. SpatialML is a mark-up language for representing spatial expressions in natural language documents. SpatialML's focus is primarily on geography and culturally-relevant landmarks, rather than biology, cosmology, geology, or other regions of the spatial language domain. The goal is to allow for potentially better integration of text collections with resources such as databases that provide spatial information about a domain, including gazetteers, physical feature databases and mapping services. In ACE 2005 English SpatialML Annotations, the authors applied SpatialML tags to the English training data (originally annotated for entities, relations and events) in ACE 2005 Multilingual Training Corpus, LDC2006T06.

    The main SpatialML tag is the PLACE tag. The central goal of SpatialML is to map PLACE information in text to data from gazetteers and other databases to the extent possible. Therefore, semantic attributes such as country abbreviations, country subdivision and dependent area abbreviations (e.g., US states), and geo-coordinates are used to help establish such a mapping. LINK and PATH tags express relations between places, such as inclusion relations and trajectories of various kinds. To the extent possible, SpatialML leverages ISO and other standards towards the goal of making the scheme compatible with existing and future corpora. The SpatialML guidelines are compatible with existing guidelines for spatial annotation and existing corpora within the ACE research program. ACE 2005 English SpatialML Annotations is distributed via web download.

    2008 Subscription Members will automatically receive two copies of this corpus on disc. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1000.

    *

    (2)  CSLU: Portland Cellular Telephone Speech Version 1.3 was created by the Center for Spoken Language Understanding (CSLU) at OGI School of Science and Engineering, Oregon Health and Science University, Beaverton, Oregon. It consists of cellular telephone speech and corresponding transcripts, specifically, 7,571 utterances from 515 speakers who made calls in the Portland, Oregon area using cellular telephones.

    Speakers called the CSLU data collection system on cellular telephones, and they were asked to repeat certain phrases and to respond to other prompts. Two prompt protocols were used: an In Vehicle Protocol for speakers calling from inside a vehicle and a Not in Vehicle Protocol for those calling from outside a vehicle. The protocols shared several questions, but each protocol contained distinct queries designed to probe the conditions of the caller's in vehicle/not in vehicle surroundings. Not every caller provided a response to each prompt.

    The text transcriptions were produced using the non time-aligned word-level conventions described in The CSLU Labeling Guide, which is included in the documentation for this release. The corpus contains both orthographic and phonetic transcriptions of corresponding speech files. CSLU: Portland Cellular Telephone Speech Version 1.3 is distributed on one CD-ROM.

    2008 Subscription Members will automatically receive two copies of this corpus, provided that they have submitted a signed copy of the LDC User Agreement for CSLU Corpora. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$150.

     

    *

    (3)  Hungarian-English Parallel Text, Version 1.0 (also known as the "Hunglish Corpus") is a sentence-aligned Hungarian-English parallel corpus consisting of approximately two million sentence pairs. The corpus contains additional language resources for the Hungarian text, including a monolingual corpus, morphological toolset and aligner.  Hungarian-English Parallel Text, Version 1.0 is a joint work of the Media Research and Education Center at the Budapest University of Technology and Economics (BUTE) and the Corpus Linguistics Department at the Hungarian Academy of Sciences Institute of Linguistics.

    Sentence pair (.bi) files consist of tab-separated, matching sentence pairs. The .bi files do not contain segments where deletion or contraction occurred. They are also filtered based on quality, so the full reconstruction of the raw texts is impossible. Some .bi files were shuffled (sorted alphabetically).

    Alignment "ladder" (.lad) files preserve the whole of both input texts with ordering, even those segments that were not successfully aligned. In .lad files, every line is tab-separated into two columns. The first is a segment of the Hungarian text. The second is a (supposedly corresponding) segment of the English text. Such segments of the source or target text will generally consist of exactly one sentence on both sides, but can also consist of zero, or more than one, sentence. Hungarian-English Parallel Text, Version 1.0 is distributed on one CD-ROM.
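
    The two-column, tab-separated layout described above can be read with a few lines of code. The following is only a minimal sketch in Python, not part of the corpus tooling: the function name and the sample sentences are hypothetical, and it assumes that any fields beyond the first two can be ignored. The file encoding is not stated here and should be checked in the corpus documentation before opening real files.

```python
from typing import Iterable, Iterator, Tuple


def parse_aligned_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (hungarian, english) segment pairs from tab-separated lines.

    Hypothetical helper: assumes column 1 is the Hungarian segment and
    column 2 the English segment, as described in the text above.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip completely empty lines
        fields = line.split("\t")
        hungarian = fields[0]
        # A segment may be empty on one side (zero sentences aligned).
        english = fields[1] if len(fields) > 1 else ""
        yield hungarian, english


# Invented sample lines, for illustration only.
sample = [
    "Jó reggelt.\tGood morning.\n",
    "\tAn English segment with no Hungarian counterpart.\n",
]
pairs = list(parse_aligned_lines(sample))
for hungarian, english in pairs:
    print(repr(hungarian), "<->", repr(english))
```

    For a real file, the same function can be applied to an open file object, since iterating over a text file yields one line at a time.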

    2008 Subscription Members will automatically receive two copies of this corpus, provided that they have submitted a signed copy of the User License Agreement for Hungarian-English Parallel Text, Version 1. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1000.


     


Job openings

  • We invite all laboratories and industrial companies which have job offers to send them to the ISCApad editor: they will appear in the newsletter and on our website for free. (also have a look at http://www.isca-speech.org/jobs.html as well as http://www.elsnet.org/ Jobs)

  • Speech Engineer/Senior Speech Engineer at Microsoft, Mountain View, CA, USA

    Job Type: Full-Time
    Send resume to Bruce Buntschuh
    Responsibilities:
    Tellme, now a subsidiary of Microsoft, is a company that is focused on delivering the highest quality voice recognition based applications while providing the highest possible automation to its clients. Central to this focus is the speech recognition accuracy and performance that is used by the applications. The candidate will be responsible for the development, performance analysis, and optimization of grammars, as well as overall speech recognition accuracy, in a wide variety of real world applications in all major market segments. This is a unique opportunity to apply and extend state of the art speech recognition technologies to emerging spaces such as information search on mobile devices.
    Requirements:
    · Strong background in engineering, linguistics, mathematics, machine learning, and/or computer science.
    · In depth knowledge and expertise in the field of speech recognition.
    · Strong analytical skills with a determination to fully understand and solve complex problems.
    · Excellent spoken and written communication skills.
    · Fluency in English (Spanish a plus).
    · Programming capability with scripting tools such as Perl.
    Education:
    MS, PhD, or equivalent technical experience in an area such as engineering, linguistics, mathematics, or computer science.

  • Speech Technology and Software Development Engineer at Microsoft, Redmond, WA, USA

      

    Speech Technology and Software Development Engineer

    Speech Technologies and Modeling

    Speech Component Group

    Microsoft Corporation

    Redmond WA, USA

    Please contact: Yifan.Gong@microsoft.com

    Microsoft's Speech Component Group has been working on automatic speech recognition (SR) in real environments. We develop SR products for multiple languages for mobile devices, desktop computers, and communication servers. The group now has an open position for speech scientists with a software development focus to work on our acoustic and language modeling technologies. The position offers great opportunities for innovation and technology and product development.

    Responsibilities:

    ·     Design and implement speech/language modeling and recognition algorithms to improve recognition accuracy.
    ·     Create, optimize and deliver quality speech recognition models and other components tailored to our customers' needs.
    ·     Identify, investigate and solve challenging problems in the areas of recognition accuracy from speech recognition system deployments.
    ·     Improve speech recognition language expansion engineering process that ensures product quality and scalability.

    Required competencies and skills:

    ·     Passion for speech technology and quality software, with demonstrated ability in the design and implementation of speech recognition algorithms.
    ·     Strong desire for achieving excellent results, strong problem solving skills, ability to multi-task, handle ambiguities, and identify issues in complex SR systems.
    ·     Good software development skills, including strong aptitude for software design and coding. 3+ years of experience in C/C++ and programming with scripting languages are highly desirable.
    ·     MS or PhD degree in Computer Science, Electrical Engineering, Mathematics, or related disciplines, with strong background in speech recognition technology, statistical modeling, or signal processing.
    ·     Track record of developing SR algorithms, or experience in linguistics/phonetics, is a plus.

     

  • PhD Research Studentship in Spoken Dialogue Systems- Cambridge UK

    Applications are invited for an EPSRC sponsored studentship in Spoken Dialogue Systems leading to the PhD degree. The student will join a team led by Professor Steve Young working on statistical approaches to building Spoken Dialogue Systems. The overall goal of the team is to develop complete working end-to-end systems which can be trained from real data and which can be continually adapted on-line. The PhD work will focus specifically on the use of Partially Observable Markov Decision Processes for dialogue modelling and techniques for learning and adaptation within that framework. The work will involve statistical modelling, algorithm design and user evaluation. The successful candidate will have a good first degree in a relevant area. Good programming skills in C/C++ are essential and familiarity with Matlab would be useful.
    The studentship will be for 3 years starting in October 2007 or January 2008. The studentship covers University and College fees at the Home/EU rate and a maintenance allowance of 13000 pounds per annum. Potential applicants should email Steve Young with a brief CV and statement of interest in the proposed work area.

  • AT&T - Labs Research: Research Staff Positions - Florham Park, NJ

     

    AT&T - Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit and strong communication and software skills. Areas of particular interest are:

    • Large-vocabulary automatic speech recognition
    • Acoustic and language modeling
    • Robust speech recognition
    • Signal processing
    • Speaker recognition
    • Speech data mining
    • Natural language understanding and dialog
    • Text and web mining
    • Voice and multimodal search

    AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).

  • Research Position in Speech Processing at UGent, Belgium

    Background

    Since March 2005, the universities of Leuven, Gent, Antwerp and Brussels have joined forces in a large research project called SPACE (SPeech Algorithms for Clinical and Educational applications). The project aims at contributing to the broader application of speech technology in educational and therapeutic software tools. More specifically, it pursues the automatic detection and classification of reading errors in the context of an automatic reading tutor, and the objective assessment of disordered speech (e.g. speech of the deaf, dysarthric speech, ...) in the context of computer assisted speech therapy assessment. What is specific to the target applications is that the speech is either grammatically and lexically incorrect or atypically pronounced. Therefore, standard technology cannot be applied as such in these applications.

    Job description

    The person we are looking for will be in charge of the data-driven development of word mispronunciation models that can predict expected reading errors in the context of a reading tutor. These models must be integrated into the linguistic model of the prompted utterance so that the speech recognizer becomes more specific in its detection and classification of presumed errors than a recognizer that uses a more traditional linguistic model with context-independent garbage and deletion arcs.  A further challenge is to make the mispronunciation model adapt to the progress made by the user.

    Profile

    We are looking for a person from the EU with a creative mind, and with an interest in speech & language processing and machine learning. The work will require an ability to program algorithms in C and Python. Having experience with Python is not a prerequisite (someone with some software experience is expected to learn this in a short time span). Demonstrated experience with speech & language processing and/or machine learning techniques will give you an advantage over other candidates.

    The job is open to a pre-doctoral as well as a post-doctoral researcher who can start in November or December. The job runs until February 28, 2009, but a pre-doctoral candidate aiming for a doctoral degree will get opportunities to do follow-up research in related projects. 

    Interested persons should send their CV to Jean-Pierre Martens (martens@elis.ugent.be). There is no real deadline, but as soon as a suitable person is found, he/she will get the job.

  • Summer Intern positions at Motorola, Schaumburg, Illinois, USA

    Motorola Labs - Center for Human Interaction Research (CHIR) located in Schaumburg Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).

    CHIR's mission:

    Our research lab develops technologies that provide effortless access to rich communication, media and information services, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and continually evolving knowledge about the user.

    Intern profiles:

    1) Acoustic environment/event detection and classification.

    The successful candidate will be a PhD student near the end of his/her PhD study who is skilled in signal processing and/or pattern recognition and knows Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.

    2) Speaker adaptation for applications on speech recognition and spoken document retrieval.

    The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough understanding of and hands-on experience in automatic speech recognition research. Proficiency in the Linux/Unix working environment and C/C++ programming. Strong GPA. A strong background in speaker adaptation is highly preferred.

    3) Development of voice search-based web applications on a smartphone

    We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) is required. Development experience on Windows CE/Mobile desired.

    4) Integrated Voice Search Technology For Mobile Devices.

    The candidate should be proficient in information retrieval, pattern recognition and speech recognition, and should be able to program in C++ and in scripting languages such as Python or Perl in a Linux environment. He/she should also have knowledge of information retrieval or search engines.

    We offer competitive compensation, a fun work environment and Chicago-style pizza.

    If you are interested, please send your resume to:

    Dusan Macho, CHIR-Motorola Labs

    Email:  dusan.macho@motorola.com

    Tel: +1-847-576-6762

  • Nuance: Software engineer speech dialog tools

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

    SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

    As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

    OVERVIEW:

    - You will work in Nuance's Embedded ASR R&D team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.

    - You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

    - You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.

    - You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.

    - In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.

    QUALIFICATIONS:

    - You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.

    - You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

    - You have experience with Python or other scripting languages.

    - GUI programming experience is a strong asset.

    The following skills are a plus:

    - Understanding of communication protocols

    - Understanding of databases

    - Understanding of computational agents and related frameworks (such as OAA).

    - A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics and machine learning, especially as related to natural language processing, dialogue, and representation of information

    - You can work both as a team player and as a goal-oriented independent software engineer.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a plus.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

    Deanna Roe                  Deanna.roe@nuance.com

    Please make sure to document to us your excellent software engineering skills.

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • Nuance: Speech scientist London UK

     

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

    To strengthen our International Professional Services team, based in London, we are currently looking for a

                                Speech Scientist, London, UK

    Nuance Professional Services (PS) has designed, developed, and optimized thousands of speech systems across dozens of industries, including directory search, call center automation, applications in telecom, finance, airline, healthcare, and other verticals; applications for video games, mobile dictation, enhanced search services, SMS, and in-car navigation.  Nuance PS applications have automated approximately 7 billion phone conversations for some of the world's most respected companies, including British Airways, Vodafone, Amtrak, Bank of America, BellCanada, Citigroup, General Electric, NTT and Verizon.

    The PS organization consists of energetic, motivated, and friendly individuals.  The Speech Scientists in PS are among the best and brightest, with PhDs from universities such as Cambridge (UK), MIT, McGill, Harvard, Penn, CMU, and Georgia Tech, and having worked at research labs such as Bell Labs, Motorola Labs, and ATR (Japan), culminating in over 300 years of Speech Science experience and covering well over 20 languages.

    Come and join Nuance PS and work on the latest technology from one of the prominent speech recognition technology providers, and make a difference in the way the world communicates.

    Job Overview

    As a Speech Scientist in the Professional Services group, you will work on automated speech recognition applications, covering a broad range of activities in all project phases, including the design, development, and optimization of the system.  You will:

    • Work across application development teams to ensure best possible recognition performance in deployed systems
    • Identify recognition challenges and assess accuracy feasibility during the design phase
    • Design, develop, and test VoiceXML grammars and create JSPs, Java, and ECMAScript grammars for dynamic contexts
    • Optimize accuracy of applications by analyzing performance and tuning statistical language models, pronunciations, and acoustic models, including identifying areas for improvement by running the recognizer offline
    • Contribute to the generation and presentation of client-facing reports
    • Act as technical lead on more intensive client projects
    • Develop methodologies, scripts, procedures that improve efficiency and quality
    • Develop tools and enhance algorithms that facilitate deployment and tuning of recognition components
    • Act as subject matter domain expert for specific knowledge domains
    • Provide input into the design of future product releases

    Required Skills

    • MS or PhD in Computer Science, Engineering, Computational Linguistics, Physics, Mathematics, or related field (or equivalent)
    • Strong analytical and problem solving skills and ability to troubleshoot issues
    • Good judgment and quick-thinking
    • Strong programming skills, preferably Perl or Python
    • Excellent written and verbal communications skills
    • Ability to scope work taking technical, business and time-frame constraints into consideration
    • Works well in a team and in a fast-paced environment

    Beneficial Skills

    • Strong programming skills in either Perl, Python, Java, C/C++, or Matlab
    • Speech recognition knowledge
    • Strong pattern recognition, linguistics, signal processing, or acoustics knowledge
    • Statistical data analysis
    • Experience with XML, VoiceXML, and Wiki
    • Ability to mentor or supervise others
    • Additional language skills, e.g. French, Dutch, German, Spanish

  • Nuance: Research engineer speech engine

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

     RESEARCH ENGINEER SPEECH ENGINE

    As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

     OVERVIEW:

    - You will work in Nuance's Embedded ASR R&D team, developing, improving and maintaining core ASR engine algorithms for our customers in the Automotive and Mobile sector.

    - You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

    - You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on developing, improving and maintaining core ASR engine algorithms for cutting edge speech and natural language understanding technologies for automotive and mobile devices.

    - You will work on the design and development of more efficient, flexible ASR search algorithms with high focus on low memory and processor requirements.

    QUALIFICATIONS:

    - You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field. PhD is a plus.

    - A background in (computational) linguistics, speech processing, ASR search, confidence values, grammars, statistics and machine learning, especially as related to natural language processing.

    - You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

    The following skills are a plus:

    - You have experience with Python or other scripting languages.

    - Broad knowledge about architectures of embedded platforms and processors.

    - Understanding of databases

    - You can work both as a team player and as a goal-oriented independent software engineer.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a plus.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

    Deanna Roe                  Deanna.roe@nuance.com

    Please make sure to document to us your excellent software engineering skills.

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • Nuance: Research engineer speech dialog systems

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

        RESEARCH ENGINEER SPEECH DIALOGUE SYSTEMS

    As part of our team, you will be creating speech technologies for embedded applications varying from simple command and control tasks up to natural language speech dialogues on mobile and automotive platforms.

    OVERVIEW:

    - You will work in Nuance's Embedded ASR research and production team, creating technology, tools and runtime software to enable our customers to develop embedded speech applications. In our team of speech and language experts, you will work on natural language dialogue systems that define the state of the art.

    - You will work at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the picturesque city of Ghent, in the Flanders region of Belgium. Ghent has one of the most spectacular historic town centers of Europe and is known for its unique vibrant yet cozy charm, and is home to a large international university.

    - You will work in an international company and cooperate with people at various locations, including Europe, America, and Asia.  You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on the development of cutting edge natural language dialogue and speech recognition technologies for automotive embedded systems and mobile devices.

    - You will design, implement, evaluate, optimize, and test new algorithms and tools for our speech recognition systems, both for research prototypes and deployed products, including all aspects of dialogue systems design, such as architecture, natural language understanding, dialogue modeling, statistical framework, and so forth.

    - You will help the engine process multi-lingual natural and spontaneous speech in various noise conditions, given the challenging memory and processing power constraints of the embedded world.

    QUALIFICATIONS:

    - You have a university degree in computer science, (computational) linguistics, engineering, mathematics, physics, or a related field. A graduate degree is an asset.

    - You have strong software and programming skills, especially in C/C++, ideally for embedded applications. Knowledge of Python or other scripting languages is a plus.

    - You have experience in one or more of the following fields:

         dialogue systems

         applied (computational) linguistics

         natural language understanding

         language generation

         search engines

         speech recognition

         grammars and parsing techniques.

         statistics and machine learning techniques

         XML processing

    - You are a team player, willing to take initiative and assume responsibility for your tasks, and are goal-oriented.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a strong asset.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

     

    Deanna Roe                  Deanna.roe@nuance.com

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • Research Position in Speech Processing at Nagoya Institute of Technology, Japan

     

    Research Position in Speech Processing at Nagoya Institute of Technology, Japan

    Nagoya Institute of Technology is seeking a researcher for a post-doctoral position in a new European Commission-funded project, EMIME ("Efficient multilingual interaction in mobile environment"), involving Nagoya Institute of Technology and five other European partners, starting in March 2008 (see the project summary below). The earliest starting date of the position is March 2007. The initial duration of the contract will be one year, with a possibility of prolongation (on a year-by-year basis, for a maximum of three years). The position provides opportunities to collaborate with other researchers in a variety of national and international projects. The competitive salary is calculated according to qualifications based on NIT scales.

    The candidate should have a strong background in speech signal processing and some experience with speech synthesis and recognition. Desired skills include familiarity with the latest spectrum of technology, including HTK, HTS, and Festival, at the source code level.

    For more information, please contact Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/).

     

    About us

    Nagoya Institute of Technology (NIT), founded in 1905, is situated in the world-quality manufacturing area of Central Japan (about one hour and 40 minutes from Tokyo, and 36 minutes from Kyoto by Shinkansen). NIT is a highest-level educational institution of technology and is one of the leaders of such institutions in Japan. EMIME will be carried out at the Speech Processing Laboratory (SPL) in the Department of Computer Science and Engineering of NIT. SPL is known for its outstanding, continuous contribution to developing high-performance, high-quality open-source software: the HMM-based Speech Synthesis System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary continuous speech recognition engine "Julius" (http://julius.sourceforge.jp/), and the Speech Signal Processing Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is involved in numerous national and international collaborative projects. In order to transfer its research into commercial applications, SPL also has close partnerships with many industrial companies, including Toyota, Nissan, Panasonic, Brother Inc., Funai, Asahi-Kasei and ATR.

    Project summary of EMIME

    The EMIME project will help to overcome the language barrier by developing a mobile device that performs personalized speech-to-speech translation, such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. Personalization of systems for cross-lingual spoken communication is an important, but little explored, topic. It is essential for providing more natural interaction and making the computing device a less obtrusive element when assisting human-human interactions.

    We will build on recent developments in speech synthesis using hidden Markov models, which is the same technology used for automatic speech recognition. Using a common statistical modeling framework for automatic speech recognition and speech synthesis will enable the use of common techniques for adaptation and multilinguality.

    Significant progress will be made towards a unified approach for speech recognition and speech synthesis: this is a very powerful concept, and will open up many new areas of research. In this project, we will explore the use of speaker adaptation across languages so that, by performing automatic speech recognition, we can learn the characteristics of an individual speaker, and then use those characteristics when producing output speech in another language.

    Our objectives are to:

    1. Personalize speech processing systems by learning individual characteristics of a user's speech and reproducing them in synthesized speech.

    2. Introduce a cross-lingual capability such that personal characteristics can be reproduced in a second language not spoken by the user.

    3. Develop and better understand the mathematical and theoretical relationship between speech recognition and synthesis.

    4. Eliminate the need for human intervention in the process of cross-lingual personalization.

    5. Evaluate our research against state-of-the-art techniques and in a practical mobile application.

  • C/C++ Programmer Munich, Germany

    Digital publishing AG is one of Europe's leading producers of interactive software for foreign language training. In our e-learning courses we want to place the emphasis on speaking and spoken language understanding. In order to strengthen our Research & Development Team in Munich, Germany, we are looking for experienced C or C++ programmers with at least 3 years' experience in the design and coding of sophisticated software systems under Windows.
    We offer
    - a creative working atmosphere in an international team of software engineers, linguists and editors working on challenging research projects in speech recognition and speech dialogue systems.
    - participation in all phases of a product life cycle, as we are interested in the fast transfer of research results into products.
    - the possibility to participate in international scientific conferences.
    - a permanent job in the center of Munich.
    - excellent possibilities for development within our fast growing company.
    - flexible working times, competitive compensation and arguably the best espresso in Munich.
    We expect
    - several years of practical experience in software development in C or C++ in a commercial or academic environment.
    - experience with parallel algorithms and thread programming.
    - experience with object-oriented design of software systems.
    - good knowledge of English or German.
    Desirable is
    - experience with optimization of algorithms.
    - experience in statistical speech or language processing, preferably speech recognition, speech synthesis, speech dialogue systems or chatbots.
    - experience with Delphi or Turbo Pascal.
    Interested? We look forward to your application (preferably by e-mail):
    digital publishing AG
    Freddy Ertl  f.ertl@digitalpublishing.de
    Tumblinger Straße 32
    D-80337 München Germany

  • Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA

     

    Speech and Natural Language Processing Engineer


    M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


    We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

    Responsibilities:

    • You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
    • You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
    • You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.

     

    Requirements:

    • Solid background in speech recognition, natural language processing, machine learning and information extraction.
    • 2+ years of experience participating in software development projects
    • Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
    • Excellent analytical and problem-solving skills
    • Integrate and communicate well in small R&D teams
    • Master's degree in CS or related engineering fields
    • Experience in a healthcare-related field a plus

     

    In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

     

    M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com. 

     

  • Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA

     

    Senior Research Scientist -- Speech and Natural Language Processing


    M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


    We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

    Responsibilities:

    • Plan and perform research and development tasks to continuously improve a state-of-the-art speech understanding system
    • Take a leading role in identifying solutions to challenging technical problems
    • Contribute original ideas and turn them into product-grade software implementations
    • Collaborate with other members of the R&D team to identify, analyze and resolve technical issues

     

    Requirements:

    • Solid research & development background with 3+ years of experience in speech recognition research, covering at least two of the following topics: speech processing, acoustic modeling, language modeling, decoding, LVCSR, natural language processing/understanding, speaker verification/identification, audio mining
    • Working knowledge of Machine Learning, Information Extraction and Natural Language Processing algorithms
    • 3+ years of experience participating in large-scale software development projects using C++ and Java.
    • Excellent analytical, problem-solving and communication skills
    • PhD with a focus on speech recognition, or Master's degree with 3+ years of industry experience working on automatic speech recognition
    • Experience and/or education in medical informatics a plus
    • Working experience in a healthcare related field a plus

     


    In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

     

    M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com. 

     

  • Postdoc position at LORIA, Nancy, France

    Building an articulatory model from ultrasound, EMA and MRI data

     

    Postdoctoral position

     

     

    Research project

    An articulatory model comprises both the visible and the internal mobile articulators involved in speech articulation (the lower jaw, tongue, lips and velum) as well as the fixed walls (the palate and the rear wall of the pharynx). An articulatory model is dynamic since the articulators deform during speech production. Such a model is of potential interest in the field of language learning, where it can provide visual feedback on the learner's articulation, and in many other applications.

    Building an articulatory model is difficult because the different articulators have to be detected from specific image modalities: the lips are acquired through video, and the tongue shape is acquired through ultrasound imaging at a high frame rate, although these 2D images are very noisy. Finally, 3D images of all articulators can be obtained with MRI, but only for sustained sounds (such as vowels) due to the long acquisition time of MRI images.

    The subject of this post-doc is to construct a dynamic 3D model of the entire vocal tract by merging the 3D information available in the MRI acquisitions and temporal 2D information provided by the contours of the tongue visible on the ultrasound images or X-ray images.

    We are working on the construction of an articulatory model within the European project ASPI (http://aspi.loria.fr/ ).

    We have already built an acquisition system which allows us to obtain synchronized data from ultrasound, MRI, video and EM modalities.

    Only a few complete articulatory models are currently available in the world and a real challenge in the field is to design set-ups and easy-to-use methods for automatically building the model of any speaker from 3D and 2D images. Indeed, the existence of more articulatory models would open new directions of research about speaker variability and speech production.

     

    Objectives

    The aim of the subject is to build a deformable model of the vocal tract from static 3D MRI images and dynamic 2D sequences. Previous work has been conducted on the modelling of the vocal tract, and especially of the tongue (M. Stone[1], O. Engwall[2]). Unfortunately, substantial human interaction is required to extract tongue contours in the images. In addition, these works often consider only one image modality, which reduces the reliability of the resulting model.

    The aim of this work is to provide automatic methods for segmenting features in the images as well as methods for building a parametric model of the 3D vocal tract with these specific aims:

    • The segmentation process is to be guided by prior knowledge of the vocal tract. In particular, shape, topological and regularity constraints must be considered.
    • A parametric model of the vocal tract has to be defined (classical models are linear and built from a principal component analysis). Special emphasis must be put on the problem of matching the various features between the images.
    • Besides classical geometric constraints, both the building and the assessment of the model will be guided by acoustic distances in order to check the agreement between the sound synthesized from the model and the sound produced by the human speaker.

     

    Skill and profile

    The recruited person must have a solid background in computer vision and in applied mathematics. Information and demonstrations on the research topics addressed by the Magrit team are available at http://magrit.loria.fr/

     

    References

    [1] M. Stone: Modeling tongue surface contours from Cine-MRI images. Journal of Speech, Language, and Hearing Research, 2001.

    [2] P. Badin, G. Bailly, L. Reveret: Three-dimensional linear articulatory modeling of tongue, lips and face based on MRI and video images. Journal of Phonetics, 2002, vol. 30, pp. 533-553.

     

    Contact

    Interested candidates are invited to contact Marie-Odile Berger, berger@loria.fr, +33 3 54 95 85 01

     

    Important information

    This position is advertised in the framework of the national INRIA campaign for recruiting post-docs. It is a one year position, renewable, beginning fall 2008. The salary is 2,320€ gross per month. 

     

    Selection of candidates will be a two-step process. An initial selection of a candidate will be carried out internally by the Magrit group. The selected candidate's application will then be further processed for approval and funding by an INRIA committee.

     

    Eligibility: the doctoral thesis must be less than one year old (May 2007) or be defended before the end of 2008. If the defence has not yet taken place, candidates must specify the tentative date and jury for the defence.

     

    Important - Useful links

    Presentation of INRIA postdoctoral positions

    To apply (be patient, loading this link takes time...)

     


Journals

  • Papers accepted for FUTURE PUBLICATION in Speech Communication

    Full text is available on http://www.sciencedirect.com/ for Speech Communication subscribers and subscribing institutions. The titles and abstracts of all volumes are freely accessible to everyone; those of forthcoming papers can be reached by clicking on Articles in press and then Selected papers.

     

  • Special Issue on Non-Linear and Non-Conventional Speech Processing-Speech Communication

    Speech Communication

    Call for Papers: Special Issue on Non-Linear and Non-Conventional Speech Processing

    Editors: Mohamed CHETOUANI, UPMC

    Marcos FAUNDEZ-ZANUY, EUPMt (UPC)

    Bruno GAS, UPMC

    Jean Luc ZARADER, UPMC

    Amir HUSSAIN, Stirling

    Kuldip PALIWAL, Griffith University

    The field of speech processing has shown a very fast development in the past twenty years, thanks to both technological progress and to the convergence of research into a few mainstream approaches. However, some specificities of the speech signal are still not well addressed by the current models. New models and processing techniques need to be investigated in order to foster and/or accompany future progress, even if they do not match immediately the level of performance and understanding of the current state-of-the-art approaches.

    An ISCA-ITRW Workshop on "Non-Linear Speech Processing" will be held in May 2007, the purpose of which will be to present and discuss novel ideas, works and results related to alternative techniques for speech processing departing from the mainstream approaches:  http://www.congres.upmc.fr/nolisp2007

    We are now soliciting journal papers, not only from workshop participants but also from other researchers, for a special issue of Speech Communication on "Non-Linear and Non-Conventional Speech Processing".

    Submissions are invited on the following broad topic areas:

    I. Non-Linear Approximation and Estimation

    II. Non-Linear Oscillators and Predictors

    III. Higher-Order Statistics

    IV. Independent Component Analysis

    V. Nearest Neighbours

    VI. Neural Networks

    VII. Decision Trees

    VIII. Non-Parametric Models

    IX. Dynamics of Non-Linear Systems

    X. Fractal Methods

    XI. Chaos Modelling

    XII. Non-Linear Differential Equations

    All fields of speech processing are targeted by the special issue, namely:

    1. Speech Production 

    2. Speech Analysis and Modelling

    3. Speech Coding 

    4. Speech Synthesis 

    5. Speech Recognition 

    6. Speaker Identification / Verification 

    7. Speech Enhancement / Separation 

    8. Speech Perception

  • Journal on Multimodal User Interfaces

    Journal on Multimodal User Interfaces

    The development of Multimodal User Interfaces relies on systemic research involving signal processing, pattern analysis, machine intelligence and human-computer interaction. This journal is a response to the need for common forums that bring these research communities together. Topics of interest include, but are not restricted to:

    • Fusion & Fission,
    • Plasticity of Multimodal interfaces,
    • Medical applications,
    • Edutainment applications,
    • New modalities and modalities conversion,
    • Usability,
    • Multimodality for biometry and security,
    • Multimodal conversational systems.

    The journal is open to three types of contributions:

    • Articles: containing original contributions accessible to the whole research community of Multimodal Interfaces. Contributions containing verifiable results and/or open-source demonstrators are strongly encouraged.
    • Tutorials: disseminating established results across disciplines related to multimodal user interfaces.
    • Letters: presenting practical achievements / prototypes and new technology components.

    JMUI is published by Springer-Verlag from 2008.

     

    The submission procedure and the publication schedule are described at:

    www.jmui.org

    The page of the journal at Springer is:

    http://www.springer.com/east/home?SGWID=5-102-70-173760003-0&changeHeader=true

    More information:

    Imre Váradi (varadi@tele.ucl.ac.be)

  • CfP -- CURRENT RESEARCH IN PHONOLOGY AND PHONETICS: INTERFACES WITH NATURAL LANGUAGE PROCESSING

    CALL FOR PAPERS -- CURRENT RESEARCH IN PHONOLOGY AND PHONETICS: INTERFACES WITH NATURAL LANGUAGE PROCESSING

    A SPECIAL ISSUE OF THE JOURNAL TAL
    (Traitement Automatique des Langues)

    Guest Editors: Bernard Laks and Noël Nguyen

    EXTENDED DEADLINE: 11 February 2008

    There are long-established connections between research on the sound shape of language and natural language processing (NLP), for which one of the main driving forces has been the design of automatic speech synthesis and recognition systems. Over the last few years, these connections have been made yet stronger, under the influence of several factors. A first line of convergence relates to the shared collection and exploitation of the considerable resources that are now available to us in the domain of spoken language. These resources have come to play a major role both for phonologists and phoneticians, who endeavor to subject their theoretical hypotheses to empirical tests using large speech corpora, and for NLP specialists, whose interest in spoken language is increasing. While these resources were first based on audio recordings of read speech, they have been progressively extended to bi- or multimodal data and to spontaneous speech in conversational interaction. Such changes are raising theoretical and methodological issues that both phonologists/phoneticians and NLP specialists have begun to address.

    Research on spoken language has thus led to the generalized utilization of a large set of tools and methods for automatic data processing and analysis: grapheme-to-phoneme converters, text-to-speech aligners, automatic segmentation of the speech signal into units of various sizes (from acoustic events to conversational turns), morpho-syntactic tagging, etc. Large-scale corpus studies in phonology and phonetics make an ever increasing use of tools that were originally developed by NLP researchers, and which range from electronic dictionaries to full-fledged automatic speech recognition systems. NLP researchers and phonologists/phoneticians also have jointly contributed to developing multi-level speech annotation systems from articulatory/acoustic events to the pragmatic level via prosody and syntax.

    In this scientific context, which very much fosters the establishment of cross-disciplinary bridges around spoken language, the knowledge and resources accumulated by phonologists and phoneticians are now being put to use by NLP researchers, whether this is to build up lexical databases from speech corpora, to develop automatic speech recognition systems able to deal with regional variations in the sound pattern of a language, or to design talking-face synthesis systems in man-machine communication.

    LIST OF TOPICS

    The goal of this special issue will be to offer an overview of the interfaces that are being developed between phonology, phonetics, and NLP. Contributions are therefore invited on the following topics:

    . Joint contributions of speech databases to NLP and phonology/phonetics

    . Automatic procedures for the large-scale processing of multi-modal databases

    . Multi-level annotation systems

    . Research in phonology/phonetics and speech and language technologies: synthesis, automatic recognition

    . Text-to-speech systems

    . NLP and modelling in phonology/phonetics

    Papers may be submitted in French or in English (English for non-native speakers of French only) and may relate to studies conducted on French, English, or other languages. They must conform to the TAL guidelines for authors available at http://www.atala.org/rubrique.php3?id_rubrique=1.

    DEADLINES

    . 11 February 2008: Reception of contributions
    . 11 April 2008: Notification of pre-selection / rejection
    . 11 May 2008: Reception of pre-selected articles
    . 16 June 2008: Notification of final acceptance
    . 30 June 2008: Reception of accepted articles' final versions

    This special issue of Traitement Automatique des Langues will appear in autumn 2008.

    THE JOURNAL

    TAL (Traitement Automatique des Langues / Natural Language Processing, http://www.atala.org/rubrique.php3?id_rubrique=1) is a forty-year-old international journal published by ATALA (French Association for Natural Language Processing) with the support of CNRS (French National Center for Scientific Research). It has moved to an electronic mode of publication, with printing on demand. This in no way affects its reviewing and selection process.

    SCIENTIFIC COMMITTEE

    . Martine Adda-Decker, LIMSI, Orsay
    . Roxane Bertrand, LPL, CNRS & Université de Provence
    . Philippe Blache, LPL, CNRS & Université de Provence
    . Cédric Gendrot, LPP, CNRS & Université de Paris III
    . John Goldsmith, University of Chicago
    . Guillaume Gravier, Irisa, CNRS/INRIA & Université de Rennes I
    . Jonathan Harrington, IPS, University of Munich
    . Bernard Laks, MoDyCo, CNRS & Université de Paris X
    . Lori Lamel, LIMSI, Orsay
    . Noël Nguyen, LPL, CNRS & Université de Provence
    . François Pellegrino, DDL, CNRS & Université de Lyon II
    . François Poiré, University of Western Ontario
    . Yvan Rose, Memorial University of Newfoundland
    . Tobias Scheer, BCL, CNRS & Université de Nice
    . Atanas Tchobanov, MoDyCo, CNRS & Université de Paris X
    . Jacqueline Vaissière, LPP, CNRS & Université de Paris III
    . Nathalie Vallée, DPC-GIPSA, CNRS & Université de Grenoble III


Future Conferences

  • Publication policy: Below, you will find very short announcements of future events. The full call for participation can be accessed on the conference websites.
    See also our Web pages (http://www.isca-speech.org/) on conferences and workshops.


Future Interspeech conferences

  • INTERSPEECH 2008

    September 22-26, 2008, Brisbane, Queensland, Australia
    Conference Website
    Chairman: Denis Burnham, MARCS, University of Western Sydney.

     

  • INTERSPEECH 2009

    Brighton, UK
    Conference Website
    Chairman: Prof. Roger Moore, University of Sheffield.

     

  • INTERSPEECH 2010

    Chiba, Japan
    Conference Website
    ISCA is pleased to announce that INTERSPEECH 2010 will take place in Makuhari-Messe, Chiba, Japan, September 26-30, 2010. The event will be chaired by Keikichi Hirose (Univ. Tokyo), and will have as a theme "Towards Spoken Language Processing for All - Regardless of Age, Health Conditions, Native Languages, Environment, etc."

     


Future ISCA Technical and Research Workshops

  • ISCA ITRW speech analysis and processing for knowledge discovery

    June 4 - 6, 2008
    Aalborg, Denmark
    Workshop website
    Humans are very efficient at capturing information and messages in speech, and they often perform this task effortlessly even when the signal is degraded by noise, reverberation and channel effects. In contrast, when a speech signal is processed by conventional spectral analysis methods, significant cues and useful information in speech are usually not exploited properly, resulting in sub-optimal performance in many speech systems. There exists, however, a vast literature on speech production and perception mechanisms and their impact on acoustic phonetics that could be utilized more effectively in modern speech systems. A re-examination of these knowledge sources is needed. On the other hand, recent advances in speech modelling and processing and the availability of a huge collection of multilingual speech data have provided an unprecedented opportunity for acoustic phoneticians to revise and strengthen their knowledge and develop new theories. Such a collaborative effort between science and technology is beneficial to the speech community and is likely to lead to a paradigm shift in the design of next-generation speech algorithms and systems. This, however, calls for focussed attention on analysis and processing techniques aimed at a more effective extraction of information and knowledge from speech.
    Objectives:
    The objective of this workshop is to discuss innovative approaches to the analysis of speech signals, so that they can bring out the subtle and unique characteristics of speech and speaker. This will also help in discovering speech cues useful for significantly improving the performance of speech systems. Several attempts have been made in the past to explore speech analysis methods that can bridge the gap between human and machine processing of speech. In particular, the time-varying aspects of interactions between excitation and vocal tract systems during production seem to elude exploitation. Some of the explored methods include all-pole and pole-zero modelling methods based on temporal weighting of the prediction errors, interpreting the zeros of speech spectra, analysis of phase in the time and transform domains, nonlinear (neural network) models for information extraction and integration, etc. Such studies may also bring out some finer details of speech signals, which may have implications in determining the acoustic-phonetic cues needed for developing robust speech systems.
    The Workshop:
    • will present a full-morning common tutorial to give an overview of the present stage of research linked to the subject of the workshop
    • will be organised as a single series of oral and poster presentations
    • each oral presentation is given 30 minutes to allow for ample time for discussion
    • is an ideal forum for speech scientists to discuss the perspectives that will further future research collaborations.
    Potential Topic areas:
    • Parametric and nonparametric models
    • New all-pole and pole-zero spectral modelling
    • Temporal modelling
    • Non-spectral processing (group delay etc.)
    • Integration of spectral and temporal processing
    • Biologically-inspired speech analysis and processing
    • Interactions between excitation and vocal tract systems
    • Characterization and representation of acoustic phonetic attributes
    • Attribute-based speaker and spoken language characterization
    • Analysis and processing for detecting acoustic phonetic attributes
    • Language independent aspects of acoustic phonetic attributes detection
    • Detection of language-specific acoustic phonetic attributes
    • Acoustic to linguistic and acoustic phonetic mapping
    • Mapping from acoustic signal to articulator configurations
    • Merging of synchronous and asynchronous information
    • Other related topics
    Call for papers. Notification of review:
    The submission deadline is extended to February 14, 2008.
    Registration
    Fees for early and late registration for ISCA and non-ISCA members will be made available on the website during September 2007.
    Venue:
    The workshop will take place at Aalborg University, Department of Electronic Systems, Denmark. See the workshop website for further and latest information.
    Accommodation:
    There are a large number of hotels in Aalborg, most of them close to the city centre. The list of hotels, with their web sites and telephone numbers, is given on the workshop website, where you will also find information about transportation between the city centre and the university campus.
    How to reach Aalborg:
    Aalborg Airport is half an hour away from the international Copenhagen Airport. There are many daily flight connections between Copenhagen and Aalborg. Flying with Scandinavian Airlines System (SAS) or one of the Star Alliance companies to Copenhagen enables you to include the Copenhagen-Aalborg leg in the same ticket, thereby reducing the total transportation cost. There is also an hourly train connection between the two cities; the train ride lasts approx. five hours.
    Organising Committee:
    Paul Dalsgaard, B. Yegnanarayana, Chin-Hui Lee, Paavo Alku, Rolf Carlson, Torbjørn Svendsen
    Important dates
    Submission of full and final papers: January 31, 2008 on the Website
    http://www.es.aau.dk/ITRW/
    Notification of review results: No later than March 30, 2008.

  • ITRW on Evidence-based Voice and Speech Rehabilitation in Head & Neck Oncology

      

     

      

    ISCA Workshop

    Evidence-based Voice and Speech Rehabilitation in Head & Neck Oncology

     

    Amsterdam, May 15-16, 2008

     

     

    Evidence-based Voice and Speech Rehabilitation is of increasing relevance in Head & Neck Oncology. The number of patients requiring treatment for cancer in the upper respiratory and vocal tract keeps rising. Moreover, treatment - whether it concerns an "organ preservation protocol" or traditional surgery and radiotherapy - negatively impacts the function of organs vital for communication. A "function preservation treatment" does, unfortunately, not yet exist. This workshop seeks to assemble the latest and most relevant knowledge on evidence-based voice and speech rehabilitation. Aside from the main topic (voice and speech rehabilitation after total laryngectomy), other areas, such as vocal issues in early-stage larynx carcinoma, and various stages of oral / oropharyngeal carcinoma will be addressed.

     

    The workshop comprises four topical sessions (see below). Each session includes two keynote lectures plus a round-table discussion and (at most 10) poster presentations pertinent to the session's topic. A work document, based on the keynote lectures, will form the basis for each round-table discussion. This work document will contain all presently available research evidence, discuss its (clinical) relevance and formulate directions and areas of interest for future research. The keynote lectures, work documents and poster papers are to be compiled into Workshop Proceedings, which will be published under the ISCA flag (website: http://www.isca-speech.org/). It is our aim to make these Proceedings available at the workshop. This will result in a useful and traceable 'State of the Art' handbook/CD/web publication.

     

    Prof. Dr. Frans JM Hilgers

    Prof. Dr. Louis CW Pols

    Dr. Maya van Rossum

    Venue:

     

    Tinbergen lecture hall, Royal Netherlands Academy of Arts and Sciences. Kloveniersburgwal 29, Amsterdam

     

    More information can be obtained from the website www.fon.hum.uva.nl/webhnr/

    or by sending a