ISCApad

  • Editorial
  • ISCA News
  • SIG's activities
  • Courses, Internships
  • Books, Databases, Softwares
  • Job openings
  • Journals
  • Future Conferences
  • Future Interspeech conferences
  • Future ISCA Technical and Research Workshops
  • Forthcoming events supported (but not organized) by ISCA
  • Future Speech Science and Technology Events
Number: 115 Date: 01/12/2008
Author: Chris Wellekens

Editor: Chris Wellekens

 

Dear Members,

Happy New Year to all members! Our Chinese colleagues will have the good fortune to celebrate the New Year again in February!

Thanks to all of you who expressed your interest in ISCApad by sending either congratulations or criticisms. They all helped us improve our communication with the community.

The number of job offers is continually growing: this is certainly proof of employers' confidence in the effectiveness of advertising in the ISCA community. I also hope that it is proof of the good health of speech science and technology.

The rocketing price of oil will have a major impact on our transportation costs: let us hope that our annual conferences and workshops will attract as many people as in previous years. We all know that personal contacts contribute more to the cohesion of the community than any other type of communication. But there is still a bright future for a newsletter!

Professor em. Chris Wellekens

Institut Eurecom France


ISCA News

  • GOOGLE SCHOLAR AND ISCA ARCHIVE


    Google Scholar and the ISCA Archive

    The indexing of the ISCA Archive (http://www.isca-speech.org/archive/) by the Google Scholar search engine (http://scholar.google.com/) is now thorough enough to be quite useful, so this seems like a good time to give an overview of the service.  Google Scholar is a research literature search engine that provides full-text search for ISCA papers whose full text cannot be searched with other search engines. Google Scholar's citation tracking shows what papers have cited a particular paper, which can be very useful for finding follow-up work, related work and corrections.  More details about these and other features are given below.

    The titles, author lists, and abstracts of ISCA Archive papers are all on the public web, so they can be searched by a general-purpose search engine such as Google.  However, the full texts of most ISCA papers are password protected and thus cannot be searched with a general-purpose search engine.  Google Scholar, through an arrangement with ISCA, has access to the full text of ISCA papers. Google Scholar has similar arrangements with many other publishers.  (On the other hand, general-purpose search engines index all sorts of web pages and other documents accessible through the public web, many of which will not be in the Google Scholar index.  So it's often useful to perform the same search using both Google Scholar and a general-purpose search engine.)

    Google Scholar automatically extracts citations from the full text of papers. It uses this information to provide a "Cited by" list for each paper in the Google Scholar index.  This is a list of papers that have cited that paper. Google Scholar also provides an automatically generated "Related Articles" list for each paper.  The "Cited by" and "Related Articles" lists are powerful tools for discovering relevant papers.  Furthermore, the length of a paper's "Cited by" list can be used as a convenient (although imperfect) measure of the paper's impact.  Discussions about the subtleties of using Google Scholar to measure impact can be found at http://www.harzing.com/resources.htm#/pop_gs.htm and http://blogs.nature.com/nautilus/2007/07/google_scholar_as_a_measure_of.html.

    It's possible to restrict Google Scholar searches to papers published by ISCA by using Google Scholar's Advanced Search feature and entering "ISCA" in the "Return articles published in" field.  If "ISCA" is entered in that field, and nothing is entered in the main search field, then the search results will show what ISCA papers are the most highly cited.

    It should be noted that there are many papers on ISCA-related topics which are not in the Google Scholar index.  For example, it seems many ICPhS papers are missing.  Old papers which have been scanned in from paper copies will either not have their full contents indexed, or will be indexed using imperfect OCR technology. Furthermore, as of November 2007 the indexing of the ISCA Archive by Google Scholar is still not 100% complete.  There are a few different areas which are not perfectly indexed, but the biggest planned improvement is to start using OCR for the ISCA papers which have been scanned in from paper copies.

    There may be a time lag between when a new event is added to the ISCA Archive in the future and when it appears in the Google Scholar index. This time lag may be longer than the usual lag of general-purpose search engines such as Google, because ISCA must create Google Scholar catalog data for every new event and because the Google Scholar index seems to update considerably more slowly than the Google index.

    Acknowledgements: ISCA's arrangement with Google Scholar is a project of students Rahul Chitturi, Tiago Falk, David Gelbart, Agustin Gravano, and Francis Tyers, ISCA webmaster Matt Bridger, and ISCA Archive coordinator Wolfgang Hess.  Our thanks to Google's Christian DiCarlo and Darcy Dapra, and the rest of the Google Scholar team.


SIG's activities

  •  

    A list of Speech Interest Groups can be found on our website.


Courses, Internships

  • Motorola Labs - Center for Human Interaction Research (CHIR)

    Motorola Labs - Center for Human Interaction Research (CHIR) 
    located in Schaumburg Illinois, USA, 
    is offering summer intern positions in 2008 (12 weeks each). 
     
    CHIR's mission
     
    Our research lab develops technologies that make access to rich communication, media and 
    information services effortless, based on natural, intelligent interaction. Our research 
    aims at systems that adapt automatically and proactively to changing environments, device 
    capabilities and continually evolving knowledge about the user.
     
    Intern profiles
     
    1) Acoustic environment/event detection and classification. 
    The successful candidate will be a PhD student near the end of his/her PhD studies, skilled 
    in signal processing and/or pattern recognition, and familiar with Linux and C/C++ programming. 
    Candidates with knowledge of acoustic environment/event classification are preferred. 
     
    2) Speaker adaptation for applications on speech recognition and spoken document retrieval
    The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough 
    understanding of and hands-on experience in research related to automatic speech recognition. Proficiency 
    in a Linux/Unix working environment and in C/C++ programming is required, as is a strong GPA. A strong background in speaker 
    adaptation is highly preferred.
     
    3) Development of voice search-based web applications on a smartphone 
    We are looking for an intern candidate to help create an "experience" prototype based on our 
    voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and 
    rich interaction with web resources. This intern project is oriented more towards software engineering 
    than research. We target an intern with a master's degree and strong software engineering background. 
    Mastery of C++ and experience with web programming (AJAX and web services) is required. 
    Development experience on Windows CE/Mobile desired.
     
    4) Integrated Voice Search Technology For Mobile Devices
    The candidate should be proficient in information retrieval, pattern recognition and speech recognition. 
    The candidate should program in C++ and in scripting languages such as Python or Perl in a Linux environment. 
    He/she should also have knowledge of information retrieval or search engines.
     
    We offer competitive compensation, fun-to-work environment and Chicago-style pizza.
     
    If you are interested, please send your resume to:
     
    Dusan Macho, CHIR-Motorola Labs
    Email: dusan [dot] macho [at] motorola [dot] com
    Tel: +1-847-576-6762


Books, Databases, Softwares

  • Books


    La production de la parole
    Author: Alain Marchal, Universite d'Aix en Provence, France
    Publisher: Hermes Lavoisier
    Year: 2007

    Speech enhancement-Theory and Practice
    Author: Philipos C. Loizou, University of Texas, Dallas, USA
    Publisher: CRC Press
    Year: 2007

    Speech and Language Engineering
    Editor: Martin Rajman
    Publisher: EPFL Press, distributed by CRC Press
    Year: 2007

    Human Communication Disorders/ Speech therapy
    This interesting series can be found on the Wiley website

    Incursões em torno do ritmo da fala
    Author: Plinio A. Barbosa
    Publisher: Pontes Editores (city: Campinas)
    Year: 2006 (released 11/24/2006)
    (In Portuguese, abstract attached.) Website

    Speech Quality of VoIP: Assessment and Prediction
    Author: Alexander Raake
    Publisher: John Wiley & Sons, UK-Chichester, September 2006
    Website

    Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
    Author: Pierre-Yves Oudeyer
    Publisher: Oxford University Press
    Website

    Speech Recognition Over Digital Channels
    Authors: Antonio M. Peinado and Jose C. Segura
    Publisher: Wiley, July 2006
    Website

    Multilingual Speech Processing
    Editors: Tanja Schultz and Katrin Kirchhoff
    Publisher: Elsevier Academic Press, April 2006
    Website

    Reconnaissance automatique de la parole: Du signal a l'interpretation
    Authors: Jean-Paul Haton, Christophe Cerisara, Dominique Fohr, Yves Laprie, Kamel Smaili
    392 Pages
    Publisher: Dunod

  • Spotlight on LDC Programmers and Software Tools

    -  Spotlight on LDC Programmers and Software Tools  -


    -  LDC Member Survey  -

    -  Membership Fee Increases and Discounts  -

    LDC2007T36
    -  Chinese Treebank 6.0 (CTB 6.0)  -

    LDC2007S11
    -  2004 Spring NIST Rich Transcription (RT-04S) Development Data  -

    -  LDC Offices to Close for Winter Break  -




    Spotlight on LDC Programmers and Software Tools


    As part of our 15th Anniversary celebration, we have highlighted one aspect of the LDC in our monthly newsletters.  These features provided our members and data users with a glimpse of the broad range of the LDC's research activities.  The last feature of the year will focus on the LDC's software programmers and the tools they create. 

    A large segment of LDC's programming group is led by Senior Research Programmer Kazuaki Maeda. Besides being a programmer, Maeda is a linguist specializing in phonetics, phonology and computational linguistics. The group currently has ten full-time staff, augmented as necessary by part-time programmers. LDC's programmers are adept in all major programming languages and can work across platforms; their work supports virtually every aspect of LDC's operation. More information about LDC's programmers can be found on our staff page.


    One of the programming group's principal responsibilities is to develop workflow management software and annotation and transcription tools to support projects such as GALE and LCTL. Our goal is to make tools developed for general use broadly available. One such tool is XTrans, a next-generation transcription tool that is designed to support transcription tasks in multiple languages on multiple platforms. Its versatile and powerful waveform display/playback component can load multiple audio files of different file formats and sampling rates at the same time. The virtual channel supported by XTrans provides the most natural method for transcribing overlapping speech. A virtual channel represents an audio source, not a physical channel, that is identified and transcribed in a given audio recording. A single-channel audio file can contain many audio sources. For instance, a round-table talk show with five speakers contains five audio sources in a single-channel audio recording. With XTrans, that file is modeled as a 5-virtual-channel audio file, and each virtual channel is transcribed independently. Additionally, if a recording consists of audio files with different sampling rates, XTrans will automatically resample them to the same rate. The LDC has used XTrans for many varied projects, and the tool has proven to be quick to learn and easy to master. We are currently working through licensing issues with organizations that provided libraries for XTrans. Once those issues are resolved, we will make XTrans generally available.
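The virtual-channel idea described above can be illustrated with a small data structure. This is only a sketch under stated assumptions: the `Segment` class, the `overlaps` helper, the channel labels and the timings are all hypothetical and do not reflect XTrans's actual file format or internals.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    channel: str   # virtual channel, i.e. one audio source (hypothetical label)
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcript of this stretch of speech


def overlaps(segments):
    """Return pairs of segments from different virtual channels that overlap in time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return [
        (a, b)
        for a, b in combinations(ordered, 2)
        # a starts no later than b, so they overlap when b starts before a ends
        if a.channel != b.channel and b.start < a.end
    ]


# One single-channel recording, three audio sources (invented example data).
segments = [
    Segment("host", 0.0, 4.0, "welcome to the show"),
    Segment("guest1", 3.2, 6.0, "thanks for having me"),
    Segment("guest2", 6.5, 8.0, "good evening"),
]
overlapping = overlaps(segments)
```

Because each source has its own sequence of segments, overlapping speech needs no special notation: the host and first guest simply have segments whose time spans intersect.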

    Two other general-use tools developed by the LDC, the Annotation Graph Toolkit (AGTK) and the Champollion Tool Kit (CTK), are available on Sourceforge.net. Like XTrans, these tools represent creative solutions to difficult problems:

    • The Annotation Graph Toolkit (AGTK)  is a primary resource for annotation tool development at LDC. AGTK is a suite of software components for building tools for annotating linguistic signals, time-series data which documents any kind of linguistic behavior (e.g. audio, video).  Unlike the traditional approach of designing and implementing data structures and user interfaces for new tasks from scratch, AGTK allows developers to quickly prototype tools and define data formats.  The flexible nature of the AG model means that data representations can be rapidly modified in response to evolving annotation task definitions. AGTK allows for rapid deployment of highly specialized, task-specific tools that maximize user interface ergonomics and improve the speed and accuracy of annotation.
    • Champollion Tool Kit (CTK) was developed to address issues in aligning parallel text consisting of remote language pairs and a significant amount of noise.  To achieve high precision and recall on manually-aligned text, CTK assumes a noisy input, that is, that a sizable percentage of alignments will not be one to one, and that the number of deletions and insertions will be significant.  Furthermore, CTK differs from other lexicon-based approaches in assigning greater weight to less frequent translation pairs.  CTK was first evaluated using Chinese-English parallel text but is designed to be used on as many language pairs as possible.

    XTrans, AGTK and CTK are representative of the work by LDC's programmers, making it possible for us to support projects of increasing complexity and to distribute a growing variety of linguistic resources.  The LDC Catalog contains several publications which were created using software tools developed by LDC's programming group.  These include ACE data, Arabic Treebank publications, and NIST Rich Transcription corpora.

     

    LDC Member Survey

     

    In order to determine how the consortium as a whole views the LDC, we are conducting a survey of our active users. Each person and organization who licensed data and/or purchased an LDC membership in 2006 and 2007 will have received an email on December 17 that contained a link to the online survey. Those who complete the survey before January 14, 2008 will be eligible to win a $500 benefit good towards any corpus or membership purchase in 2008. There will be a blind drawing in January 2008 and one winner will be selected from the pool of respondents. Based on last year's response rates, each respondent will have an approximate 1 in 100 chance of winning!

     

    Membership Fee Increases and Discounts


    The LDC will raise membership fees effective January 1, 2008.  Please click here for new pricing information and options for obtaining discounts on membership fees.

    New Publications


    (1) The Chinese Treebank project began at the University of Pennsylvania in 1998 and continues at Penn and the University of Colorado. Chinese Treebank 6.0 is the latest version produced from this effort, consisting of 780,000 words (over 1.28 million Chinese characters) that are segmented, part-of-speech tagged and fully bracketed. The data sources include newswire from Xinhua News Agency, articles from Sinorama Magazine, news from the website of the Hong Kong Special Administrative Region and transcripts from various broadcast news programs.

    This release encompasses 2,036 text files, containing 28,295 sentences, 781,351 words and 1,285,149 hanzi (Chinese characters). The data is provided in two encodings: GBK and UTF-8, and the annotation has Penn Treebank-style labeled brackets.  The data is provided in four different formats: raw text, word segmented, word segmented and POS-tagged, and syntactically bracketed.  Chinese Treebank 6.0 (CTB 6.0) is distributed via web download.
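As a rough illustration of how the four formats relate, the sketch below derives raw, segmented and POS-tagged views from a Penn Treebank-style bracketed string. The example tree is invented rather than taken from the corpus, the `leaves` helper is hypothetical, and the `word_TAG` separator is an assumption about the tagged format; empty categories and other corpus-specific details are not handled.

```python
import re

# Matches a leaf "(TAG word)": a label followed by one token and no nested brackets.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def leaves(tree: str) -> list[tuple[str, str]]:
    """Return (word, tag) pairs in sentence order from a bracketed parse."""
    return [(word, tag) for tag, word in LEAF.findall(tree)]


# Invented example for illustration only.
tree = "(IP (NP (NN 经济)) (VP (VV 发展)))"
pairs = leaves(tree)

raw_text = "".join(word for word, _ in pairs)        # raw text
segmented = " ".join(word for word, _ in pairs)      # word segmented
tagged = " ".join(f"{word}_{tag}" for word, tag in pairs)  # "_" separator is an assumption
```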

    2007 Subscription Members will automatically receive two copies of this corpus on disc. 2007 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$700.

    *

     

    (2)  The 2004 Spring NIST Rich Transcription (RT-04S) Development Data contains the test material (meeting speech and reference transcripts) used in the RT-04S evaluation administered by the NIST (National Institute of Standards and Technology) Speech Group. Rich Transcription (RT) is broadly defined as a fusion of speech-to-text technology and metadata extraction technologies designed to provide the basis for a generation of more usable transcriptions of human-human meeting speech.

    The RT-04S development data consists of approximately 10 minutes of recordings of eight meetings held at ICSI, CMU, LDC and NIST. Although the development data comprises 10-minute excerpts from the same data collection sites that are represented in LDC2007S12, 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data, it is not completely reflective of the evaluation test data, since it contains lapel mics in lieu of head mics for the LDC and CMU data and some different distant mics for the LDC data.

    RT-04S included the following tasks in the meeting domain:

    Speech-to-Text Transcription (STT) tasks

    Microphone conditions:
    ·         Multiple distant microphones
    ·         Single distant microphone
    ·         Individual head microphone

    Processing time conditions:
    ·         Unlimited time STT
    ·         Less than or equal to twenty times realtime
    ·         Less than or equal to ten times realtime
    ·         Less than or equal to one times realtime


    Diarization (SPKR) task (who spoke when)

    Microphone conditions:
    ·         Multiple distant microphones
    ·         Single distant microphone

    Input conditions:
    ·         Speech input only
    ·         Speech plus reference transcript input

    Processing time conditions:
    ·         Unlimited time
    ·         Less than or equal to twenty times realtime
    ·         Less than or equal to ten times realtime
    ·         Less than or equal to one times realtime
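The "times realtime" limits in the lists above compare processing time with the duration of the audio processed. A minimal sketch of that ratio follows; the function names and the example timings are hypothetical and are not part of the RT-04S evaluation tools.

```python
# Limits named in the processing time conditions, as multiples of realtime.
LIMITS = (1.0, 10.0, 20.0)


def realtime_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (1.0 means exactly realtime)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def tightest_condition(factor: float) -> str:
    """Name the strictest listed condition that a system with this factor satisfies."""
    for limit in LIMITS:
        if factor <= limit:
            return f"<= {limit:g}x realtime"
    return "unlimited time"


# Example: 3000 s of computation for a 600 s (10-minute) excerpt.
factor = realtime_factor(3000.0, 600.0)
condition = tightest_condition(factor)
```

In this invented example the system runs at five times realtime, so it would qualify for the ten-times condition but not the one-times condition.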


    2004 Spring NIST Rich Transcription (RT-04S) Development Data is distributed on one DVD-ROM.

    2007 Subscription Members will automatically receive two copies of this corpus. 2007 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$2000.


    LDC Offices to Close for Winter Break


    The LDC would like to inform our customers that we will be closed from December 24, 2007 through January 1, 2008 in accordance with the University of Pennsylvania Winter Break Policy.  Our offices will reopen on Wednesday, January 2, 2008.  Requests received for membership renewals and corpora will be processed at that time.

    Best wishes for a happy and safe holiday season!



    Ilya Ahtaridis
    Membership Coordinator
    Linguistic Data Consortium
    University of Pennsylvania
    3600 Market St., Suite 810
    Philadelphia, PA 19104 USA
    Phone: (215) 573-1275
    Fax: (215) 573-2175
    ldc@ldc.upenn.edu
    http://www.ldc.upenn.edu/

     

  • Journal on Multimodal User Interfaces

     

    Journal on Multimodal User Interfaces

    The development of Multimodal User Interfaces relies on systemic research involving signal processing, pattern analysis, machine intelligence and human-computer interaction. This journal responds to the need for common forums that bring these research communities together. Topics of interest include, but are not restricted to:

    • Fusion & Fission,
    • Plasticity of Multimodal interfaces,
    • Medical applications,
    • Edutainment applications,
    • New modalities and modalities conversion,
    • Usability,
    • Multimodality for biometry and security,
    • Multimodal conversational systems.

    The journal is open to three types of contributions:

    • Articles: containing original contributions accessible to the whole research community of Multimodal Interfaces. Contributions containing verifiable results and/or open-source demonstrators are strongly encouraged.
    • Tutorials: disseminating established results across disciplines related to multimodal user interfaces.
    • Letters: presenting practical achievements / prototypes and new technology components.

    JMUI is published by Springer-Verlag as of 2008.

     

    The submission procedure and the publication schedule are described at:

    www.jmui.org

    The journal's page at Springer is:

    http://www.springer.com/east/home?SGWID=5-102-70-173760003-0&changeHeader=true

    More information:

    Imre Váradi (varadi@tele.ucl.ac.be)


Job openings

  • We invite all laboratories and industrial companies which have job offers to send them to the ISCApad editor: they will appear in the newsletter and on our website for free. (also have a look at http://www.isca-speech.org/jobs.html as well as http://www.elsnet.org/ Jobs)

  • Speech Engineer/Senior Speech Engineer at Microsoft, Mountain View, CA, USA

    Job Type: Full-Time
    Send resume to Bruce Buntschuh
    Responsibilities:
    Tellme, now a subsidiary of Microsoft, is a company that is focused on delivering the highest quality voice recognition based applications while providing the highest possible automation to its clients. Central to this focus is the speech recognition accuracy and performance that is used by the applications. The candidate will be responsible for the development, performance analysis, and optimization of grammars, as well as overall speech recognition accuracy, in a wide variety of real world applications in all major market segments. This is a unique opportunity to apply and extend state of the art speech recognition technologies to emerging spaces such as information search on mobile devices.
    Requirements:
    · Strong background in engineering, linguistics, mathematics, machine learning, and/or computer science.
    · In depth knowledge and expertise in the field of speech recognition.
    · Strong analytical skills with a determination to fully understand and solve complex problems.
    · Excellent spoken and written communication skills.
    · Fluency in English (Spanish a plus).
    · Programming capability with scripting tools such as Perl.
    Education:
    MS, PhD, or equivalent technical experience in an area such as engineering, linguistics, mathematics, or computer science.

  • Speech Technology and Software Development Engineer at Microsoft Redmond WA, USA

      

    Speech Technology and Software Development Engineer

    Speech Technologies and Modeling

    Speech Component Group

    Microsoft Corporation

    Redmond WA, USA

    Please contact: Yifan.Gong@microsoft.com

    Microsoft's Speech Component Group has been working on automatic speech recognition (SR) in real environments. We develop SR products for multiple languages for mobile devices, desktop computers, and communication servers. The group now has an open position for speech scientists with a software development focus to work on our acoustic and language modeling technologies. The position offers great opportunities for innovation and technology and product development.

    Responsibilities:

    ·     Design and implement speech/language modeling and recognition algorithms to improve recognition accuracy.
    ·     Create, optimize and deliver quality speech recognition models and other components tailored to our customers' needs.
    ·     Identify, investigate and solve challenging problems in the areas of recognition accuracy from speech recognition system deployments.
    ·     Improve speech recognition language expansion engineering process that ensures product quality and scalability.

    Required competencies and skills:

    ·     Passion for speech technology and quality software, with demonstrated ability in the design and implementation of speech recognition algorithms.
    ·     Strong desire for achieving excellent results, strong problem solving skills, ability to multi-task, handle ambiguities, and identify issues in complex SR systems.
    ·     Good software development skills, including strong aptitude for software design and coding. 3+ years of experience in C/C++ and programming with scripting languages are highly desirable.
    ·     MS or PhD degree in Computer Science, Electrical Engineering, Mathematics, or related disciplines, with strong background in speech recognition technology, statistical modeling, or signal processing.
    ·     Track record of developing SR algorithms, or experience in linguistics/phonetics, is a plus.

     

  • PhD Research Studentship in Spoken Dialogue Systems - Cambridge, UK

    Applications are invited for an EPSRC sponsored studentship in Spoken Dialogue Systems leading to the PhD degree. The student will join a team led by Professor Steve Young working on statistical approaches to building Spoken Dialogue Systems. The overall goal of the team is to develop complete working end-to-end systems which can be trained from real data and which can be continually adapted on-line. The PhD work will focus specifically on the use of Partially Observable Markov Decision Processes for dialogue modelling and techniques for learning and adaptation within that framework. The work will involve statistical modelling, algorithm design and user evaluation. The successful candidate will have a good first degree in a relevant area. Good programming skills in C/C++ are essential and familiarity with Matlab would be useful.
    The studentship will be for 3 years starting in October 2007 or January 2008. The studentship covers University and College fees at the Home/EU rate and a maintenance allowance of 13000 pounds per annum. Potential applicants should email Steve Young with a brief CV and a statement of interest in the proposed work area.

  • Elektrobit seeks SW-Engineers (m/f) for multimodal HMI Solutions (Speech Dialog)

    Elektrobit Automotive Software is located in Erlangen, Germany, and delivers implementations, ready for mass production, of a variety of automotive industry software standards, as well as services to implement large software projects. The spectrum is enhanced with tools for HMI and control device development and with in-house developments such as a navigation solution. We are developing solutions for multimodal HMIs in automotive infotainment/navigation systems. One focus is speech dialog systems. The challenge lies in realizing natural speech dialog systems for different applications (navigation, mp3 player etc.) in an embedded environment. You will be designing and developing such speech solutions.

    You have know-how in one or more of the following areas:
    Experience in project coordination
    Programming in C/C++, Perl, (Java) for Windows and/or Linux
    Speech recognition
    Multimodal dialog systems
    Speech synthesis /TTS
    SW- Processes and SW-Tests
    Experience in Object oriented Programming
    Experience with Embedded Operating Systems

    Your job description
    Project coordination
    Coordination of supplier and requirements from different applications
    Development and specification of concepts for speech-related SW modules for different applications in embedded environments
    Implementation of multimodal HMIs
    Integration of speech modules in HMIs
    Testing
    We expect from you:
    A degree in IT, electrical/electronic engineering, computational linguistics or similar
    Good working knowledge of German and English
    Innovative streak
    Willingness to take responsibility in international Teams
    We offer you:
    A motivating working environment
    Challenging work
    Support in advancement

    Please apply at www.elektrobit.com -> Automotive Software -> jobs
    If you have any further questions, Mr. Schrör (Tel.-Nr. +49 (9131) 7701-516) or Mr. Huck (-217) will gladly answer them.

  • Sound to Sense: 18 Fellowships in speech research

    Sound to Sense (S2S) is a Marie Curie Research Training Network involving collaborative speech research amongst 13 universities in 10 countries. 18 Training Fellowships are available, of which 12 are predoctoral and 6 postdoctoral (or equivalent experience). Most but not all are planned to start in September or October 2007.
    A research training network's primary aim is to support and train young researchers in professional and inter-disciplinary scientific skills that will equip them for careers in research. S2S's scientific focus is on cross-disciplinary methods for modelling speech recognition by humans and machines. Distinctive aspects of our approach include emphasis on richly-informed phonetic models that emphasize communicative function of utterances, multilingual databases, multiple time domain analyses, hybrid episodic-abstract computational models, and applications and testing in adverse listening conditions and foreign language learning.
    Eleven projects are planned. Each can be flexibly tailored to match the Fellows' backgrounds, research interests, and professional development needs, and will fall into one of four broad themes.
    1: Multilinguistic and comparative research on Fine Phonetic Detail (4 projects)
    2: Imperfect knowledge/imperfect signal (2 projects)
    3: Beyond short units of speech (2 projects)
    4: Exemplars and abstraction (3 projects)
    The institutions and senior scientists involved with S2S are as follows:
    * University of Cambridge, UK (S. Hawkins (Coordinator), M. Ford, M. Miozzo, D. Norris, B. Post)
    * Katholieke Universiteit, Leuven, Belgium (D. Van Compernolle, H. Van Hamme, K. Demuynck)
    * Charles University, Prague, Czech Republic (Z. Palková, T. Duběda, J. Volín)
    * University of Provence, Aix-en-Provence, France (N. Nguyen, M. d'Imperio, C. Meunier)
    * University Federico II, Naples, Italy (F. Cutugno, A. Corazza)
    * Radboud University, Nijmegen, The Netherlands (L. ten Bosch, H. Baayen, M. Ernestus, C. Gussenhoven, H. Strik)
    * Norwegian University of Science and Technology (NTNU), Trondheim, Norway (W. van Dommelen, M. Johnsen, J. Koreman, T. Svendsen)
    * Technical University of Cluj-Napoca, Romania (M. Giurgiu)
    * University of the Basque Country, Vitoria, Spain (M-L. Garcia Lecumberri, J. Cenoz)
    * University of Geneva, Switzerland (U. Frauenfelder)
    * University of Bristol, UK (S. Mattys, J. Bowers)
    * University of Sheffield, UK (M. Cooke, J. Barker, G. Brown, S. Howard, R. Moore, B. Wells)
    * University of York, UK (R. Ogden, G. Gaskell, J. Local)
    Successful applicants will normally have a degree in psychology, computer science, engineering, linguistics, phonetics, or related disciplines, and want to acquire expertise in one or more of the others.
    Positions are open until filled, although applications before 1 May 2007 are recommended for starting in October 2007.
    Further details are available from the web about:
    + the research network (92 kB) and how to apply
    + the research projects (328 kB)

  • AT&T - Labs Research: Research Staff Positions - Florham Park, NJ

     

    AT&T - Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit and strong communication and software skills. Areas of particular interest are:

    • Large-vocabulary automatic speech recognition
    • Acoustic and language modeling
    • Robust speech recognition
    • Signal processing
    • Speaker recognition
    • Speech data mining
    • Natural language understanding and dialog
    • Text and web mining
    • Voice and multimodal search

    AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).

  • Research Position in Speech Processing at UGent, Belgium

      Background

    Since March 2005, the universities of Leuven, Gent, Antwerp and Brussels have joined forces in a large research project called SPACE (SPeech Algorithms for Clinical and Educational applications). The project aims to contribute to the broader application of speech technology in educational and therapeutic software tools. More specifically, it pursues the automatic detection and classification of reading errors in the context of an automatic reading tutor, and the objective assessment of disordered speech (e.g. speech of the deaf, dysarthric speech, ...) in the context of computer-assisted speech therapy assessment. What is specific to these target applications is that the speech is either grammatically and lexically incorrect or atypically pronounced. Therefore, standard technology cannot be applied as is in these applications.

    Job description

    The person we are looking for will be in charge of the data-driven development of word mispronunciation models that can predict expected reading errors in the context of a reading tutor. These models must be integrated into the linguistic model of the prompted utterance, so that the speech recognizer becomes more specific in its detection and classification of presumed errors than a recognizer that uses a more traditional linguistic model with context-independent garbage and deletion arcs. A further challenge is to make the mispronunciation model adapt to the progress made by the user.

    Profile

    We are looking for a person from the EU with a creative mind, and with an interest in speech & language processing and machine learning. The work will require an ability to program algorithms in C and Python. Having experience with Python is not a prerequisite (someone with some software experience is expected to learn this in a short time span). Demonstrated experience with speech & language processing and/or machine learning techniques will give you an advantage over other candidates.

    The job is open to a pre-doctoral as well as a post-doctoral researcher who can start in November or December. The job runs until February 28, 2009, but a pre-doctoral candidate aiming for a doctoral degree will get opportunities to do follow-up research in related projects. 

    Interested persons should send their CV to Jean-Pierre Martens (martens@elis.ugent.be). There is no real deadline, but as soon as a suitable person is found, he/she will get the job.

  • Summer Intern positions at Motorola, Schaumburg, Illinois, USA

    Motorola Labs - Center for Human Interaction Research (CHIR), located in Schaumburg, Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).

    CHIR's mission:

    Our research lab develops technologies that provide effortless access to rich communication, media and information services, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and continually evolving knowledge about the user.

    Intern profiles:

    1) Acoustic environment/event detection and classification.

    The successful candidate will be a PhD student near the end of his/her PhD studies, skilled in signal processing and/or pattern recognition, who knows Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.

    2) Speaker adaptation for applications on speech recognition and spoken document retrieval.

    The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough understanding of and hands-on experience in research related to automatic speech recognition. Proficiency in a Linux/Unix working environment and C/C++ programming, and a strong GPA, are expected. A strong background in speaker adaptation is highly preferred.

    3) Development of voice search-based web applications on a smartphone

    We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) is required. Development experience on Windows CE/Mobile desired.

    4) Integrated Voice Search Technology For Mobile Devices.

    The candidate should be proficient in information retrieval, pattern recognition and speech recognition. The candidate should program in C++ and scripting languages such as Python or Perl in a Linux environment. Also, he/she should have knowledge of information retrieval or search engines.

    We offer competitive compensation, fun-to-work environment and Chicago-style pizza.

    If you are interested, please send your resume to:

    Dusan Macho, CHIR-Motorola Labs

    Email:  dusan.macho@motorola.com

    Tel: +1-847-576-6762

  • Post-doc at France Telecom R&D, Lannion, Brittany, France

    Post-doc at France Télécom R&D, Lannion: context acquisition from ambient sound capture.

    Deadline: 31/12/2007

    claude.marro@orange-ftgroup.com

    Description of the context

    Physical data of all kinds from the user's environment can be used in ambient communication as context information to offer new service or interface features, in particular for adapting the service to the users' situation and activity. These data are acquired by various sensors distributed in the environment. Audio scene data from microphones are among the richest of all these sensor data. In ambient communication applications they have the particular property of being usable both as functional inputs (interpersonal audio communication) and as context inputs, for which they can be combined with data from other types of sensors. The objective here is to develop devices that make this dual use of audio data possible. The sound capture system must disappear from the users' attention, and this dematerialization is the main difficulty. Indeed, capturing sound at a distance degrades the useful speech and requires locating the speaker and "focusing" in his or her direction.

    Functional audio acquisition and reproduction

    Ideally, the very ambitious objective would be sound acquisition and reproduction that is effective wherever a person is in the room, using microphones distributed in the environment. Because of the difficulties mentioned above, meeting this challenge is beyond the scope of this study. As an alternative, we will propose a device that captures and reproduces sound at a limited number of points specified in advance.

    Two multi-sensor approaches are envisaged: the acoustic array with controlled directivity and the ambisonic microphone. The interest of addressing both techniques lies in their complementarity. In particular, the first allows a flexible design of the directivity pattern (as a function of geometry, frequency, etc.) and performs well at medium and low frequencies (at the expense of size). The ambisonic microphone, for its part, is small (at the expense of performance at medium and low frequencies) and makes it possible to reproduce an acoustic field identically at a distance. The first phase of the study will determine which of the two approaches is the most suitable.

    Acquisition of sound context

    The first function to study is sound source localization, which is essential for identifying the source toward which the system must point. Presence detection and the speaker's position are the basic contextual information to extract.

    If we consider that localization coupled with multi-sensor sound capture constitutes a tool for analysing the sound field, it will be possible to provide other context information. Indeed, the development of specific processing will make it possible, for example, to give the number of speakers and their positions in the room, their share of speaking time, the noise level of the room, etc.

    A finer analysis of the sound context is to be considered as a perspective of this work and will be addressed only if time permits. This concerns information whose extraction would require technologies such as speech recognition, audio classification and audio indexing.

    Note that the sound capture system and the processing developed in this project will be necessary prerequisites for continuing the work on fine analysis of the sound context.

    Profile

    Practical aspects

    • There is no nationality requirement
    • The researcher will have a fixed-term contract with France Télécom, for a period of 12 to 18 months, which can be neither renewed nor extended.
    • The researcher will join the R&D division of France Télécom at its Lannion site, CRD Technologies, "Speech and Sound Technologies and Processing" laboratory.

    Technical skills

    • Digital signal processing (spectral analysis, filtering, etc.)
    • Multi-microphone processing
    • If possible, a grounding in speech processing and acoustics
    • A taste for applied research work (analysis, tuning and adaptation)
    • Matlab and C languages

    Personal skills

    - A taste for teamwork.

    - Good level of English.

    Position level: engineer on a fixed-term post-doctoral contract - Duration: 12 to 18 months.

  • Ph.D. Program CMU-PORTUGAL

    Ph.D. Program CMU-PORTUGAL in the area of Language and Information Technologies

    The Language Technologies Institute (LTI) of the School of Computer Science at Carnegie Mellon University (CMU) offers a dual degree Ph.D. Program in Language and Information Technologies in cooperation with Portuguese Universities. This Ph.D. program is part of the activities of the recently created Information and Communication Technologies Institute (ICTI), resulting from the Portugal-CMU Partnership.

    The Language Technologies Institute, a world leader in the areas of speech processing, language processing, information retrieval, machine translation, machine learning, and bio-informatics, was founded 20 years ago. The breadth of language technologies expertise at LTI enables new research in combinations of the core subjects, for example, in speech-to-speech translation, spoken dialog systems, language-based tutoring systems, and question/answering systems.

    The Portuguese consortium of Universities includes the Spoken Language Systems Lab (L2F) of INESC-ID Lisbon/IST, the Center of Linguistics of the University of Lisbon (CLUL/FLUL), the Centre for Human Language Technology and Bioinformatics at the University of Beira Interior (HULTIG/UBI) and the linguistics group at the University of Algarve (UALG). These four research centers (and the corresponding Universities) share expertise in the same language technologies as LTI, although with a strong focus on processing the Portuguese language.

    Each Ph.D. student will receive a dual degree from LTI and the selected Portuguese University, being co-supervised by one advisor from each institute, and spending approximately half of the 5-year doctoral program at each institute. Most of the academic part will take place at LTI, during the first 2 years, where most of the required 8 courses will be taken, with a proper balance of focus areas (Linguistic, Computer Science, Statistical/Learning, Task Orientation). The remaining 3 years of the doctoral program will be dedicated to research, mostly spent at the Portuguese institute, with one or two visits to CMU per year.

    The thesis topic will be in one of the research areas of the cooperation program, defined by the two advisors. Two multilingual topics have been identified as priority research areas: computer aided language learning (CALL) and speech-to-speech machine translation (S2SMT).

    The doctoral students will be involved in one of these two projects aimed at building real HLT systems. These projects will involve at least two languages, one of them being Portuguese, the target language for the CALL system to be developed and either the source or target language (or both) for the S2SMT system. These two projects provide a focus for the proposed research; through them the collaboration will explore the main core areas in language technology.

    The scholarship will be funded by the Foundation for Science and Technology (FCT), Portugal.

    How to Apply

    The application deadline for all Ph.D. programs in the scope of the CMU-Portugal partnership is December 15, 2007.

    Students interested in the dual doctoral program must apply by filling in the corresponding form on the CMU webpage http://www.lti.cs.cmu.edu/About/how-to-apply.html

    The application form will be forwarded to the Portuguese University and to the Foundation for Science and Technology. Simultaneously, they should send an email to the coordinators of the Portuguese consortium and of the LTI admissions (Isabel Trancoso/Lori Levin):

    Isabel.Trancoso@inesc-id.pt

    lsl@cs.cmu.edu

    All questions about the joint degree doctoral program should be directed to these two addresses.

    The applications will be screened by a joint committee formed by representatives of LTI and representatives of the Portuguese Universities involved in the joint degree program. The candidates should indicate their scores in GRE and TOEFL tests.

    Letters of recommendation are due by January 3rd.

    Despite this particular focus on the Portuguese language, applications are not restricted to speakers of Portuguese, whether native or non-native.

  • Vitaver-Recruiting agency: Sr Manager Audio algorithm development

    Position Code, Title and Location: 1515 - 0AK0 - Sr Manager Audio Algorithm Development - Irvine, CA (Orange County)
    Start Date: ASAP
    Remote or Onsite: On location. No telecommuting or remote work.
    Travel required: Occasionally.
    Additional Information: Below is all the information I have from the Client. Once I set up your interview, you will have the chance to ask them directly anything I do not include here.
    Description: Our Client is currently seeking a talented Engineering Manager to manage a group of engineers developing and productizing audio algorithms for Bluetooth audio devices (BT headsets, PNDs, car kits, etc.)
    - Work closely with marketing to define the product roadmaps, priorities, and schedules
    - Work closely with tier 1 customers to describe and promote specific audio algorithms
    - Work closely with application engineering and customer groups to ensure successful adoption of designed algorithms
    Requirements:
    - Masters/BSc in engineering with a minimum of 10 years of industrial experience in speech and audio processing
    - Experience with implementation of speech enhancement devices (examples: VoIP, speech compression, echo cancellation, noise suppression, beamforming, blind source separation) on signal processing devices
    - Experience working closely with marketing to define product directions and priorities
    - Experience with the definition of requirements on telephony devices (for example, BT headsets, cellular handsets, or cordless telephones)
    - Some experience with optimization/evaluation of terminal (headset/handset/handsfree) acoustics
    - Experience with programming on RISC CPUs and/or DSPs
    - Experience with managing groups of engineers
    - Strong analytical and problem solving skills
    - Experience with working closely with customers in both a customer support and technical marketing role
     
    Please tell me if you feel confident with the requirements and comfortable delivering on them. As soon as you send me your reply to all questions below, with your resume attached, we will talk on the phone. A technical interview with the Client will follow as the last step.
    1. Are you a US citizen/Green Card/H visa holder (please specify)?
    2. When will you become available?
    3. What is the yearly salary you expect? $ _______ per year on W2 (Fulltime Employee)
    4. Very important: this is designed to save us both time and expedite a decision by my Client. Please complete the following Skill Matrix, answering each question with
    A) number of years of experience with the skill,
    B) your skill level on a scale from 1 to 5 (highest),
    C) the last time you applied it. It has to be consistent with your resume. E.g.: Experience 4 years - Skill level 5 - Last Applied January 07
    1) Industrial experience in speech and audio processing (min 10 years): __ years; Skill level: __; Last applied: __.
    2) Experience implementing speech enhancement devices on signal processing devices: __ years; Skill level: __; Last applied: __.
    3) Experience with optimization/evaluation of terminal (headset/handset/handsfree) acoustics: __ years; Skill level: __; Last applied: __.
    4) Programming on RISC CPUs and/or DSPs: __ years; Skill level: __; Last applied: __.
    5) Do you have a Bachelor's degree?
    * Please attach your resume as a Word document
     
    Alice Kondrat 
    Senior Staff Recruiter Vitaver & Associates, Inc. 2385 Executive Center Drive 
    #100 Boca Raton, FL 33431 
    Alice@vitaver.com 
    Direct Line: (561) 283-1136 Voice Line: (954) 840-3603 Fax: (866) 259 3777 
    http://www.vitaver.com/ 
     
     
     
         

  • Nuance: Software engineer speech dialog tools

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

    SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

    As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

    OVERVIEW:

    - You will work in Nuance's Embedded ASR R&D team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.

    - You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

    - You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.

    - You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.

    - In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.

    QUALIFICATIONS:

    - You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.

    - You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

    - You have experience with Python or other scripting languages.

    - GUI programming experience is a strong asset.

    The following skills are a plus:

    - Understanding of communication protocols

    - Understanding of databases

    - Understanding of computational agents and related frameworks (such as OAA).

    - A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics and machine learning, especially as related to natural language processing, dialogue, and representation of information

    - You can work both as a team player and as goal-oriented independent software engineer.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a plus.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

    Deanna Roe: Deanna.roe@nuance.com

    Please make sure to document to us your excellent software engineering skills.

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • Nuance: Speech scientist London UK

     

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 2000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

    To strengthen our International Professional Services team, based in London, we are currently looking for a

                                Speech Scientist, London, UK

    Nuance Professional Services (PS) has designed, developed, and optimized thousands of speech systems across dozens of industries, including directory search, call center automation, applications in telecom, finance, airline, healthcare, and other verticals; applications for video games, mobile dictation, enhanced search services, SMS, and in-car navigation. Nuance PS applications have automated approximately 7 billion phone conversations for some of the world's most respected companies, including British Airways, Vodafone, Amtrak, Bank of America, Bell Canada, Citigroup, General Electric, NTT and Verizon.

    The PS organization consists of energetic, motivated, and friendly individuals. The Speech Scientists in PS are among the best and brightest, with PhDs from universities such as Cambridge (UK), MIT, McGill, Harvard, Penn, CMU, and Georgia Tech, and having worked at research labs such as Bell Labs, Motorola Labs, and ATR (Japan), culminating in over 300 years of Speech Science experience and covering well over 20 languages.

    Come and join Nuance PS and work on the latest technology from one of the prominent speech recognition technology providers, and make a difference in the way the world communicates.

    Job Overview

    As a Speech Scientist in the Professional Services group, you will work on automated speech recognition applications, covering a broad range of activities in all project phases, including the design, development, and optimization of the system.  You will:

    • Work across application development teams to ensure best possible recognition performance in deployed systems
    • Identify recognition challenges and assess accuracy feasibility during the design phase
    • Design, develop, and test VoiceXML grammars and create JSPs, Java, and ECMAScript grammars for dynamic contexts
    • Optimize accuracy of applications by analyzing performance and tuning statistical language models, pronunciations, and acoustic models, including identifying areas for improvement by running the recognizer offline
    • Contribute to the generation and presentation of client-facing reports
    • Act as technical lead on more intensive client projects
    • Develop methodologies, scripts, procedures that improve efficiency and quality
    • Develop tools and enhance algorithms that facilitate deployment and tuning of recognition components
    • Act as subject matter domain expert for specific knowledge domains
    • Provide input into the design of future product releases

         Required Skills

    • MS or PhD in Computer Science, Engineering, Computational Linguistics, Physics, Mathematics, or related field (or equivalent)
    • Strong analytical and problem solving skills and ability to troubleshoot issues
    • Good judgment and quick-thinking
    • Strong programming skills, preferably Perl or Python
    • Excellent written and verbal communications skills
    • Ability to scope work taking technical, business and time-frame constraints into consideration
    • Works well in a team and in a fast-paced environment

    Beneficial Skills

    • Strong programming skills in either Perl, Python, Java, C/C++, or Matlab
    • Speech recognition knowledge
    • Strong pattern recognition, linguistics, signal processing, or acoustics knowledge
    • Statistical data analysis
    • Experience with XML, VoiceXML, and Wiki
    • Ability to mentor or supervise others
    • Additional language skills, e.g. French, Dutch, German, Spanish

  • Nuance: Research engineer speech engine

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

     RESEARCH ENGINEER SPEECH ENGINE

    As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

     OVERVIEW:

    - You will work in Nuance's Embedded ASR R&D team, developing, improving and maintaining core ASR engine algorithms for our customers in the Automotive and Mobile sector.

    - You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.

    - You will work in an international company and cooperate with people at various locations, including Europe, America and Asia. You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on developing, improving and maintaining core ASR engine algorithms for cutting edge speech and natural language understanding technologies for automotive and mobile devices.

    - You will work on the design and development of more efficient, flexible ASR search algorithms with high focus on low memory and processor requirements.

    QUALIFICATIONS:

    - You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field. PhD is a plus.

    - A background in (computational) linguistics, speech processing, ASR search, confidence values, grammars, statistics and machine learning, especially as related to natural language processing.

    - You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

    The following skills are a plus:

    - You have experience with Python or other scripting languages.

    - Broad knowledge about architectures of embedded platforms and processors.

    - Understanding of databases

    - You can work both as a team player and as a goal-oriented independent software engineer.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a plus.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

    Deanna Roe                  Deanna.roe@nuance.com

    Please make sure to document your excellent software engineering skills.

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • Nuance RESEARCH ENGINEER SPEECH DIALOG SYSTEMS:

     

    In order to strengthen our Embedded ASR Research team, we are looking for a:

        RESEARCH ENGINEER SPEECH DIALOGUE SYSTEMS

    As part of our team, you will be creating speech technologies for embedded applications varying from simple command and control tasks up to natural language speech dialogues on mobile and automotive platforms.

    OVERVIEW:

    - You will work in Nuance's Embedded ASR research and production team, creating technology, tools and runtime software to enable our customers to develop embedded speech applications. In our team of speech and language experts, you will work on natural language dialogue systems that define the state of the art.

    - You will work at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the picturesque city of Ghent, in the Flanders region of Belgium. Ghent has one of the most spectacular historic town centers of Europe and is known for its unique vibrant yet cozy charm, and is home to a large international university.

    - You will work in an international company and cooperate with people at various locations, including Europe, America, and Asia. You may occasionally be asked to travel.

    RESPONSIBILITIES:

    - You will work on the development of cutting edge natural language dialogue and speech recognition technologies for automotive embedded systems and mobile devices.

    - You will design, implement, evaluate, optimize, and test new algorithms and tools for our speech recognition systems, both for research prototypes and deployed products, including all aspects of dialogue systems design, such as architecture, natural language understanding, dialogue modeling, statistical framework, and so forth.

    - You will help the engine process multi-lingual natural and spontaneous speech in various noise conditions, given the challenging memory and processing power constraints of the embedded world.

    QUALIFICATIONS:

    - You have a university degree in computer science, (computational) linguistics, engineering, mathematics, physics, or a related field. A graduate degree is an asset.

    - You have strong software and programming skills, especially in C/C++, ideally for embedded applications. Knowledge of Python or other scripting languages is a plus.

    - You have experience in one or more of the following fields:

         dialogue systems

         applied (computational) linguistics

         natural language understanding

         language generation

         search engines

         speech recognition

         grammars and parsing techniques

         statistics and machine learning techniques

         XML processing

    - You are a team player, willing to take initiative and assume responsibility for your tasks, and are goal-oriented.

    - You can work in a multi-national team and communicate effectively with people of different cultures.

    - You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

    - You are fluent in English and you can write high quality documentation.

    - Knowledge of other languages is a strong asset.

    CONTACT:

    Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

     

    Deanna Roe                  Deanna.roe@nuance.com

    ABOUT US:

    Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

     

  • CNRS tenure research position at LIMSI

    CNRS tenure research position (CR2) at LIMSI

    http://www.limsi.fr/tlp/postes07.html

    [Application deadline 09-JAN-08 23:59 Paris time]

    CR2 position 48/03 in section 48 (Communication sciences)

    Information systems, multimedia and multilingual documents, knowledge

    industry.

    Job description

    The candidate will conduct research in the field of information

    processing with application to multimedia and multilingual

    documents. The main research direction will be the development of

    models and algorithms for automatic structuring, indexing, enrichment

    and extraction of knowledge from speech data in a multilingual

    context. It is envisioned that the models studied will be based on the

    joint use of linguistic knowledge and statistical learning methods

    applied to very large corpora. The candidate should have expertise

    in at least one of the following areas: spoken language processing,

    statistical machine translation, and audio data mining. This

    research will contribute to the development of emerging technologies

    for knowledge engineering and information access. A variety of new

    applications can be foreseen related to audiovisual archives, digital

    libraries, the production of multimedia documents, and more generally

    systems to organize and access multimedia and multilingual content,

    especially on the Internet.

    Job description (translated from the French)

    The successful candidate will conduct research in the field of

    automatic information processing, with multimedia and multilingual

    documents as the application area. The main research direction

    envisaged is the development of models and algorithms for automatic

    structuring, indexing, enrichment and knowledge extraction from

    speech, in a multilingual context. It is envisaged that the models

    studied will rely on the joint use of linguistic knowledge and

    learning methods that exploit large amounts of data. The candidate

    will have expertise in at least one of the following areas: spoken

    language processing, data-driven statistical machine translation,

    and audio data mining. This research will contribute to the

    development of information systems and of new uses. These uses

    concern audiovisual archives, digital libraries, the production of

    multimedia documents, and more generally systems for managing and

    accessing multimedia and multilingual content, in particular on the

    Internet.

  • Research Position in Speech Processing at Nagoya Institute of Technology, Japan

     

    Research Position in Speech Processing at Nagoya Institute of

    Technology, Japan

    Nagoya Institute of Technology is seeking a researcher for a

    post-doctoral position in a new European Commission-funded project

    EMIME ("Efficient multilingual interaction in mobile environment")

    involving Nagoya Institute of Technology and five other European

    partners, starting in March 2008 (see the project summary below).

    The earliest starting date of the position is March 2008. The initial

    duration of the contract will be one year, with a possibility for

    prolongation (year-by-year basis, maximum of three years). The

    position provides opportunities to collaborate with other researchers

    in a variety of national and international projects. The competitive

    salary is calculated according to qualifications based on NIT scales.

    The candidate should have a strong background in speech signal

    processing and some experience with speech synthesis and recognition.

    Desired skills include familiarity, at the source code level, with

    current tools such as HTK, HTS, and Festival.

    For more information, please contact Keiichi Tokuda

    (http://www.sp.nitech.ac.jp/~tokuda/).

     

    About us

    Nagoya Institute of Technology (NIT), founded in 1905, is situated in

    the world-class manufacturing area of Central Japan (about one hour

    and 40 minutes from Tokyo, and 36 minutes from Kyoto by Shinkansen).

    NIT is a top-level educational institution of technology and is

    one of the leaders of such institutions in Japan. EMIME will be

    carried out at the Speech Processing Laboratory (SPL) in the Department of

    Computer Science and Engineering of NIT. SPL is known for its

    outstanding, continuous contributions to the development of high-performance,

    high-quality open-source software: the HMM-based Speech Synthesis

    System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary

    continuous speech recognition engine "Julius"

    (http://julius.sourceforge.jp/), and the Speech Signal Processing

    Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is

    involved in numerous national and international collaborative

    projects. SPL also has close partnerships with many industrial

    companies, in order to transfer its research into commercial

    applications, including Toyota, Nissan, Panasonic, Brother Inc.,

    Funai, Asahi-Kasei, ATR.

    Project summary of EMIME

    The EMIME project will help to overcome the language barrier by

    developing a mobile device that performs personalized speech-to-speech

    translation, such that a user's spoken input in one language is used

    to produce spoken output in another language, while continuing to

    sound like the user's voice. Personalization of systems for

    cross-lingual spoken communication is an important, but little

    explored, topic. It is essential for providing more natural

    interaction and making the computing device a less obtrusive element

    when assisting human-human interactions.

    We will build on recent developments in speech synthesis using hidden

    Markov models, which is the same technology used for automatic speech

    recognition. Using a common statistical modeling framework for

    automatic speech recognition and speech synthesis will enable the use

    of common techniques for adaptation and multilinguality.

    Significant progress will be made towards a unified approach for

    speech recognition and speech synthesis: this is a very powerful

    concept, and will open up many new areas of research. In this

    project, we will explore the use of speaker adaptation across

    languages so that, by performing automatic speech recognition, we can

    learn the characteristics of an individual speaker, and then use those

    characteristics when producing output speech in another language.

    Our objectives are to:

    1. Personalize speech processing systems by learning individual

    characteristics of a user's speech and reproducing them in

    synthesized speech.

    2. Introduce a cross-lingual capability such that personal

    characteristics can be reproduced in a second language not spoken

    by the user.

    3. Develop and better understand the mathematical and theoretical

    relationship between speech recognition and synthesis.

    4. Eliminate the need for human intervention in the process of

    cross-lingual personalization.

    5. Evaluate our research against state-of-the-art techniques and in a

    practical mobile application.

  • PhD fellowships at NTNU, Trondheim, Norway

    Two Ph.D. fellowships available at  

    Department of Electronics and Telecommunications

    The Norwegian University of Science and Technology (NTNU)

    Trondheim, Norway


    Closing date: January 27, 2008


     

    The Ph.D. scholarships are part of the projects SIRKUS (Spoken Information Retrieval by Knowledge Utilization in Statistical Speech Processing) and SMUDI (Voice Control in Multi-Modal Dialogue).

    PhD Fellowship in the SIRKUS project:
    The vision of SIRKUS is to develop new architectures for speech recognition that can achieve human-like performance. There is evidence indicating that system performance based on the current paradigm is reaching an upper bound. We believe that in order to achieve our vision, knowledge about speech production and perception as well as a better understanding of speech per se needs to be incorporated into a statistical framework. This implies that new approaches to speech analysis need to be investigated and developed, and that a new statistical framework for modelling speech based on a set of analysis results needs to be defined.

    The PhD project
    An on-going PhD project will investigate and develop speech analysis methods that will supplement and augment the current methods that are based on spectral analysis. In particular, development of detectors of phonologically significant events, i.e. speech attributes, will be central. A set of speech analyses, including speech attribute detectors, will produce speech event sequences, which can constitute temporally asynchronous observation streams and may be correlated both in time and across observation streams.
    This PhD project will, in cooperation with the on-going project, develop novel statistical modelling approaches suitable for modelling speech based on event sequences. Integration of the observation streams from the speech analyses in a statistical description of various linguistic units such as phonemes, syllables, words and sentences will be central.
    The project will incorporate collaboration with several foreign partners. The scholarships will also include an international visiting research scholarship.

    Qualifications:
    We seek highly motivated individuals holding a master's degree in electronics engineering, signal processing, statistics, or other relevant disciplines. Experience in speech technology is desirable, but not an absolute requirement.

    PhD Fellowship in the SMUDI project:
    Large groups of disabled persons have great difficulties accessing information that is available on the internet. Many government and municipal agencies are in the process of changing their preferred interaction with the public, moving to internet based systems for submission of applications and information requests. In many cases it will be necessary for the user to provide information by filling in forms. Some examples are online shopping for goods and services, e.g. air travel and use of public services such as filling in tax return forms.

    The PhD project
    The goal of the PhD project is to develop a speech-based system for filling in internet forms. The work will include methods for interpretation of the internet forms, speech recognition to transform the user's speech to text, and integration of speech technology with other modalities for information presentation and user input.

    In order to achieve the best possible performance for this task, the speech recognition should run on the user's computer. This has the advantage that the speech recognizer can be adapted to the user's voice and pronunciation, and in addition provides a situation where the speech signal is not band limited or noise corrupted by the transmission channel. The challenge is that the recognizer cannot be tailored for filling in one particular form, and thus it will be necessary to develop a general large vocabulary speech recognizer (a "dictation engine") for Norwegian. The system should be able to define vocabulary and syntax dynamically by interpreting the content of the internet form.

    Qualifications:
    We seek highly motivated individuals holding a master's degree in electronics engineering, signal processing, statistics, natural language processing or other relevant disciplines. Experience in speech technology is desirable, but not an absolute requirement.


    Information for both fellowships:

    The PhD fellows will be associated with the Signal Processing Group at NTNU and will work in a strong and active scientific environment.

    Award holders at NTNU are normally appointed for up to 4 years with 25% of the time spent on specified work. This work is primarily linked to teaching and is usually divided so that a relatively large part is done in the first half of the period of the appointment.
    The appointments are at code 1017, salary level 43-58 in the national salary scheme, gross NOK 325.600 - 423.800 per annum (1 NOK ~ 0.125 EUR), and are normally remunerated at wage level 43, of which 2% is deducted for the State Pension scheme. The salary might be adjusted after negotiation with the employer to reflect the applicant's experience.

    Further information:
    For more information on the SIRKUS position and the application requirements, please see http://www.iet.ntnu.no/projects/SIRKUS/Positions.html
    For more information on the SMUDI position and the application requirements, please see http://www.iet.ntnu.no/projects/SMUDI/Positions.html

    Interested candidates are encouraged to contact Professor Torbjørn Svendsen (phone: +47-735-92674, email: torbjorn@iet.ntnu.no) for further information.

  • Research position in Speech Recognition in the context of Spoken Document Retrieval at the University of Twente (NL)

    Research position in Speech Recognition in the context of Spoken Document

    Retrieval

    The Human Media Interaction (HMI) group of the Computer Science department

    at the University of Twente in Enschede, The Netherlands, has funding for a

    (junior) research position in the area of speech recognition; the

    application domain is multimedia information retrieval. We are looking for a

    candidate interested in research and development work in speech recognition

    for surprise data, i.e. data with unknown or highly fluctuating

    characteristics, both at the acoustic level and word level, as for example

    occurring in historical archives or commercials. Initially the position will be available for one year with an opportunity for continuation.

    Requirements

    - affinity with speech recognition technology demonstrated by an academic

    degree in e.g., computational linguistics, computer science, phonetics

    - provable knowledge/experience in at least two of the following areas:

    speech recognition, NLP, DSP, machine-learning

    - good scripting skills (Perl and the like); skills in C/C++ are regarded as a plus

    - comfortable with a Linux environment

    - willing and comfortable working in a team

    Conditions of employment

    Depending on demonstrable experience, gross salary starts at EUR 2.217,-

    per month in the first year (level 10, CAO Nederlandse Universiteiten).

    Applications (in English or Dutch) should be sent to:

    Prof. Dr. Franciska de Jong

    HMI, Faculty of EEMCS

    University of Twente,

    P.O. Box 217

    7500 AE Enschede

    The Netherlands

    fdejong@ewi.utwente.nl

    Make sure to include the following:

    - a letter of application with a motivation

    - a curriculum vitae (including a list of publications and previous projects

    worked on)

    - the names and email addresses of two referees

    Deadline: February 1, 2008.

    For more information on the HMI research group, and its projects see

    http://hmi.ewi.utwente.nl or contact

    Prof. Dr. Franciska de Jong (fdejong@ewi.utwente.nl) or

    Dr. Roeland Ordelman (ordelman@ewi.utwente.nl)

  • C/C++ Programmer Munich, Germany

    Digital publishing AG is one of Europe's leading producers of interactive software for foreign language training. In our e-learning courses we want to place the emphasis on speaking and spoken language understanding. In order to strengthen our Research & Development Team in Munich, Germany, we are looking for experienced C or C++ programmers with at least 3 years' experience in the design and coding of sophisticated software systems under Windows.
    We offer
    - a creative working atmosphere in an international team of software engineers, linguists and editors working on challenging research projects in speech recognition and speech dialogue systems.
    - participation in all phases of a product life cycle, as we are interested in the fast transfer of research results into products.
    - the possibility to participate in international scientific conferences.
    - a permanent job in the center of Munich.
    - excellent possibilities for development within our fast-growing company.
    - flexible working times, competitive compensation and arguably the best espresso in Munich.
    We expect
    - several years of practical experience in software development in C or C++ in a commercial or academic environment.
    - experience with parallel algorithms and thread programming.
    - experience with object-oriented design of software systems.
    - good knowledge of English or German.
    Desirable is
    - experience with optimization of algorithms.
    - experience in statistical speech or language processing, preferably speech recognition, speech synthesis, speech dialogue systems or chatbots.
    - experience with Delphi or Turbo Pascal.
    Interested? We look forward to your application (preferably by e-mail):
    digital publishing AG  
    Freddy Ertl  f.ertl@digitalpublishing.de  
    Tumblinger Straße 32  
    D-80337 München Germany 

  • Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA

     

    Speech and Natural Language Processing Engineer


    M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


    We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

    Responsibilities:

    • You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
    • You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
    • You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.

     

    Requirements:

    • Solid background in speech recognition, natural language processing, machine learning and information extraction.
    • 2+ years of experience participating in software development projects
    • Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
    • Excellent analytical and problem-solving skills
    • Integrate and communicate well in small R&D teams
    • Master's degree in CS or related engineering fields
    • Experience in a healthcare-related field a plus

     

    In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

     

    M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com. 

     

  • Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA

     

    Senior Research Scientist -- Speech and Natural Language Processing


    M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


    We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

    Responsibilities: