Contents
- 1 . Message from the board
- 2 . Editorial
- 3 . ISCA News
- 4 . SIG's activities
- 5 . Future ISCA Conferences and workshops (ITRW)
- 5-1 . INTERSPEECH 2008
- 5-2 . INTERSPEECH 2009
- 5-3 . INTERSPEECH 2010
- 5-4 . ITRW on Speech analysis and processing for knowledge discovery
- 5-5 . ITRW on experimental linguistics
- 5-6 . International Conference on Auditory-Visual Speech Processing AVSP 2008
- 5-7 . Christian Benoit workshop on Speech and Face to Face Communication
- 6 . Books, databases and software
- 6-1 . Books
- 6-2 . LDC News
- 6-3 . Question Answering on speech transcripts (QAst)
- 6-4 . ELRA - Language Resources Catalogue - Update
- 7 . Job openings
- 7-1 . AT&T - Labs Research: Research Staff Positions - Florham Park, NJ
- 7-2 . Summer Intern positions at Motorola Schaumburg Illinois USA
- 7-3 . Nuance: Software engineer speech dialog tools
- 7-4 . Nuance: Speech scientist London UK
- 7-5 . Nuance: Research engineer speech engine
- 7-6 . Nuance RESEARCH ENGINEER SPEECH DIALOG SYSTEMS:
- 7-7 . Research Position in Speech Processing at Nagoya Institute of Technology,Japan
- 7-8 . C/C++ Programmer Munich, Germany
- 7-9 . Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh.PA,USA
- 7-10 . Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA,USA
- 7-11 . Postdoc position at LORIA, Nancy, France
- 7-12 . Internships at Motorola Labs Schaumburg
- 7-13 . Masters in Human Language Technology
- 7-14 . PhD positions at Supelec, Paris
- 7-15 . Speech Faculty Position at CMU, Pittsburgh, Pennsylvania
- 7-16 . Open positions at Microsoft: Danish Linguist (M/F)
- 7-17 . Open positions at Microsoft: Swedish Linguist (M/F)
- 7-18 . Open positions at Microsoft: Dutch Linguist (M/F)
- 7-19 . PhD position at Orange Lab
- 7-20 . Social speech scientist at Wright Patterson AFB, Ohio, USA
- 7-21 . Professeur a PHELMA du Grenoble INP (in french)
- 7-22 . POSTDOCTORAL FELLOWSHIP OPENING AT ICSI Berkeley
- 7-23 . PhD positions at GIPSA (formerly ICP) Grenoble France
- 7-24 . PhD in speech signal processing at Infineon Sophia Antipolis
- 7-25 . PhD position at Institut Eurecom Sophia Antipolis France
- 8 . Journals
- 8-1 . Papers accepted for FUTURE PUBLICATION in Speech Communication
- 8-2 . Special Issue on Non-Linear and Non-Conventional Speech Processing-Speech Communication
- 8-3 . Journal of Multimedia User Interfaces
- 8-4 . CURRENT RESEARCH IN PHONOLOGY AND PHONETICS: INTERFACES WITH NATURAL LANGUAGE PROCESSING
- 8-5 . IEEE Signal Processing Magazine: Special Issue on Digital Forensics
- 8-6 . Special Issue on Integration of Context and Content for Multimedia Management
- 8-7 . CfP Speech Communication: Special Issue On Spoken Language Technology for Education
- 8-8 . CfP Special Issue on Processing Morphologically Rich Languages IEEE Trans ASL
- 9 . Forthcoming events supported (but not organized) by ISCA
- 10 . Future Speech Science and Technology Events
- 10-1 . Call for participation INFILE@CLEF2008 Evaluation
- 10-2 . Christian Benoit Workshop on Speech and Face to Face Communication
- 10-3 . Seminaires du Dpt Parole et Cognition du GIPSA (ex ICP Grenoble) (in french)
- 10-4 . Cocosda/Write Workshop: Towards common priorities and recommendations for the future of Language and Speech Resources
- 10-5 . LREC 2008 - 6th Language Resources and Evaluation Conference
- 10-6 . Collaboration: interoperability between people in the creation of language resources for less-resourced languages
- 10-7 . 2nd Intl Workshop on emotion corpora for research on emotion and affect
- 10-8 . ELRA Workshop on Evaluation
- 10-9 . HLT and NLP within the Arabic world
- 10-10 . JEP/TALN/RECITAL 2008 - Avignon France
- 10-11 . 6th Intl Conference on Content-based Multimedia Indexing CBMI '08
- 10-12 . IIS2008 Workshop on Spoken Language and Understanding and Dialog Systems
- 10-13 . HLT Workshop on Mobile Language Technology (ACL-08)
- 10-14 . 4TH TUTORIAL AND RESEARCH WORKSHOP PIT08
- 10-15 . YRR-2008 Young Researchers' Roundtable
- 10-16 . Summer school New Trends in Pattern Recognition for Language Bilbao Spain
- 10-17 . SIGIR 2008 workshop: Searching Spontaneous Conversational Speech
- 10-18 . 2nd Workshop on Analytics for Noisy Unstructured Text Data
- 10-19 . eNTERFACE 2008 Orsay Paris
- 10-20 . 2nd IEEE Intl Conference on Semantic Computing
- 10-21 . EUSIPCO-2008 - 16th European Signal Processing Conference - Lausanne Switzerland
- 10-22 . 5th Joint Workshop on Machine Learning and Multimodal Interaction MLMI 2008
- 10-23 . TSD 2008 11th Int. Conf. on Text, Speech and Dialogue
- 10-24 . Third Workshop on Speech in Mobile and Pervasive Environments
- 10-25 . 50th International Symposium ELMAR-2008
- 10-26 . 2008 International Workshop on Multimedia Signal Processing
- 10-27 . 2008 IEEE International Workshop on MACHINE LEARNING FOR SIGNAL PROCESSING
- 10-28 . CALL FOR PAPERS - The 9th International Conference on Signal Processing
- 10-29 . 8th International Seminar on Speech Production - ISSP 2008
- 10-30 . 1st CfP 5th International MultiMedia Modeling Conference (MMM2009)
2 . Editorial
Dear Members,
Below you will find the May issue of ISCApad. Some of you have complained about the size of recent issues. Our aim is to provide the most exhaustive information. I fully agree that this information is available on the web, but in ISCApad it is gathered in a single place. We are still working on a more compact version that points to the information available on our website. We remain convinced that pushed information is more widely read. Thanks to all of you who take a few minutes to send us your opinion about ISCApad: positive and negative comments are both encouraging. I am only disappointed by indifference.
Chris Wellekens
Institut Eurecom
Sophia Antipolis France
3 . ISCA News
3-1 . ISCA Scientific Achievement Medalist 2008
ISCA Scientific Achievement Medal for 2008
It is with great pleasure that I announce the ISCA Medalist for 2008 - Hiroya Fujisaki. Prof. Fujisaki has contributed to the speech research community in so many aspects, in speech analysis, synthesis and prosody, that it will be a very hard task for me to summarize his long list of achievements. He is also the founder of the ICSLP series of conferences which, being now fully integrated as one of ISCA's yearly conferences, will have its 10th anniversary this year.
3-2 . ISCA Fellows
ISCA Fellows, Call for Nominations
In 2007, ISCA will begin its Fellow Program to recognize and honor outstanding members who have made significant contributions to the field of speech science and technology. To qualify for this distinction, a candidate must have been an ISCA member for five years or more with a minimum of ten years' experience in the field. Nominations may be made by any ISCA member (see Nomination Form). The nomination must be accompanied by references from three current ISCA Fellows (or, during the first three years of the program, by ISCA Board members). A Fellow may be recognized for his/her outstanding technical contributions and/or continued significant service to ISCA. The candidate's technical contribution should be summarized in the nomination in terms of publications, patents, projects, prototypes and their impact in the community.
Fellows will be selected by a Fellow Selection Committee of nine members who each serve three-year terms. In the first year of the program, the Committee will be formed by ISCA Board members. Over the next three years, one third of the members of the Selection Committee will be replaced by ISCA Fellows until the Committee consists entirely of ISCA Fellows. Members of the Committee will be chosen by the ISCA Board.
The committee will hold a virtual meeting during June to evaluate the current year's nominations.
Nominations should be submitted on the form provided at http://www.isca-speech.org/fellows.html. Nominations should be submitted before May 23rd 2008.
4 . SIG's activities
A list of Speech Interest Groups can be found on our web.
Future ISCA conferences and workshops
INTERSPEECH 2008
September 22-26, 2008, Brisbane, Queensland, Australia
Conference Website
Chairman: Denis Burnham, MARCS, University of Western Sydney.
INTERSPEECH 2009
Brighton, UK
Conference Website
Chairman: Prof. Roger Moore, University of Sheffield.
INTERSPEECH 2010
Chiba, Japan
Conference Website
ISCA is pleased to announce that INTERSPEECH 2010 will take place in Makuhari-Messe, Chiba, Japan, September 26-30, 2010. The event will be chaired by Keikichi Hirose (Univ. Tokyo), and will have as a theme "Towards Spoken Language Processing for All - Regardless of Age, Health Conditions, Native Languages, Environment, etc."
ITRW on Evidence-based Voice and Speech Rehabilitation in Head & Neck Oncology
ISCA Workshop
Evidence-based Voice and Speech Rehabilitation in Head & Neck Oncology
Amsterdam, May 15-16, 2008
Evidence-based Voice and Speech Rehabilitation is of increasing relevance in Head & Neck Oncology. The number of patients requiring treatment for cancer in the upper respiratory and vocal tract keeps rising. Moreover, treatment - whether it concerns an "organ preservation protocol" or traditional surgery and radiotherapy - negatively impacts the function of organs vital for communication. A "function preservation treatment" does, unfortunately, not yet exist. This workshop seeks to assemble the latest and most relevant knowledge on evidence-based voice and speech rehabilitation. Aside from the main topic (voice and speech rehabilitation after total laryngectomy), other areas, such as vocal issues in early-stage larynx carcinoma, and various stages of oral / oropharyngeal carcinoma will be addressed.
The workshop comprises four topical sessions (see below). Each session includes two keynote lectures plus a round-table discussion and (at most 10) poster presentations pertinent to the session's topic. A work document, based on the keynote lectures, will form the basis for each round-table discussion. This work document will contain all presently available research evidence, discuss its (clinical) relevance and formulate directions and areas of interest for future research. The keynote lectures, work documents and poster papers are to be compiled into Workshop Proceedings, and will be published under the ISCA flag (website: http://www.isca-speech.org/). It is our aim to make these Proceedings available at the workshop. This will result in a useful and traceable 'State of the Art' handbook/CD/web publication.
Prof. Dr. Frans JM Hilgers
Prof. Dr. Louis CW Pols
Dr. Maya van Rossum
Venue:
Tinbergen lecture hall, Royal Netherlands Academy of Arts and Sciences. Kloveniersburgwal 29, Amsterdam
More information can be obtained from the website www.fon.hum.uva.nl/webhnr/
Organization:
Prof. Dr. Frans JM Hilgers
Prof. Dr. Louis CW Pols
Dr. Maya van Rossum
Institute of Phonetic Sciences - Amsterdam Center for Language and Communication, University of Amsterdam
Department of Head and Neck Oncology and Surgery
The Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital
Department of Otolaryngology, Academic Medical Center, University of Amsterdam
International Faculty
Prof. Philip C Doyle, PhD University of Western Ontario, London, Canada
Prof. Tanya L Eadie, PhD University of Washington, Seattle, USA
Prof. Dr. Dr. Ulrich Eysholdt University of Erlangen-Nuremberg, Germany
Prof. Britta Hammarberg, PhD Karolinska University, Stockholm, Sweden
Prof. Jeffrey P Searle, PhD University of Kansas, Kansas City, USA
Local Faculty
Dr. Annemieke H Ackerstaff 2
Dr. Corina J van As-Brooks 2
Dr. Michiel WM van den Brekel 2, 3
Prof. Dr. Frans Hilgers 1, 2, 3
Petra Jongmans, MA 1, 2
Lisette van der Molen, MA 2
Prof. Dr. Louis CW Pols 1
Dr. Maya van Rossum 2, 4
Dr. Irma M Verdonck-de Leeuw 5
1 Institute of Phonetic Sciences/Amsterdam Center of Language and Communication, University of Amsterdam
2 The Netherlands Cancer Institute, Amsterdam
3 Academic Medical Center, University of Amsterdam
4 University Medical Center Leiden
5 Free University Medical Center, Amsterdam
Course secretariat: Mrs. Marion van Zuilen
The Netherlands Cancer Institute
Plesmanlaan 121 1066CX Amsterdam, The Netherlands
Telephone +3120-512-2550; Fax +3120-512-2554
e-mail to f.hilgers@nki.nl or kno@nki.nl
ITRW on Speech analysis and processing for knowledge discovery
June 4 - 6, 2008
Aalborg, Denmark
Workshop website
Humans are very efficient at capturing information and messages in speech, and they often perform this task effortlessly even when the signal is degraded by noise, reverberation and channel effects. In contrast, when a speech signal is processed by conventional spectral analysis methods, significant cues and useful information in speech are usually not taken proper advantage of, resulting in sub-optimal performance in many speech systems. There exists, however, a vast literature on speech production and perception mechanisms and their impacts on acoustic phonetics that could be more effectively utilized in modern speech systems. A re-examination of these knowledge sources is needed. On the other hand, recent advances in speech modelling and processing and the availability of a huge collection of multilingual speech data have provided an unprecedented opportunity for acoustic phoneticians to revise and strengthen their knowledge and develop new theories. Such a collaborative effort between science and technology is beneficial to the speech community and it is likely to lead to a paradigm shift for designing next-generation speech algorithms and systems. This, however, calls for a focussed attention to be devoted to analysis and processing techniques aiming at a more effective extraction of information and knowledge in speech.
Objectives:
The objective of this workshop is to discuss innovative approaches to the analysis of speech signals that can bring out the subtle and unique characteristics of speech and speaker. This will also help in discovering speech cues useful for improving the performance of speech systems significantly. Several attempts have been made in the past to explore speech analysis methods that can bridge the gap between human and machine processing of speech. In particular, the time-varying aspects of interactions between excitation and vocal tract systems during production seem to elude exploitation. Some of the explored methods include all-pole and pole-zero modelling methods based on temporal weighting of the prediction errors, interpreting the zeros of speech spectra, analysis of phase in the time and transform domains, nonlinear (neural network) models for information extraction and integration, etc. Such studies may also bring out some finer details of speech signals, which may have implications in determining the acoustic-phonetic cues needed for developing robust speech systems.
The Workshop:
- will present a full-morning common tutorial to give an overview of the present stage of research linked to the subject of the workshop
- will be organised as a single series of oral and poster presentations
- each oral presentation is given 30 minutes to allow for ample time for discussion
- is an ideal forum for speech scientists to discuss the perspectives that will further future research collaborations.
Potential Topic areas:
- Parametric and nonparametric models
- New all-pole and pole-zero spectral modelling
- Temporal modelling
- Non-spectral processing (group delay etc)
- Integration of spectral and temporal processing
- Biologically-inspired speech analysis and processing
- Interactions between excitation and vocal tract systems
- Characterization and representation of acoustic phonetic attributes
- Attribute-based speaker and spoken language characterization
- Analysis and processing for detecting acoustic phonetic attributes
- Language independent aspects of acoustic phonetic attributes detection
- Detection of language-specific acoustic phonetic attributes
- Acoustic to linguistic and acoustic phonetic mapping
- Mapping from acoustic signal to articulator configurations
- Merging of synchronous and asynchronous information
- Other related topics
Call for papers. Notification of review:
The submission deadline has been extended to February 14, 2008.
Registration
Fees for early and late registration for ISCA and non-ISCA members will be made available on the website during September 2007.
Venue:
The workshop will take place at Aalborg University, Department of Electronic Systems, Denmark. See the workshop website for further and latest information.
Accommodation:
There are a large number of hotels in Aalborg, most of them close to the city centre. The list of hotels, with their web sites and telephone numbers, is given on the workshop website. There you will also find information about transportation between the city centre and the university campus.
How to reach Aalborg:
Aalborg Airport is half an hour away from the international Copenhagen Airport. There are many daily flight connections between Copenhagen and Aalborg. Flying with Scandinavian Airlines System (SAS) or one of the Star Alliance companies to Copenhagen enables you to include Copenhagen-Aalborg in the overall ticket, thereby reducing the full transportation cost. There is also an hourly train connection between the two cities; the train ride lasts approx. five hours.
Organising Committee:
Paul Dalsgaard, B. Yegnanarayana, Chin-Hui Lee, Paavo Alku, Rolf Carlson, Torbjørn Svendsen
Important dates
Submission of full and final: January 31, 2008 on the Website
http://www.es.aau.dk/ITRW/
Notification of review results: No later than March 30, 2008.
ITRW on Experimental Linguistics
August 2008, Athens, Greece
Website
Prof. Antonis Botinis
ITRW on Auditory-Visual Speech Processing AVSP 2008
International Conference on Auditory-Visual Speech Processing AVSP 2008
Dates: 26-29 September 2008
Location: Moreton Island, Queensland, Australia
Website: http://express.hid.ri.cmu.edu/AVSP2008/Main.html
AVSP 2008 will be held as an ISCA Tutorial and Research Workshop at
Tangalooma Wild Dolphin Resort on Moreton Island from the 26-29
September 2008. AVSP 2008 is a satellite conference to Interspeech 2008,
being held in Brisbane from the 22-26 September 2008. Tangalooma is
located close to Brisbane, so that attendance at AVSP 2008
can easily be combined with participation in Interspeech 2008.
Auditory-visual speech production and perception by human and machine is
an interdisciplinary and cross-linguistic field which has attracted
speech scientists, cognitive psychologists, phoneticians, computational
engineers, and researchers in language learning studies. Since the
inaugural workshop in Bonas in 1995, Auditory-Visual Speech Processing
workshops have been organised on a regular basis (see an overview at the
avisa website). In line with previous meetings, this conference will
consist of a mixture of regular presentations (both posters and oral),
and lectures by invited speakers.
Topics include but are not limited to:
- Machine recognition
- Human and machine models of integration
- Multimodal processing of spoken events
- Cross-linguistic studies
- Developmental studies
- Gesture and expression animation
- Modelling of facial gestures
- Speech synthesis
- Prosody
- Neurophysiology and neuro-psychology of audition and vision
- Scene analysis
Paper submission:
Details of the paper submission procedure will be available on the
website in a few weeks' time.
Chairs:
Simon Lucey
Roland Goecke
Patrick Lucey
ITRW on Robust ASR
Santiago, Chile
October-November 2008
Dr. Nestor Yoma
Books, Databases, Software
Reviewing a book?
The author of the book Advances in Digital Speech Transmission told me that you might be interested in reviewing her book. If so, I would be pleased to send you a free review copy. Please reply to this email and let me know the address to which I should send the book.
Martin, Rainer / Heute, Ulrich / Antweiler, Christiane
Advances in Digital Speech Transmission
1st Edition - January 2008
99.90 Euro
2008. 572 Pages, Hardcover
- Practical Approach Book -
ISBN-10: 0-470-51739-5
ISBN-13: 978-0-470-51739-0 - John Wiley & Sons
Best regards
Tina Heuberger
----------------------------------------------------
Public Relations Associate
Physical Sciences and Life Sciences Books
Wiley-Blackwell
Wiley-VCH Verlag GmbH & Co. KGaA
Boschstr. 12
69469 Weinheim
Germany
phone +49/6201/606-412
fax +49/6201/606-223
mailto:theuberger@wiley-vch.de
Books
La production de la parole
Author: Alain Marchal, Universite d'Aix en Provence, France
Publisher: Hermes Lavoisier
Year: 2007
Speech Enhancement: Theory and Practice
Author: Philipos C. Loizou, University of Texas, Dallas, USA
Publisher: CRC Press
Year: 2007
Speech and Language Engineering
Editor: Martin Rajman
Publisher: EPFL Press, distributed by CRC Press
Year: 2007
Human Communication Disorders/ Speech therapy
This interesting series can be found on the Wiley website.
Incursões em torno do ritmo da fala
Author: Plinio A. Barbosa
Publisher: Pontes Editores (city: Campinas)
Year: 2006 (released 11/24/2006)
(In Portuguese, abstract attached.) Website
Speech Quality of VoIP: Assessment and Prediction
Author: Alexander Raake
Publisher: John Wiley & Sons, UK-Chichester, September 2006
Website
Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
Author: Pierre-Yves Oudeyer
Publisher: Oxford University Press
Website
Speech Recognition Over Digital Channels
Authors: Antonio M. Peinado and Jose C. Segura
Publisher: Wiley, July 2006
Website
Multilingual Speech Processing
Editors: Tanja Schultz and Katrin Kirchhoff
Elsevier Academic Press, April 2006
Website
Reconnaissance automatique de la parole: Du signal a l'interpretation
Authors: Jean-Paul Haton
Christophe Cerisara
Dominique Fohr
Yves Laprie
Kamel Smaili
392 Pages
Publisher: Dunod
News from LDC
LDC2008T04
- OntoNotes Release 2.0 -
LDC2008T05
- Penn Discourse Treebank Version 2.0 -
- 2007 Member Survey Responses -
- 2008 Publications Pipeline -
New Publications
(1) The OntoNotes project is a collaborative effort between BBN Technologies, the University of Colorado, the University of Pennsylvania, and the University of Southern California's Information Sciences Institute. The goal of the project is to annotate a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, Usenet, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference). OntoNotes Release 1.0 contains 400k words of Chinese newswire data and 300k words of English newswire data. The current release, OntoNotes Release 2.0, adds the following to the corpus: 274k words of Chinese broadcast news data and 200k words of English broadcast news data. The current goals call for annotation of over a million words each of English and Chinese, and half a million words of Arabic over five years. OntoNotes builds on two time-tested resources, following the Penn Treebank for syntax and the Penn PropBank for predicate-argument structure. Its semantic representation will include word sense disambiguation for nouns and verbs, with each word sense connected to an ontology, and coreference. OntoNotes Release 2.0 is distributed on one DVD-ROM.
2008 Subscription Members will automatically receive two copies of this corpus. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$4500.
*
(2) The Penn Discourse Treebank (PDTB) Project is located at the Institute for Research in Cognitive Science at the University of Pennsylvania. The goal of the project is to develop a large scale corpus annotated with information related to discourse structure. Penn Discourse Treebank Version 2.0 contains annotations of discourse relations and their arguments on the one million word Wall Street Journal (WSJ) data in Treebank-2 (LDC95T7).
The PDTB focuses on encoding discourse relations associated with discourse connectives, adopting a lexically grounded approach for the annotation. The corpus provides annotations for the argument structure of Explicit and Implicit connectives, the senses of connectives and the attribution of connectives and their arguments. The lexically grounded approach exposes a clearly defined level of discourse structure which will support the extraction of a range of inferences associated with discourse connectives.
The PDTB annotates semantic or informational relations holding between two (and only two) Abstract Objects (AOs), expressed either explicitly via lexical items or implicitly via adjacency. For the former, the lexical items anchoring the relation are annotated as Explicit connectives. For the latter, the implicit inferable relations are annotated by inserting an Implicit connective that best expresses the inferred relation.
Explicit connectives are identified from three grammatical classes: subordinating conjunctions (e.g., because, when), coordinating conjunctions (e.g., and, or), and discourse adverbials (e.g., however, otherwise). Arguments of connectives are simply labeled Arg2 for the argument appearing in the clause syntactically bound to the connective, and Arg1 for the other argument. In addition to the argument structure of discourse relations, the PDTB also annotates the attribution of relations (both explicit and implicit) as well as of each of their arguments.
The current release contains 40,600 discourse relation annotations, distributed into the following five types: Explicit Relations, Implicit Relations, Alternative Lexicalizations, Entity Relations, and No Relations. Penn Discourse Treebank Version 2.0 is distributed via web download.
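For readers who want a concrete picture of the annotation scheme described above, here is a minimal sketch in Python of how one relation might be represented. It is not the PDTB distribution format or any LDC tool: the class and field names (`RelationType`, `DiscourseRelation`, `arg1`, `arg2`, `connective`, `sense`) and the example sentence are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RelationType(Enum):
    """The five relation types named in the release description."""
    EXPLICIT = "Explicit Relations"
    IMPLICIT = "Implicit Relations"
    ALTERNATIVE_LEXICALIZATION = "Alternative Lexicalizations"
    ENTITY_RELATION = "Entity Relations"
    NO_RELATION = "No Relations"


@dataclass(frozen=True)
class DiscourseRelation:
    """Hypothetical record for one relation between two Abstract Objects."""
    relation_type: RelationType
    arg1: str                         # the other argument
    arg2: str                         # clause syntactically bound to the connective
    connective: Optional[str] = None  # lexical anchor, or the inserted Implicit connective
    sense: Optional[str] = None       # sense label, left unset in this sketch

    def __post_init__(self) -> None:
        # An Explicit relation is anchored by a lexical item, so require one.
        if self.relation_type is RelationType.EXPLICIT and not self.connective:
            raise ValueError("an Explicit relation needs its anchoring connective")


# Invented example: "She stayed home because it was raining."
example = DiscourseRelation(
    relation_type=RelationType.EXPLICIT,
    arg1="She stayed home",
    arg2="it was raining",
    connective="because",
)
```

In this sketch the subordinating conjunction "because" anchors the relation, the clause it introduces is labeled Arg2, and the remaining clause is Arg1, following the labeling rule given above.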
2008 Subscription Members will automatically receive two copies of this corpus on disc. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1000.
2007 Member Survey Responses
Please click here to access a summary of the responses to Questions 1-15 of the 2007 Member Survey. These questions were sent to all survey recipients.
We also received many suggestions for future releases, among them:
* More African language publications
* Gigaword corpora in additional languages
* More annotated data for a greater variety of uses
* More parallel text corpora
* Web blogs and chat room data
As you will see elsewhere in this newsletter, several corpora that would satisfy these needs are prospective 2008 publications.
The winner of the blind drawing for the $500 benefit for survey responses received by January 14, 2008 is Richard Rose of McGill University. Congratulations!
To all survey respondents: As promised, a more detailed analysis of the survey will be arriving within the next few weeks. Stay tuned!
2008 Publications Pipeline
Membership Year (MY) 2008 is shaping up to be another productive one for the LDC. We anticipate releasing a balanced and exciting selection of publications. Here is a glimpse of what is in the pipeline for MY2008. (Disclaimer: unforeseen circumstances may lead to modifications of our plans. Please regard this list as tentative).
- BLLIP 1994-1997 News Text Release 1 - automatic parses for the North American News Text Corpus - NANT (LDC95T21). The parses were generated by the Charniak and Johnson Reranking Parser which was trained on Wall Street Journal (WSJ) data from Treebank 3 (LDC99T42). Each file is a sequence of n-best lists containing the top n parses of each sentence with the corresponding parser probability and reranker score. The parses may be used in systems that are trained off labeled parse trees but require more data than found in WSJ. Two versions will be released: a complete 'Members-Only' version which contains parses for the entire NANT Corpus and a 'Non Member' version for general licensing which includes all news text except data from the Wall Street Journal.
- Chinese Proposition Bank - the goal of this project is to create a corpus of text annotated with information about basic semantic propositions. Predicate-argument relations are being added to the syntactic trees of the Chinese Treebank Data. This release contains the predicate-argument annotation of 81,009 verb instances (11,171 unique verbs) and 14,525 noun instances (1,421 unique nouns). The annotation of nouns is limited to nominalizations that have a corresponding verb.
- English Dictionary of the Tamil Verb - contains translations for 6597 English verbs and defines 9716 Tamil verbs. Each entry contains the following: the English entry or head word; the Tamil equivalent (in Tamil script and transliteration); the verb class and transitivity specification; the spoken Tamil pronunciation (audio files in mp3 format); the English definition(s); additional Tamil entries (if applicable); example sentences or phrases in Literary Tamil, Spoken Tamil (with a corresponding audio file) and an English translation; and Tamil synonyms or near-synonyms, where appropriate.
- GALE Phase 1 Arabic Blog Parallel Text - contains a total of 102K words (222 files) of Arabic blog text selected from 33 sources. Blogs consist of posts to informal web-based journals of varying topical content. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to LDC's GALE Translation guidelines.
- GALE Phase 1 Chinese Blog Parallel Text - contains a total of 313K characters (277 files) of Chinese blog text selected from 8 sources. Blogs consist of posts to informal web-based journals of varying topical content. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to the LDC's GALE Translation guidelines.
- GALE Phase 1 Arabic Newsgroup Parallel Text - contains a total of 178K words (264 files) of Arabic newsgroup text selected from 35 sources. Newsgroups consist of posts to electronic bulletin boards, Usenet newsgroups, discussion groups and similar forums. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to LDC's GALE Translation guidelines.
- GALE Phase 1 Chinese Newsgroup Parallel Text - contains a total of 240K characters (112 files) of Chinese newsgroup text selected from 25 sources. Newsgroups consist of posts to electronic bulletin boards, Usenet newsgroups, discussion groups and similar forums. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to the LDC's GALE Translation guidelines.
- Hindi WordNet - first wordnet for an Indian language. Similar in design to the Princeton Wordnet for English, it incorporates additional semantic relations to capture the complexities of Hindi. The WordNet contains 28604 synsets and 63436 unique words. Created by the NLP group at Indian Institute of Technology Bombay, it is inspiring construction of wordnets for many other Indian languages, notably Marathi.
- LCTL Bengali Language Pack - a set of linguistic resources to support technological improvement and development of new technology for the Bengali language created in the Less Commonly Taught Languages (LCTL) project which covered a total of _ languages. Package components are: 2.6 million tokens of monolingual text, 500,000 tokens of parallel text, a bilingual lexicon with 48,000 entries, sentence and word segmenting tools, an encoding converter, a part of speech tagger, a morphological analyzer, a named entity tagger and 136,000 tokens of named entity tagged text, a Bengali-to-English name transliterator, and a descriptive grammar created by a PhD research linguist. About 30,000 tokens of the parallel text are English-to-LCTL translations of a "Common Subset" corpus, which will be included in all additional LCTL Language Packs.
- North American News Text Corpus (NANT) Reissue - as a companion to BLLIP 1994-1997 News Text Release 1, LDC will reissue the North American News Text Corpus (LDC95T21). Data includes news text articles from several sources (L.A.Times/Washington Post, Reuters General News, Reuters Financial News, Wall Street Journal, New York Times) that has been formatted with TIPSTER-style SGML tags to indicate article boundaries and organization of information within each article. Two versions will be released: a complete 'Members-Only' version which contains all previously released NANT articles and a 'Non Member' version for general licensing which includes all news text except data from the Wall Street Journal.
As a reminder, MY2007 will remain open for joining through December 31, 2008 and MY2008 through December 31, 2009. Take note that some of our current discounts on Membership Fees will no longer be effective after March 1, 2008. Please see our Announcements page for complete details.
Ilya Ahtaridis
Membership Coordinator
Linguistic Data Consortium
University of Pennsylvania
3600 Market St., Suite 810
Philadelphia, PA 19104
Phone: (215) 573-1275
Fax: (215) 573-2175
ldc@ldc.upenn.edu
http://www.ldc.upenn.edu/
Question Answering on Speech Transcripts (QAst)
The QAst organizers are pleased to announce the release of the development dataset for
the CLEF-QA 2008 track "Question Answering on Speech Transcripts" (QAst).
We take this opportunity to launch a first call for participation in
this evaluation exercise.
QAst is a CLEF-QA track that aims at providing an evaluation framework
for QA technology on speech transcripts, both manual and automatic.
A detailed description of this track is available at:
http://www.lsi.upc.edu/~qast
It is the second evaluation for the QAst track.
Last year (QAst 2007), factual questions were generated for two
distinct corpora (in English only). This year, in addition to
factual questions, some definition questions are generated, and five
corpora covering three different languages are used (3 corpora in
English, 1 in Spanish and 1 in French).
Important dates:
# 15 June 2008: evaluation set released
# 30 June 2008: submission deadline
The pilot track is organized jointly by the Technical University of
Catalonia (UPC), the Evaluations and Language resources Distribution
Agency (ELDA) and Laboratoire d'Informatique pour la Mécanique et les
Sciences de l'Ingénieur (LIMSI).
If you are interested in participating please send an email to Jordi
Turmo (turmo_AT_lsi.upc.edu) with "QAst" in the subject line.
Job openings
We invite all laboratories and industrial companies which have job offers to send them to the ISCApad editor: they will appear in the newsletter and on our website for free. (Also have a look at http://www.isca-speech.org/jobs.html as well as http://www.elsnet.org/ Jobs.)
Speech Engineer/Senior Speech Engineer at Microsoft, Mountain View, CA, USA
Job Type: Full-Time
Send resume to Bruce Buntschuh
Responsibilities:
Tellme, now a subsidiary of Microsoft, is a company that is focused on delivering the highest quality voice recognition based applications while providing the highest possible automation to its clients. Central to this focus is the speech recognition accuracy and performance that is used by the applications. The candidate will be responsible for the development, performance analysis, and optimization of grammars, as well as overall speech recognition accuracy, in a wide variety of real world applications in all major market segments. This is a unique opportunity to apply and extend state of the art speech recognition technologies to emerging spaces such as information search on mobile devices.
Requirements:
· Strong background in engineering, linguistics, mathematics, machine learning, and/or computer science.
· In depth knowledge and expertise in the field of speech recognition.
· Strong analytical skills with a determination to fully understand and solve complex problems.
· Excellent spoken and written communication skills.
· Fluency in English (Spanish a plus).
· Programming capability with scripting tools such as Perl.
Education:
MS, PhD, or equivalent technical experience in an area such as engineering, linguistics, mathematics, or computer science.
Speech Technology and Software Development Engineer at Microsoft, Redmond, WA, USA
Speech Technology and Software Development Engineer
Speech Technologies and Modeling
Speech Component Group
Microsoft Corporation
Redmond WA, USA
Please contact: Yifan.Gong@microsoft.com
Microsoft's Speech Component Group has been working on automatic speech recognition (SR) in real environments. We develop SR products for multiple languages for mobile devices, desktop computers, and communication servers. The group now has an open position for speech scientists with a software development focus to work on our acoustic and language modeling technologies. The position offers great opportunities for innovation and technology and product development.
Responsibilities:
· Design and implement speech/language modeling and recognition algorithms to improve recognition accuracy.
· Create, optimize and deliver quality speech recognition models and other components tailored to our customers' needs.
· Identify, investigate and solve challenging problems in the areas of recognition accuracy from speech recognition system deployments.
· Improve the speech recognition language expansion engineering process to ensure product quality and scalability.
Required competencies and skills:
· Passion for speech technology and quality software, and demonstrated ability in the design and implementation of speech recognition algorithms.
· Strong desire for achieving excellent results, strong problem solving skills, ability to multi-task, handle ambiguities, and identify issues in complex SR systems.
· Good software development skills, including strong aptitude for software design and coding. 3+ years of experience in C/C++ and programming with scripting languages are highly desirable.
· MS or PhD degree in Computer Science, Electrical Engineering, Mathematics, or related disciplines, with strong background in speech recognition technology, statistical modeling, or signal processing.
· Track record of developing SR algorithms, or experience in linguistics/phonetics, is a plus.
PhD Research Studentship in Spoken Dialogue Systems - Cambridge, UK
Applications are invited for an EPSRC sponsored studentship in Spoken Dialogue Systems leading to the PhD degree. The student will join a team led by Professor Steve Young working on statistical approaches to building Spoken Dialogue Systems. The overall goal of the team is to develop complete working end-to-end systems which can be trained from real data and which can be continually adapted on-line. The PhD work will focus specifically on the use of Partially Observable Markov Decision Processes for dialogue modelling and techniques for learning and adaptation within that framework. The work will involve statistical modelling, algorithm design and user evaluation. The successful candidate will have a good first degree in a relevant area. Good programming skills in C/C++ are essential and familiarity with Matlab would be useful.
The studentship will be for 3 years starting in October 2007 or January 2008. The studentship covers University and College fees at the Home/EU rate and a maintenance allowance of 13000 pounds per annum. Potential applicants should email Steve Young with a brief CV and statement of interest in the proposed work area.
AT&T - Labs Research: Research Staff Positions - Florham Park, NJ
AT&T - Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit and strong communication and software skills. Areas of particular interest are
- Large-vocabulary automatic speech recognition
- Acoustic and language modeling
- Robust speech recognition
- Signal processing
- Speaker recognition
- Speech data mining
- Natural language understanding and dialog
- Text and web mining
- Voice and multimodal search
AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).
Research Position in Speech Processing at UGent, Belgium
Background
Since March 2005, the universities of Leuven, Gent, Antwerp and Brussels have joined forces in a big research project, called SPACE (SPeech Algorithms for Clinical and Educational applications). The project aims at contributing to the broader application of speech technology in educational and therapeutic software tools. More specifically, it pursues the automatic detection and classification of reading errors in the context of an automatic reading tutor, and the objective assessment of disordered speech (e.g. speech of the deaf, dysarthric speech, ...) in the context of computer assisted speech therapy assessment. What is specific to the target applications is that the speech is either grammatically and lexically incorrect or atypically pronounced. Therefore, standard technology cannot be applied as such in these applications.
Job description
The person we are looking for will be in charge of the data-driven development of word mispronunciation models that can predict expected reading errors in the context of a reading tutor. These models must be integrated into the linguistic model of the prompted utterance, so that the speech recognizer becomes more specific in its detection and classification of presumed errors than a recognizer using a more traditional linguistic model with context-independent garbage and deletion arcs. A further challenge is to make the mispronunciation model adapt to the progress made by the user.
Profile
We are looking for a person from the EU with a creative mind, and with an interest in speech & language processing and machine learning. The work will require an ability to program algorithms in C and Python. Having experience with Python is not a prerequisite (someone with some software experience is expected to learn this in a short time span). Demonstrated experience with speech & language processing and/or machine learning techniques will give you an advantage over other candidates.
The job is open to a pre-doctoral as well as a post-doctoral researcher who can start in November or December. The job runs until February 28, 2009, but a pre-doctoral candidate aiming for a doctoral degree will get opportunities to do follow-up research in related projects.
Interested persons should send their CV to Jean-Pierre Martens (martens@elis.ugent.be). There is no real deadline, but as soon as a suitable person is found, he/she will get the job.
Summer Intern positions at Motorola, Schaumburg, Illinois, USA
Motorola Labs - Center for Human Interaction Research (CHIR) located in Schaumburg Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).
CHIR's mission:
Our research lab develops technologies that provide effortless access to rich communication, media and information services, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and to continually evolving knowledge about the user.
Intern profiles:
1) Acoustic environment/event detection and classification.
The successful candidate will be a PhD student near the end of his/her PhD study, skilled in signal processing and/or pattern recognition, with knowledge of Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.
2) Speaker adaptation for applications on speech recognition and spoken document retrieval.
The successful candidate must currently be pursuing a Ph.D. degree in EE or CS with complete understanding of and hands-on experience in automatic speech recognition related research. Proficiency in the Linux/Unix working environment and C/C++ programming. Strong GPA. A strong background in speaker adaptation is highly preferred.
3) Development of voice search-based web applications on a smartphone
We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) is required. Development experience on Windows CE/Mobile desired.
4) Integrated Voice Search Technology For Mobile Devices.
The candidate should be proficient in information retrieval, pattern recognition and speech recognition. The candidate should program in C++ and scripting languages such as Python or Perl in a Linux environment. Also, he/she should have knowledge of information retrieval or search engines.
We offer competitive compensation, fun-to-work environment and Chicago-style pizza.
If you are interested, please send your resume to:
Dusan Macho, CHIR-Motorola Labs
Email: dusan.macho@motorola.com
Tel: +1-847-576-6762
Nuance: Software engineer speech dialog tools
In order to strengthen our Embedded ASR Research team, we are looking for a:
SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS
As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.
OVERVIEW:
- You will work in Nuance's Embedded ASR R&D team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.
- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.
- You will work in an international company and cooperate with people at various locations, including in Europe, America and Asia. You may occasionally be asked to travel.
RESPONSIBILITIES:
- You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.
- You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.
- In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.
QUALIFICATIONS:
- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.
- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.
- You have experience with Python or other scripting languages.
- GUI programming experience is a strong asset.
The following skills are a plus:
- Understanding of communication protocols
- Understanding of databases
- Understanding of computational agents and related frameworks (such as OAA).
- A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics and machine learning, especially as related to natural language processing, dialogue, and representation of information
- You can work both as a team player and as a goal-oriented independent software engineer.
- You can work in a multi-national team and communicate effectively with people of different cultures.
- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.
- You are fluent in English and you can write high quality documentation.
- Knowledge of other languages is a plus.
CONTACT:
Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to
Deanna Roe Deanna.roe@nuance.com
Please make sure to document to us your excellent software engineering skills.
ABOUT US:
Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.
Nuance: Speech scientist London UK
Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 2000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.
To strengthen our International Professional Services team, based in London, we are currently looking for a
Speech Scientist, London, UK
Nuance Professional Services (PS) has designed, developed, and optimized thousands of speech systems across dozens of industries, including directory search, call center automation, applications in telecom, finance, airline, healthcare, and other verticals; applications for video games, mobile dictation, enhanced search services, SMS, and in-car navigation. Nuance PS applications have automated approximately 7 billion phone conversations for some of the world's most respected companies, including British Airways, Vodafone, Amtrak, Bank of America, BellCanada, Citigroup, General Electric, NTT and Verizon.
The PS organization consists of energetic, motivated, and friendly individuals. The Speech Scientists in PS are among the best and brightest, with PhDs from universities such as Cambridge (UK), MIT, McGill, Harvard, Penn, CMU, and Georgia Tech, and having worked at research labs such as Bell Labs, Motorola Labs, and ATR (Japan), culminating in over 300 years of Speech Science experience and covering well over 20 languages.
Come and join Nuance PS and work on the latest technology from one of the prominent speech recognition technology providers, and make a difference in the way the world communicates.
Job Overview
As a Speech Scientist in the Professional Services group, you will work on automated speech recognition applications, covering a broad range of activities in all project phases, including the design, development, and optimization of the system. You will:
- Work across application development teams to ensure best possible recognition performance in deployed systems
- Identify recognition challenges and assess accuracy feasibility during the design phase
- Design, develop, and test VoiceXML grammars and create JSPs, Java, and ECMAScript grammars for dynamic contexts
- Optimize accuracy of applications by analyzing performance and tuning statistical language models, pronunciations, and acoustic models, including identifying areas for improvement by running the recognizer offline
- Contribute to the generation and presentation of client-facing reports
- Act as technical lead on more intensive client projects
- Develop methodologies, scripts, procedures that improve efficiency and quality
- Develop tools and enhance algorithms that facilitate deployment and tuning of recognition components
- Act as subject matter domain expert for specific knowledge domains
- Provide input into the design of future product releases
Required Skills
- MS or PhD in Computer Science, Engineering, Computational Linguistics, Physics, Mathematics, or related field (or equivalent)
- Strong analytical and problem solving skills and ability to troubleshoot issues
- Good judgment and quick-thinking
- Strong programming skills, preferably Perl or Python
- Excellent written and verbal communications skills
- Ability to scope work taking technical, business and time-frame constraints into consideration
- Works well in a team and in a fast-paced environment
Beneficial Skills
- Strong programming skills in either Perl, Python, Java, C/C++, or Matlab
- Speech recognition knowledge
- Strong pattern recognition, linguistics, signal processing, or acoustics knowledge
- Statistical data analysis
- Experience with XML, VoiceXML, and Wiki
- Ability to mentor or supervise others
- Additional language skills, e.g. French, Dutch, German, Spanish
Nuance: Research engineer speech engine
In order to strengthen our Embedded ASR Research team, we are looking for a:
RESEARCH ENGINEER SPEECH ENGINE
As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.
OVERVIEW:
- You will work in Nuance's Embedded ASR R&D team, developing, improving and maintaining core ASR engine algorithms for our customers in the Automotive and Mobile sector.
- You will work either at Nuance's Office in Aachen, a beautiful, old city right in the heart of Europe with great history and culture, or at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the vibrant and picturesque city of Ghent, in the Flanders region of Belgium. Both Aachen and Ghent offer some of the most spectacular historic town centers in Europe, and are home to large international universities.
- You will work in an international company and cooperate with people at various locations, including in Europe, America and Asia. You may occasionally be asked to travel.
RESPONSIBILITIES:
- You will work on developing, improving and maintaining core ASR engine algorithms for cutting edge speech and natural language understanding technologies for automotive and mobile devices.
- You will work on the design and development of more efficient, flexible ASR search algorithms with high focus on low memory and processor requirements.
QUALIFICATIONS:
- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field. PhD is a plus.
- A background in (computational) linguistics, speech processing, ASR search, confidence values, grammars, statistics and machine learning, especially as related to natural language processing.
- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.
The following skills are a plus:
- You have experience with Python or other scripting languages.
- Broad knowledge about architectures of embedded platforms and processors.
- Understanding of databases
- You can work both as a team player and as a goal-oriented independent software engineer.
- You can work in a multi-national team and communicate effectively with people of different cultures.
- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.
- You are fluent in English and you can write high quality documentation.
- Knowledge of other languages is a plus.
CONTACT:
Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to
Deanna Roe Deanna.roe@nuance.com
Please make sure to document to us your excellent software engineering skills.
ABOUT US:
Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.
Nuance RESEARCH ENGINEER SPEECH DIALOG SYSTEMS:
In order to strengthen our Embedded ASR Research team, we are looking for a:
RESEARCH ENGINEER SPEECH DIALOGUE SYSTEMS
As part of our team, you will be creating speech technologies for embedded applications varying from simple command and control tasks up to natural language speech dialogues on mobile and automotive platforms.
OVERVIEW:
- You will work in Nuance's Embedded ASR research and production team, creating technology, tools and runtime software to enable our customers to develop embedded speech applications. In our team of speech and language experts, you will work on natural language dialogue systems that define the state of the art.
- You will work at Nuance's International Headquarters in Merelbeke, a small town just 5km away from the heart of the picturesque city of Ghent, in the Flanders region of Belgium. Ghent has one of the most spectacular historic town centers of Europe and is known for its unique vibrant yet cozy charm, and is home to a large international university.
- You will work in an international company and cooperate with people at various locations, including in Europe, America, and Asia. You may occasionally be asked to travel.
RESPONSIBILITIES:
- You will work on the development of cutting edge natural language dialogue and speech recognition technologies for automotive embedded systems and mobile devices.
- You will design, implement, evaluate, optimize, and test new algorithms and tools for our speech recognition systems, both for research prototypes and deployed products, including all aspects of dialogue systems design, such as architecture, natural language understanding, dialogue modeling, statistical framework, and so forth.
- You will help the engine process multi-lingual natural and spontaneous speech in various noise conditions, given the challenging memory and processing power constraints of the embedded world.
QUALIFICATIONS:
- You have a university degree in computer science, (computational) linguistics, engineering, mathematics, physics, or a related field. A graduate degree is an asset.
- You have strong software and programming skills, especially in C/C++, ideally for embedded applications. Knowledge of Python or other scripting languages is a plus.
- You have experience in one or more of the following fields:
dialogue systems
applied (computational) linguistics
natural language understanding
language generation
search engines
speech recognition
grammars and parsing techniques.
statistics and machine learning techniques
XML processing
- You are a team player, willing to take initiative and assume responsibility for your tasks, and are goal-oriented.
- You can work in a multi-national team and communicate effectively with people of different cultures.
- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.
- You are fluent in English and you can write high quality documentation.
- Knowledge of other languages is a strong asset.
CONTACT:
Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to
Deanna Roe Deanna.roe@nuance.com
ABOUT US:
Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. With more than 3000 employees worldwide, we are committed to making the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.
Research Position in Speech Processing at Nagoya Institute of
Technology, Japan
Nagoya Institute of Technology is seeking a researcher for a
post-doctoral position in a new European Commission-funded project
EMIME ("Efficient multilingual interaction in mobile environment")
involving Nagoya Institute of Technology and five other European
partners, starting in March 2008 (see the project summary below).
The earliest starting date of the position is March 2007. The initial
duration of the contract will be one year, with a possibility for
prolongation (year-by-year basis, maximum of three years). The
position provides opportunities to collaborate with other researchers
in a variety of national and international projects. The competitive
salary is calculated according to qualifications based on NIT scales.
The candidate should have a strong background in speech signal
processing and some experience with speech synthesis and recognition.
Desired skills include familiarity with the latest spectrum of
technology, including HTK, HTS, and Festival at the source code level.
For more information, please contact Keiichi Tokuda
(http://www.sp.nitech.ac.jp/~tokuda/).
About us
Nagoya Institute of Technology (NIT), founded in 1905, is situated in
the world-quality manufacturing area of Central Japan (about one hour
and 40 minutes from Tokyo, and 36 minutes from Kyoto by Shinkansen).
NIT is a highest-level educational institution of technology and is
one of the leaders of such institutions in Japan. EMIME will be
carried out at the Speech Processing Laboratory (SPL) in the Department
of Computer Science and Engineering of NIT. SPL is known for its
outstanding, continuous contributions to developing high-performance,
high-quality open-source software: the HMM-based Speech Synthesis
System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary
continuous speech recognition engine "Julius"
(http://julius.sourceforge.jp/), and the Speech Signal Processing
Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is
involved in numerous national and international collaborative
projects. SPL also has close partnerships with many industrial
companies, including Toyota, Nissan, Panasonic, Brother Inc., Funai,
Asahi-Kasei, and ATR, in order to transfer its research into
commercial applications.
Project summary of EMIME
The EMIME project will help to overcome the language barrier by
developing a mobile device that performs personalized speech-to-speech
translation, such that a user's spoken input in one language is used
to produce spoken output in another language, while continuing to
sound like the user's voice. Personalization of systems for
cross-lingual spoken communication is an important, but little
explored, topic. It is essential for providing more natural
interaction and making the computing device a less obtrusive element
when assisting human-human interactions.
We will build on recent developments in speech synthesis using hidden
Markov models, which is the same technology used for automatic speech
recognition. Using a common statistical modeling framework for
automatic speech recognition and speech synthesis will enable the use
of common techniques for adaptation and multilinguality.
Significant progress will be made towards a unified approach for
speech recognition and speech synthesis: this is a very powerful
concept, and will open up many new areas of research. In this
project, we will explore the use of speaker adaptation across
languages so that, by performing automatic speech recognition, we can
learn the characteristics of an individual speaker, and then use those
characteristics when producing output speech in another language.
Our objectives are to:
1. Personalize speech processing systems by learning individual
characteristics of a user's speech and reproducing them in
synthesized speech.
2. Introduce a cross-lingual capability such that personal
characteristics can be reproduced in a second language not spoken
by the user.
3. Develop and better understand the mathematical and theoretical
relationship between speech recognition and synthesis.
4. Eliminate the need for human intervention in the process of
cross-lingual personalization.
5. Evaluate our research against state-of-the-art techniques and in a
practical mobile application.
C/C++ Programmer Munich, Germany
Digital publishing AG is one of Europe's leading producers of interactive software for foreign language training. In our e-learning courses we want to place the emphasis on speaking and spoken language understanding. In order to strengthen our Research & Development Team in Munich, Germany, we are looking for experienced C or C++ programmers with at least 3 years of experience in the design and coding of sophisticated software systems under Windows.
We offer
- a creative working atmosphere in an international team of software engineers, linguists and editors working on challenging research projects in speech recognition and speech dialogue systems.
- participation in all phases of a product life cycle, as we are interested in the fast transfer of research results into products.
- the possibility to participate in international scientific conferences.
- a permanent job in the center of Munich.
- excellent possibilities for development within our fast growing company.
- flexible working times, competitive compensation and arguably the best espresso in Munich.
We expect
- several years of practical experience in software development in C or C++ in a commercial or academic environment.
- experience with parallel algorithms and thread programming.
- experience with object-oriented design of software systems.
- good knowledge of English or German.
Desirable is
- experience with optimization of algorithms.
- experience in statistical speech or language processing, preferably speech recognition, speech synthesis, speech dialogue systems or chatbots.
- experience with Delphi or Turbo Pascal.
Interested? We look forward to your application (preferably by e-mail):
digital publishing AG
Freddy Ertl f.ertl@digitalpublishing.de
Tumblinger Straße 32
D-80337 München Germany
Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA
Speech and Natural Language Processing Engineer
M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.
We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.
Responsibilities:
- You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
- You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
- You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.
Requirements:
- Solid background in speech recognition, natural language processing, machine learning and information extraction.
- 2+ years of experience participating in software development projects
- Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
- Excellent analytical and problem-solving skills
- Integrate and communicate well in small R&D teams
- Master's degree in CS or related engineering fields
- Experience in a healthcare-related field a plus
In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh. We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.
M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund. If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com.
Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA
Senior Research Scientist -- Speech and Natural Language Processing
M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.
We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.
Responsibilities:
- Plan and perform research and development tasks to continuously improve a state-of-the-art speech understanding system
- Take a leading role in identifying solutions to challenging technical problems
- Contribute original ideas and turn them into product-grade software implementations
- Collaborate with other members of the R&D team to identify, analyze and resolve technical issues
Requirements:
- Solid research & development background with 3+ years of experience in speech recognition research, covering at least two of the following topics: speech processing, acoustic modeling, language modeling, decoding, LVCSR, natural language processing/understanding, speaker verification/identification, audio mining
- Working knowledge of Machine Learning, Information Extraction and Natural Language Processing algorithms
- 3+ years of experience participating in large-scale software development projects using C++ and Java.
- Excellent analytical, problem-solving and communication skills
- PhD with a focus on speech recognition, or Master's degree with 3+ years of industry experience working on automatic speech recognition
- Experience and/or education in medical informatics a plus
- Working experience in a healthcare-related field a plus
In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh. We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.
M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund. If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com.
Postdoc position at LORIA, Nancy, France
Building an articulatory model from ultrasound, EMA and MRI data
Postdoctoral position
Research project
An articulatory model comprises both the visible and the internal mobile articulators involved in speech articulation (the lower jaw, tongue, lips and velum) as well as the fixed walls (the palate, the rear wall of the pharynx). An articulatory model is dynamic since the articulators deform during speech production. Such a model is of potential interest in language learning, where it can provide visual feedback on the learner's articulation, and in many other applications.
Building an articulatory model is difficult because the different articulators have to be detected from specific image modalities: the lips are acquired through video, and the tongue shape is acquired through ultrasound imaging with a high frame rate, but these 2D images are very noisy. Finally, 3D images of all articulators can be obtained with MRI, but only for sustained sounds (such as vowels) due to the long acquisition time of MRI images.
The subject of this post-doc is to construct a dynamic 3D model of the entire vocal tract by merging the 3D information available in the MRI acquisitions and temporal 2D information provided by the contours of the tongue visible on the ultrasound images or X-ray images.
We are working on the construction of an articulatory model within the European project ASPI (http://aspi.loria.fr/ ).
We have already built an acquisition system which allows us to obtain synchronized data from ultrasound, MRI, video and EM modalities.
Only a few complete articulatory models are currently available in the world and a real challenge in the field is to design set-ups and easy-to-use methods for automatically building the model of any speaker from 3D and 2D images. Indeed, the existence of more articulatory models would open new directions of research about speaker variability and speech production.
Objectives
The aim of the subject is to build a deformable model of the vocal tract from static 3D MRI images and dynamic 2D sequences. Previous work has been conducted on the modelling of the vocal tract, and especially of the tongue (M. Stone [1], O. Engwall [2]). Unfortunately, substantial human interaction is required to extract tongue contours in the images. In addition, often only one image modality is considered in these works, thus reducing the reliability of the model obtained.
The aim of this work is to provide automatic methods for segmenting features in the images as well as methods for building a parametric model of the 3D vocal tract with these specific aims:
- The segmentation process is to be guided by prior knowledge about the vocal tract. In particular, shape, topological and regularity constraints must be considered.
- A parametric model of the vocal tract has to be defined (classical models are linear and built from a principal component analysis). Special emphasis must be put on the problem of matching the various features between the images.
- Besides classical geometric constraints, both the building and the assessment of the model will be guided by acoustic distances in order to check the agreement between the sound synthesized from the model and the sound produced by the human speaker.
Skill and profile
The recruited person must have a solid background in computer vision and in applied mathematics. Information and demonstrations on the research topics addressed by the Magrit team are available at http://magrit.loria.fr/
References
[1] M. Stone: Modeling tongue surface contours from Cine-MRI images. Journal of Speech, Language, and Hearing Research, 2001.
[2] P. Badin, G. Bailly, L. Reveret: Three-dimensional linear articulatory modeling of tongue, lips and face based on MRI and video images. Journal of Phonetics, 2002, vol 30, p 533-553.
Contact
Interested candidates are invited to contact Marie-Odile Berger, berger@loria.fr, +33 3 54 95 85 01
Important information
This position is advertised in the framework of the national INRIA campaign for recruiting post-docs. It is a one-year position, renewable, beginning fall 2008. The salary is 2,320€ gross per month.
Selection of candidates will be a two-step process. A first selection of a candidate will be carried out internally by the Magrit group. The selected candidate's application will then be further processed for approval and funding by an INRIA committee.
The doctoral thesis must be less than one year old (May 2007) or be defended before the end of 2008. If the defence has not taken place yet, candidates must specify the tentative date and jury for the defence.
Important - Useful links
Presentation of INRIA postdoctoral positions
To apply (be patient, loading this link takes time...)
Internships at Motorola Labs Schaumburg
Motorola Labs - Center for Human Interaction Research (CHIR), located in Schaumburg, Illinois, USA, is offering summer intern positions in 2008 (12 weeks each).
CHIR's mission
Our research lab develops technologies that make access to rich communication, media and information services effortless, based on natural, intelligent interaction. Our research aims at systems that adapt automatically and proactively to changing environments, device capabilities and continually evolving knowledge about the user.
Intern profiles
1) Acoustic environment/event detection and classification
The successful candidate will be a PhD student near the end of his/her PhD study who is skilled in signal processing and/or pattern recognition and knows Linux and C/C++ programming. Candidates with knowledge of acoustic environment/event classification are preferred.
2) Speaker adaptation for applications in speech recognition and spoken document retrieval
The successful candidate must currently be pursuing a Ph.D. degree in EE or CS, with a thorough understanding of and hands-on experience in automatic speech recognition research. Requirements also include proficiency in a Linux/Unix working environment and C/C++ programming, and a strong GPA. A strong background in speaker adaptation is highly preferred.
3) Development of voice search-based web applications on a smartphone
We are looking for an intern candidate to help create an "experience" prototype based on our voice search technology. The app will be deployed on a smartphone and demonstrate intuitive and rich interaction with web resources. This intern project is oriented more towards software engineering than research. We target an intern with a master's degree and a strong software engineering background. Mastery of C++ and experience with web programming (AJAX and web services) are required. Development experience on Windows CE/Mobile is desired.
4) Integrated voice search technology for mobile devices
The candidate should be proficient in information retrieval, pattern recognition and speech recognition. The candidate should program in C++ and scripting languages such as Python or Perl in a Linux environment. Also, he/she should have knowledge of information retrieval or search engines.
We offer competitive compensation, a fun work environment and Chicago-style pizza.
If you are interested, please send your resume to:
Dusan Macho, CHIR-Motorola Labs
Email: dusan [dot] macho [at] motorola [dot] com
Tel: +1-847-576-6762
Masters in Human Language Technology
*** Studentships available for 2008/9 ***
One-Year Masters Course in HUMAN LANGUAGE TECHNOLOGY
Department of Computer Science
The University of Sheffield - UK
The Sheffield MSc in Human Language Technology (HLT) has been carefully tailored
to meet the demand for graduates with the highly-specialised multi-disciplinary skills
that are required in HLT, both as practitioners in the development of HLT applications
and as researchers into the advanced capabilities required for next-generation HLT
systems. The course provides a balanced programme of instruction across a range
of relevant disciplines including speech technology, natural language processing and
dialogue systems. The programme is taught in a research-led environment.
This means that you will study the most advanced theories and techniques in the field,
and have the opportunity to use state-of-the-art software tools. You will also have
opportunities to engage in research-level activity through in-depth exploration of
chosen topics and through your dissertation. As well as readying yourself for
employment in the HLT industry, this course is also an excellent introduction to the
substantial research opportunities for doctoral-level study in HLT.
*** A number of studentships are available, on a competitive basis, to suitably
qualified applicants. These awards pay a stipend in addition to the course fees.
*** For further details of the course,
see ... http://www.shef.ac.