Contents

1 . Editorial

Dear Members,

For many of us it is holiday time, but not for all: Denis Burnham and his team are in the final run-up to Interspeech 2008 in Brisbane. Have a look below and at the conference website: registration is open, and potential participants should not miss the important information concerning visas.

Our student members may apply for ISCA grants for this conference, as well as for all other ISCA-supported conferences. They will find a new online application form on the student website.

We are still improving ISCApad: an important issue is to keep everything up to date. For that reason,

1. The title of each section announcing a conference, workshop, etc. will be dated (YYYY-MM-DD) with the first day of the event.

2. The title of each job announcement will show the date the offer was sent to ISCApad's editor. During a transitional period, no date will be shown for current offers, but from now on new offers will carry the actual date. Each time a company asks to keep an advertisement posted, the date will be updated.

If you want to use ISCApad to communicate professional information, please send me your text in the last week of the current month for publication in the next month's issue: anything sent after the first of a month will be postponed to the following month's issue.

ISCA runs for you, run for ISCA!

Prof. em. Chris Wellekens
Institut Eurecom
France 

 

 
Back to Top

2 . ISCA News

 

Back to Top

2-1 . INTERSPEECH 2011 in Florence

The ISCA Board is very pleased to announce that Interspeech 2011 will take place in Florence, Italy, August 27-31, 2011.

The Italian bid was chosen from among three excellent bids, which shows the speech science and technology community's strong interest in Interspeech events.

The organizing committee includes

Piero Cosi (General Chair),

Renato di Mori (General Co-Chair),

Claudia Manfredi (Local Chair),

Roberto Pieraccini (Technical Program Chair),

Maurizio Omologo (Tutorials),

Giuseppe Riccardi (Plenary Sessions),

among many other well-known Italian researchers.

The conference will take place in the Palazzo dei Congressi. For more information about this wonderful city, visit www.interspeech2011.org.

Back to Top

2-2 . Help ISCA serve you better

The ISCA board is always interested in improving its activities and the membership services it provides. To help us with this, could you please send us your ideas/comments/suggestions/impressions? We would be most grateful if you could take a moment to complete the form on the ISCA website: http://www.isca-speech.org/index.php and send us your feedback.

Your message will be sent to the ISCA secretariat: secretariat@isca-speech.org

Please enter ideas/comments/suggestions/impressions you may have on any new (or old) activities and membership services.

Please note: you can send us your comments anonymously, if you so wish.

Eva Hajicova - Membership Services 

 Emmanuelle Foxonet - ISCA Secretariat  for the ISCA board 

Back to Top

2-3 . News from ISCA-SAC (Student committee)

Thanks to Ebru's efforts, the online grant application system has been modified according to Alan Black's suggestions and is now reachable by all isca-students.org registered users from the main menu and through the page:

http://www.isca-students.org/grants
 
You might want to link to it from the relevant pages of the IS08 and ISCA websites.

Use it when registering for Interspeech 2008 (Brisbane)!

We remain at your disposal for any further need.

Marco A. Piccolino-Boniforti
ISCA-SAC web coordinator

map55@cam.ac.uk

Back to Top

3 . SIG's activities

3-1 . A report of AISV-SIG Italian Speech Communication Association

AISV-SIG

Italian Speech Communication Association

Regional Activities

Annual Report 2005-2008

 

 

Conference, Workshops and Summer Schools

organized or sponsored by AISV-SIG during the 2005-2008 period

 

 

 

The purpose of the AISV Italian Speech Communication Regional ISCA Special Interest Group is to promote, in the scientific, technical, normative, industrial, social, professional and educational fields, the study of Speech Sciences in Italy. In particular, the Association addresses everyone involved in the study of Phonetics (acoustic/articulatory), Speech Signal Processing and Automatic Language Processing (Trattamento Automatico del Linguaggio - TAL), which groups all those disciplines concerned with spoken human-machine interaction and human language understanding. A further purpose of AISV-SIG is to maintain a direct connection with ISCA, so as to give greater visibility, within the European and international framework, to the studies and research carried out in these fields by the Italian scientific community.

 

AISV-SIG pursues these purposes in particular by:

  • promoting and encouraging studies and research on Speech Sciences in all their different fields;
  • addressing the problems relating to the definition, teaching and dissemination of Speech Sciences and to their institutional positioning;
  • promoting and disseminating the knowledge acquired in Speech Sciences through the preparation of publications, the organization of conferences, courses, schools and masters, the awarding of scholarships, etc.;
  • promoting scientific and technical exchanges of information and collaboration among members;
  • collaborating with Italian and international organizations responsible for funding scientific research in this field;
  • fostering relationships with other Italian and international associations or organizations whose purposes are consistent with its own, for the pursuit of common aims.

 


ACTIVITIES

 

 

 

2005

 

 

 

 

AISV 2005 Summer School:

1st AISV Summer School

"Modelli linguistici e tecnologici per l'analisi e la gestione di corpora vocali"

“Linguistic and Technological Models for the Analysis and Management of Speech Corpora”

Castello Orsini,  Soriano nel Cimino

10-14 October 2005

http://www2.pd.istc.cnr.it/AISVScuolaEstiva2005/

 

The 1st AISV Summer School was indeed a big success: more than 20 new and very interested young attendees learned the basics of linguistic and technological methodologies for the analysis and management of speech corpora. Several speech analysis software packages were presented and their functionality illustrated; among them, the PRAAT software was introduced through various practical exercises.

 

 

AISV 2005 - Annual Conference:

2nd AISV National Conference
"ANALISI PROSODICA" - teorie, modelli e sistemi di annotazione

"Prosodic Analysis - Theory, Models and Annotation Techniques"

Campus di Fisciano - "Aula delle Lauree"

Salerno, 30 November - 2 December 2005
http://www2.pd.istc.cnr.it/AISV2005/

 

The AISV 2005 2nd Conference was organized in Salerno, with more than 80 attendees. During the conference a very interesting round table on "Modelli di Analisi e Sistemi di etichettatura prosodica a confronto" (Prosodic Analysis and Annotation) was held, chaired by G. Marotta, with D. Hirst, M. D'Imperio, M. Contini and Ph. Martin as invited panelists. This was a very promising starting point for comparing the activities of the Italian and French regional SIG groups!


2006

 

 

 

 

AISV 2006 Summer School:

2nd AISV Summer School

"Dalla fonetica alla fonologia sperimentale: teoria e metodi di analisi"

“From Phonetics to Experimental Phonology: Theory and Analysis Methods”

Soriano nel Cimino

25-29 September 2006

http://www2.pd.istc.cnr.it/AISVScuolaEstiva2006/

 

The 2nd AISV Summer School was again a big success, with more than 30 very interested attendees. The school offered an advanced introduction to experimental phonetics (acoustic and articulatory) and to "Articulatory Phonology"; more specifically, it was organized around lessons on:

Introduction to Acoustic Phonetics

·    analysis hardware/software devices and measurement techniques

·    open source programs for speech analysis (Praat, WaveSurfer, Snack);

·    use of acoustic analysis for phonetic description (e.g. vowels, vocalic systems, ...);

·    acoustic analysis and pathologies;

·    acoustic analysis of phonetic development;

Introduction to Articulatory Phonetics

·    analysis hardware/software devices and measurement techniques

·    electropalatography (EPG), electroglottography, EMMA, ELITE, ultrasound-based devices, fMRI.

Introduction to Articulatory Phonology and to “Task Dynamics”

 

 

AISV 2006 - Annual Conference:

3rd AISV National Conference

"Scienze Vocali e del Linguaggio - Metodologie di Valutazione e Risorse Linguistiche"

“Speech and Language Sciences – Evaluation Methodologies and Linguistic Resources”

ITC-irst (Auditorium) - Pantè di Povo TRENTO

29-30 November - 1 December 2006

http://aisv2006.itc.it/

 

The AISV 2006 3rd Conference was organized in Trento at ITC-irst (now FBK), with more than 100 attendees. During the conference a very interesting round table on “Speech and Language Sciences – Evaluation Methodologies and Linguistic Resources” was held, chaired by Bernardo Magnini (ITC-irst, Trento), with Francesco Cutugno (Università "Federico II", Napoli), John Garofolo (NIST - National Institute of Standards and Technology, USA) and Carol Peters (Istituto di Scienza e Tecnologie dell'Informazione, ISTI-CNR, Pisa) as invited panelists.

 

 

 


2007

 

 

 

 

AISV 2007 Summer School:

3rd AISV Summer School/Professional Course

"La voce in ambito Forense - trascrizione, comparazione, manomissione, speaker profile ecc.”

“Forensic Phonetics and Speech – Speaker Profile, Speaker Identification, Transcriptions, ...”

 Soriano nel Cimino

17-21 September 2007

http://www2.pd.istc.cnr.it/AISVScuolaEstiva2007/

 

The 3rd AISV Summer School could also be considered a professional course on forensic phonetics. There were 27 very interested attendees, who improved their knowledge of linguistics, psycholinguistics, phonetics, acoustics, statistics and probability theory applied to forensic phonetics. In particular, the school covered:

·          Speaker identification

·          Speaker profile

·          Utterances analysis

·          Voice line-up

·          Tape authentication

·          Transcription of official recordings

·          Speech enhancement

·          Audio signal (identification)

·          Voice changing/morphing

·          Audio signal analysis

 

 

AISV 2007 - Annual Conference:

4th AISV National Conference

"La Fonetica Sperimentale: Metodi e Applicazioni”

“Experimental Phonetics: Methods and Applications”

Campus UniCal - Arcavacata di Rende (CS)

3-5 December 2007

http://www.linguistica.unical.it/aisv2007/

 

The AISV 2007 4th Conference was organized in Arcavacata di Rende at the UniCal Campus. The main theme of the conference was the promotion of experimental phonetics, underlining especially the interdisciplinary nature of its methodological approach. Various Italian experts in the field gathered to compare their ideas and results. A very interesting round table on “Experimental Phonetics: Methods and Applications” was held, chaired by Emanuela Magno Caldognetto (ISTC CNR, Padova), with Sergio Canazza Targon (Udine University), Carlos Delgado Romero (Voice Lab – Madrid Police), Nicola Lombardo (ORL Clinic, Magna Grecia University - Catanzaro), Pierluigi Salza (Loquendo S.p.A. - Torino) and Valentina Valentini (Rome University - La Sapienza) as invited panelists.


2008 (upcoming)

 

 

 

 

AISV 2008 Summer School:

4th AISV Summer School

"Archivi di corpora vocali - conservazione, catalogazione, restauro audio e fruizione dei documenti sonori”

“Speech Corpora – maintenance, preservation and restoration of audio data”

Soriano nel Cimino

08-12 September 2008

http://www2.pd.istc.cnr.it/AISVScuolaEstiva2008/

 

 

AISV 2008-2009 - Annual Conference:

5th AISV National Conference

"La dimensione temporale del parlato"

"The Temporal dimension of Speech"

January-February 2009

Zurich University

http://www2.pd.istc.cnr.it/AISV2008/

Back to Top

4 . Future ISCA Conferences and Workshops(ITRW)

4-1 . (2008-09-22) INTERSPEECH 2008 Brisbane Australia IMPORTANT UPDATE

 

INTERSPEECH 2008 incorporating SST 08

September 22-26, 2008

Brisbane Convention & Exhibition Centre

Brisbane, Australia

http://www.interspeech2008.org/

An invitation to INTERSPEECH 2008 and Australia – Registration now open 

You can now register for INTERSPEECH 2008. Simply go to www.interspeech2008.org, select “Registration” on the side bar, then “Register Here” and follow the instructions.
 
Take advantage of the Early Bird registration fees and save $100 – but you must register before 15 July.
 

Full registration details and fees are under “Registration” ( www.interspeech2008.org/registration )

Interspeech is the world's largest and most comprehensive conference on Speech Science and Speech Technology. We invite original papers in any related area, including (but not limited to):

Human Speech Production, Perception and Communication;

Speech and Language Technology;

Spoken Language Systems;

Applications, Resources, Standardisation and Evaluation

Important Dates

Paper Submission: Monday, 7 April 2008, 3pm GMT

Notification of Acceptance/Rejection: Monday, 16 June 2008, 3pm GMT

Early Registration Deadline: 15 July 2008, 3pm GMT

Tutorial Day: Monday, 22 September 2008

Main conference: 23-26 September 2008

You will need a VISA

All INTERSPEECH international delegates (with the sole exception of citizens of New Zealand travelling on a New Zealand passport) will need a visa to enter Australia, and this must be applied for in advance. Delegates from many countries can apply for an electronic visa, called an ETA (Electronic Travel Authority), for a modest fee of $20, and the application is likely to be approved instantly. Currently, citizens of the following countries can apply electronically at www.eta.immi.gov.au:

Andorra, Austria, Belgium, Brunei, Canada, Denmark, France, Finland, Germany, Greece, Hong Kong (SAR), Iceland, Ireland, Italy, Japan, Liechtenstein, Luxembourg, Malaysia, Malta, Monaco, Netherlands, Norway, Portugal, San Marino, Singapore, South Korea, Spain, Sweden, Switzerland, Taiwan*, UK**, USA, Vatican City.

If your country is not listed above, you will not be able to apply using the ETA system. You will need to apply for a visa (a 456 Business Short Stay visa) at an Australian overseas office – for a list of offices, go to http://www.immi.gov.au/contacts/overseas/index.htm. It is likely that you will be required to provide an INTERSPEECH conference registration confirmation letter with your visa application. If you do not have one already, please write to megan@ccm.com.au and ask for a letter to be sent. The Australian Government’s Department of Immigration and Citizenship website provides full details on visas – please visit http://www.immi.gov.au/.

INTERSPEECH is being held in September, which is spring time in Australia! It’s a great time to visit sunny Brisbane and all of the attractions for which Australia is famous:  the Great Barrier Reef, the Gold and Sunshine Coasts, the Outback – they are all at their very best during spring.

See the INTERSPEECH 2008 website www.interspeech2008.org for information about Brisbane and Australia and links to useful resources.

Start your travels to Australia right here: www.australia.com

BOOK YOUR INTERSPEECH ACCOMMODATION

INTERSPEECH 2008 has negotiated special discount accommodation rates at a variety of hotels and apartment hotels in Brisbane and you can make your booking when you complete the INTERSPEECH registration form.  Be warned, September is a busy time of year and we urge you to book your accommodation as soon as possible.

For a list of accommodation options, please go to the “Accommodation” sidebar on the INTERSPEECH 2008 website:  www.interspeech2008.org .

Chairman: Denis Burnham, MARCS, University of Western Sydney.

 

 

Back to Top

4-2 . (2009-09-06) INTERSPEECH 2009 Brighton UK

September 6-10, 2009, Brighton, UK,
Conference Website
Chairman: Prof. Roger Moore, University of Sheffield.

Back to Top

4-3 . (2010-09-26) INTERSPEECH 2010 Chiba Japan

Chiba, Japan
Conference Website
ISCA is pleased to announce that INTERSPEECH 2010 will take place in Makuhari-Messe, Chiba, Japan, September 26-30, 2010. The event will be chaired by Keikichi Hirose (Univ. Tokyo), and will have as a theme "Towards Spoken Language Processing for All - Regardless of Age, Health Conditions, Native Languages, Environment, etc."

Back to Top

4-4 . (2011-08-27) INTERSPEECH 2011 Florence Italy

Interspeech 2011

Palazzo dei Congressi,  Italy, August 27-31, 2011.

Organizing committee

Piero Cosi (General Chair),

Renato di Mori (General Co-Chair),

Claudia Manfredi (Local Chair),

Roberto Pieraccini (Technical Program Chair),

Maurizio Omologo (Tutorials),

Giuseppe Riccardi (Plenary Sessions).

More information www.interspeech2011.org

Back to Top

4-5 . (2008-08-25) ITRW on experimental linguistics

25-27 August 2008, Athens, Greece
Website
Prof. Antonis Botinis


Back to Top

4-6 . (2008-09-26) International Conference on Auditory-Visual Speech Processing AVSP 2008

Dates: 26-29 September 2008

Location: Moreton Island, Queensland, Australia
Website: http://express.hid.ri.cmu.edu/AVSP2008/Main.html

AVSP 2008 will be held as an ISCA Tutorial and Research Workshop at
Tangalooma Wild Dolphin Resort on Moreton Island from the 26-29
September 2008. AVSP 2008 is a satellite conference to Interspeech 2008,
being held in Brisbane from the 22-26 September 2008. Tangalooma is
located at close distance from Brisbane, so that attendance at AVSP 2008
can easily be combined with participation in Interspeech 2008.

Auditory-visual speech production and perception by human and machine is
an interdisciplinary and cross-linguistic field which has attracted
speech scientists, cognitive psychologists, phoneticians, computational
engineers, and researchers in language learning studies. Since the
inaugural workshop in Bonas in 1995, Auditory-Visual Speech Processing
workshops have been organised on a regular basis (see an overview at the
avisa website). In line with previous meetings, this conference will
consist of a mixture of regular presentations (both posters and oral),
and lectures by invited speakers.

Topics include but are not limited to:
- Machine recognition
- Human and machine models of integration
- Multimodal processing of spoken events
- Cross-linguistic studies
- Developmental studies
- Gesture and expression animation
- Modelling of facial gestures
- Speech synthesis
- Prosody
- Neurophysiology and neuro-psychology of audition and vision
- Scene analysis

Paper submission:
Details of the paper submission procedure will be available on the
website in a few weeks time.

Chairs:
Simon Lucey
Roland Goecke
Patrick Lucey


Back to Top

4-7 . (2008-12-15) Second IEEE Spoken Language Technology Workshop Goa

Second IEEE Spoken Language Technology Workshop
Goa, India
December 15-18, 2008

The Second IEEE Spoken Language Technology (SLT) workshop will be held from December 15 to December 18, 2008 in Goa, India. The goal of this workshop is to bring both the speech processing and natural language processing communities together to share and present recent advances in various areas of spoken language technology, with the expectation that such a confluence of the researchers from both communities will foster new ideas, collaborations and new research directions in this area. The SLT 2008 workshop is endorsed by both ISCA and ACL organizations and eligible participants can apply for ISCA grants (http://www.isca-speech.org/grants.html).

Spoken language technology is a vibrant research area, with the potential for significant impact on government and industrial applications especially with the diversity and challenges offered by the multilingual business climates of today's world.

The workshop solicits papers on all aspects of spoken language technology:

 o Spoken language understanding
 o Spoken document summarization
 o Machine translation for speech
 o Spoken dialog systems
 o Spoken language generation
 o Spoken document retrieval
 o Human computer Interactions (HCI)
 o Speech data mining
 o Information extraction from speech
 o Question answering from speech
 o Multimodal processing
 o Spoken language based assistive technologies
 o Spoken language systems and applications
 o Spoken language databases and standards

In addition, this year's workshop will feature three special sessions:

 1) Challenges in Asian spoken language processing with special emphasis on Indian languages
 2) Mining human-human conversations: A resource for building efficient human-machine dialogs
 3) Spoken Language on the go: Challenges and Opportunities for spoken language processing on mobile devices

Submissions for the Technical Program
-------------------------------------
The workshop program will consist of tutorials, oral and poster presentations, and panel discussions. Attendance will be limited with priority for those who will present technical papers; registration is required of at least one author for each paper. Submissions are encouraged on any of the topics listed above. The style guide, templates, and submission form will follow the IEEE ICASSP style. Three members of the Scientific Committee will review each paper. The workshop proceedings will be published on a CD-ROM.

Important Dates
---------------
*Camera-ready paper submission deadline: August 8, 2008
Hotel Reservation and Workshop registration opens: August 8, 2008
Paper Acceptance / Rejection: September 15, 2008
Hotel Reservation and Early Registration closes: October 5, 2008
Workshop: December 15-18, 2008*

For more information visit the SLT 2008 website http://slt2008.org or contact the organizing committee at info@slt2008.org <mailto:info@slt2008.org> if you have any questions.

Back to Top

5 . Books, databases and softwares

 

Back to Top

5-1 . Books

This section lists recent books whose titles have been communicated by the authors or editors.

Some advertisements for recent books on speech are also included.

Book presentations are written by the authors, not by this newsletter's editor or any voluntary reviewer.

Back to Top

5-1-1 . La production de la parole

La production de la parole
Author: Alain Marchal, Universite d'Aix en Provence, France
Publisher: Hermes Lavoisier
Year: 2007
 
 
Back to Top

5-1-2 . Speech enhancement-Theory and Practice

 
 Speech enhancement-Theory and Practice
Author: Philipos C. Loizou, University of Texas, Dallas, USA
Publisher: CRC Press
Year: 2007
 
 
Back to Top

5-1-3 . Speech and Language Engineering

 
 
Speech and Language Engineering
Editor: Martin Rajman
Publisher: EPFL Press, distributed by CRC Press
Year: 2007
 
 
Back to Top

5-1-4 . Human Communication Disorders/ Speech therapy

 
 
Human Communication Disorders/ Speech therapy
This interesting series can be listed on Wiley website
 
 
Back to Top

5-1-5 . Incursões em torno do ritmo da fala

 
Incursões em torno do ritmo da fala
Author: Plinio A. Barbosa 
Publisher: Pontes Editores (city: Campinas)
Year: 2006 (released 11/24/2006)
(In Portuguese, abstract attached.) Website
 
 
Back to Top

5-1-6 . Speech Quality of VoIP: Assessment and Prediction

 
Speech Quality of VoIP: Assessment and Prediction
Author: Alexander Raake
Publisher: John Wiley & Sons, UK-Chichester, September 2006
Website
 
 
Back to Top

5-1-7 . Self-Organization in the Evolution of Speech, Studies in the Evolution of Language

 

Self-Organization in the Evolution of Speech, Studies in the Evolution of Language
Author: Pierre-Yves Oudeyer
Publisher: Oxford University Press
Website
 
 

 

Back to Top

5-1-8 . Speech Recognition Over Digital Channels

 
Speech Recognition Over Digital Channels
Authors: Antonio M. Peinado and Jose C. Segura
Publisher: Wiley, July 2006
Website
 
 
Back to Top

5-1-9 . Multilingual Speech Processing

 
Multilingual Speech Processing
Editors: Tanja Schultz and Katrin Kirchhoff ,
Elsevier Academic Press, April 2006
Website
 
 
Back to Top

5-1-10 . Reconnaissance automatique de la parole: Du signal a l'interpretation

 
 Reconnaissance automatique de la parole: Du signal a l'interpretation
Authors: Jean-Paul Haton
Christophe Cerisara
Dominique Fohr
Yves Laprie
Kamel Smaili
392 Pages Publisher: Dunod
 
 
 
 
Back to Top

5-1-11 . Automatic Speech Recognition on Mobile Devices and over Communication Networks

 
 Automatic Speech Recognition on Mobile Devices and over Communication 
Networks
*Editors: Zheng-Hua Tan and Børge Lindberg
Publisher: Springer, London, March 2008
website <http://asr.es.aau.dk/>
 
About this book
The remarkable advances in computing and networking have sparked an 
enormous interest in deploying automatic speech recognition on mobile 
devices and over communication networks. This trend is accelerating.
This book brings together leading academic researchers and industrial 
practitioners to address the issues in this emerging realm and presents 
the reader with a comprehensive introduction to the subject of speech 
recognition in devices and networks. It covers network, distributed and 
embedded speech recognition systems, which are expected to co-exist in 
the future. It offers a wide-ranging, unified approach to the topic and 
its latest development, also covering the most up-to-date standards and 
several off-the-shelf systems.
 
 
Back to Top

5-1-12 . Latent Semantic Mapping: Principles & Applications

Latent Semantic Mapping: Principles & Applications
Author: Jerome R. Bellegarda, Apple Inc., USA
Publisher: Morgan & Claypool
Series: Synthesis Lectures on Speech and Audio Processing
Year: 2007
Website: http://www.morganclaypool.com/toc/sap/1/1
 
 
 
Back to Top

5-1-13 . The Application of Hidden Markov Models in Speech Recognition

 
The Application of Hidden Markov Models in Speech Recognition By Mark Gales and Steve Young (University of Cambridge)
http://dx.doi.org/10.1561/2000000004
 
in Foundations and Trends in Signal Processing (FnTSIG)
www.nowpublishers.com/SIG 
 
 
 
Back to Top

5-1-14 . Proc.of the IEEE Special Issue on ADVANCES IN MULTIMEDIA INFORMATION RETRIEVAL

Proceedings of the IEEE
 
Special Issue on ADVANCES IN MULTIMEDIA INFORMATION RETRIEVAL
 
Volume 96, Number 4, April 2008
 
Guest Editors:
 
Alan Hanjalic, Delft University of Technology, Netherlands
Rainer Lienhart, University of Augsburg, Germany
Wei-Ying Ma, Microsoft Research Asia, China
John R. Smith, IBM Research, USA
 
Through carefully selected, invited papers written by leading authors and research teams, the April 2008 issue of Proceedings of the IEEE (v.96, no.4) highlights successes of multimedia information retrieval research, critically analyzes the achievements made so far and assesses the applicability of multimedia information retrieval results in real-life scenarios. The issue provides insights into the current possibilities for building automated and semi-automated methods as well as algorithms for segmenting, abstracting, indexing, representing, browsing, searching and retrieving multimedia content in various contexts. Additionally, future challenges that are likely to drive the research in the multimedia information retrieval field for years to come are also discussed.
 
 
 
Back to Top

5-1-15 . Computeranimierte Sprechbewegungen in realen Anwendungen

Computeranimierte Sprechbewegungen in realen Anwendungen
Authors: Sascha Fagel and Katja Madany
102 pages
Publisher: Berlin Institute of Technology
Year: 2008
Website http://www.ub.tu-berlin.de/index.php?id=1843
 
 
Back to Top

5-1-16 . Usability of Speech Dialog Systems:Listening to the Target Audience

Usability of Speech Dialog Systems
Listening to the Target Audience
Series: Signals and Communication Technology
 
Hempel, Thomas (Ed.)
 
2008, X, 175 p. 14 illus., Hardcover
 
ISBN: 978-3-540-78342-8
 
 
Back to Top

5-1-17 . Speech and Language Processing

Speech and Language Processing, 2nd Edition
 
By Daniel Jurafsky, James H. Martin
 
Published May 16, 2008 by Prentice Hall.
Copyright 2009
Dimensions 7" x 9-1/4"
Pages: 1024
Edition: 2nd.
ISBN-10: 0-13-187321-0
ISBN-13: 978-0-13-187321-6
An explosion of Web-based language techniques, merging of distinct fields, availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology – at all levels and with all modern technologies – this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. KEY TOPICS: Builds each chapter around one or more worked examples demonstrating the main idea of the chapter, using the examples to illustrate the relative strengths and weaknesses of various approaches. Adds coverage of statistical sequence labeling, information extraction, question answering and summarization, advanced topics in speech recognition, and speech synthesis. Revises coverage of language modeling, formal grammars, statistical parsing, machine translation, and dialog processing. MARKET: A useful reference for professionals in any of the areas of speech and language processing.
  
 
 
Back to Top

5-2 . Database providers

 

Back to Top

5-2-1 . LDC News

 
 
 
 
-  Collaboration between LDC and Georgetown University Press  -
 
-  2008 Publications Pipeline Update  -
 
LDC2008S06
 
LDC2008T08
 
In this month's newsletter, the Linguistic Data Consortium (LDC) would like to report on recent developments and announce the availability of two new publications.


 

 
Collaboration between LDC and Georgetown University Press

 


LDC is pleased to announce that the U.S. Department of Education, International Education Programs Service, has funded a collaboration between LDC and Georgetown University Press (GUP) to create up-to-date lexical databases, with translations to and from English, for three dialects of colloquial Arabic. The databases will be used for interactive computer access and for new print publications of dictionaries in Iraqi, Syrian/Levantine and Moroccan dialects. 

The databases will be based on three GUP source dictionaries: A Dictionary of Iraqi Arabic, English-Arabic, Arabic-English (Clarity, et al., 2003), A Dictionary of Syrian Arabic, English-Arabic (Stowasser and Ani, 2004) and a Dictionary of Moroccan Arabic, Arabic-English, English-Arabic (Harrell and Sobelman, 2004). Utilizing contemporary principles of computational linguistics and current pedagogical requirements in order to reflect current vocabulary and usage, the work will provide a standardized system of transcription and use the Arabic script, both vocalized and unvocalized, to show vowel pronunciation as well as standard orthography. A searchable version on CD-ROM will accompany each print reference. The project has been funded for three years. Work will commence in Year 1 with the Iraqi Arabic dictionary, proceed to the Syrian/Levantine dictionary and conclude with the Moroccan Arabic dictionary.

The proposed dictionaries and databases aim to provide U.S. students and teachers of Arabic with current dialectal Arabic lexical information to enable them to communicate orally with native and non-native Arabic speakers. The scholarship used to create a modernized transcription system and to provide existing and new terms in Arabic script (including diacritics) may also help integrate instruction in dialect and Modern Standard Arabic by providing tools for curriculum developers.

 

 
2008 Publications Pipeline Update

 


To date, Membership Year (MY) 2008 has included a strong variety of publications, including the data used for the 2005 NIST LRE, a new version of OntoNotes, and GALE parallel text data in Arabic and Chinese.  Please consult our corpus catalog for a full list of publications distributed by the LDC. As we have recently reached the half-way point of the year, we would like to provide information on our planned publications for the remainder of the year.  The pipeline for MY2008 includes the following (disclaimer: unforeseen circumstances may lead to modifications of our plans; please regard this list as tentative):


• BLLIP 1994-1997 News Text Release 1 - automatic parses for the North American News Text Corpus - NANT (LDC95T21). The parses were generated by the Charniak and Johnson Reranking Parser, which was trained on Wall Street Journal (WSJ) data from Treebank 3 (LDC99T42). Each file is a sequence of n-best lists containing the top n parses of each sentence with the corresponding parser probability and reranker score. The parses may be used in systems that are trained on labeled parse trees but require more data than is found in WSJ. Two versions will be released: a complete 'Members-Only' version which contains parses for the entire NANT Corpus and a 'Non-Member' version for general licensing which includes all news text except data from the Wall Street Journal.
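The corpus's on-disk format is not reproduced here, so as a purely illustrative sketch (the field names are hypothetical), selecting the preferred parse from one such n-best list might look like:

```python
# One n-best list as a hypothetical in-memory structure: each candidate
# parse carries the parser log-probability and the reranker score.
nbest = [
    {"parser_logprob": -41.8, "reranker_score": -41.0, "tree": "(S1 (S ...))"},
    {"parser_logprob": -42.1, "reranker_score": -40.3, "tree": "(S1 (SQ ...))"},
]

# The reranker score, not the raw parser probability, picks the final parse.
best = max(nbest, key=lambda parse: parse["reranker_score"])
print(best["tree"])
```

Note that the reranker can prefer a parse the base parser ranked lower, as in this toy list.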

• CALLHOME XML Mandarin POS Annotated Transcripts - contains the same 120 transcripts of telephone conversations in the LDC’s original release (LDC96T16 CALLHOME Mandarin Chinese Transcripts). The current version is marked up in the eXtensible Markup Language (XML), compliant with the Corpus Encoding Standard (CES) and character encoding has been transferred from the original GB2312 into Unicode (UTF-8).  This XML corpus has retained all of the linguistic analyses (e.g. timestamps, spoken features and proper nouns), but the mnemonics used in the original release have been migrated into XML markup.  In addition, this XML corpus has been re-tokenised and annotated with part-of-speech information.
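The character-encoding migration described above boils down to a decode/re-encode step; a minimal sketch in Python (the example string is illustrative, not taken from the corpus):

```python
def transcode_gb2312_to_utf8(raw: bytes) -> bytes:
    """Decode GB2312-encoded bytes and re-encode them as UTF-8."""
    return raw.decode("gb2312").encode("utf-8")

# Simulate a fragment of the original GB2312 data and transcode it.
gb_bytes = "电话".encode("gb2312")      # "telephone", encoded as GB2312
utf8_bytes = transcode_gb2312_to_utf8(gb_bytes)
print(utf8_bytes.decode("utf-8"))
```

The real migration also rewrote the transcript mnemonics as XML markup; the snippet covers only the encoding step.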

• Czech Academic Corpus (CAC) Version 2.0 -contains manual morphological and syntactic annotation of approximately 650,000 words. As part of another project, the Prague Dependency Treebank (PDT), a large amount of Czech texts were annotated with morphological, syntactic, and complex semantic annotation, including certain properties of sentence information structure and coreference relations which were annotated at the semantic level. The idea of transferring the internal format and annotation scheme of the CAC into the PDT emerged during the work on the PDT’s second version (LDC2006T01 Prague Dependency Treebank 2.0). The main goal was to make the CAC and the PDT fully compatible and thus enable the integration of the CAC into the PDT. The second version adds surface syntax annotation termed 'analytical layer' of annotation. Software tools for corpus search, annotation and language analysis are provided.

• GALE Phase 1 Chinese Broadcast Conversation Parallel Text - contains transcripts of 20.4 hours of Chinese broadcast conversation with English translations from China Central TV and Phoenix TV programming. Manual sentence units/segments (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Files were translated according to the LDC's GALE Translation guidelines.

• LCTL Bengali Language Pack - a set of linguistic resources to support technological improvement and development of new technology for the Bengali language created in the Less Commonly Taught Languages (LCTL) project which covered a total of 13 languages. Package components are: 2.6 million tokens of monolingual text, 500,000 tokens of parallel text, a bilingual lexicon with 48,000 entries, sentence and word segmenting tools, an encoding converter, a part of speech tagger, a morphological analyzer, a named entity tagger and 136,000 tokens of named entity tagged text, a Bengali-to-English name transliterator, and a descriptive grammar. About 30,000 tokens of the parallel text are English-to-LCTL translations of a "Common Subset" corpus, which will be included in all additional LCTL Language Packs.

• North American News Text Corpus (NANT) Reissue - as a companion to BLLIP 1994-1997 News Text Release 1, LDC will reissue the North American News Text Corpus (LDC95T21). Data includes news text articles from several sources (L.A.Times/Washington Post, Reuters General News, Reuters Financial News, Wall Street Journal, New York Times) formatted with TIPSTER-style SGML tags to indicate article boundaries and organization of information within each article. Two versions will be released: a complete 'Members-Only' version which contains all previously released NANT articles and a 'Non Member' version for general licensing which includes all news text except data from the Wall Street Journal.


As a reminder, MY2007 will remain open for joining through December 31, 2008 and MY2008 through December 31, 2009.  Discounts on membership fees are still available for MY2007 members who renew their memberships.  Please see our Announcements page for complete details.

 

New Publications


(1) CSLU:  Alphadigit Version 1.3  is a collection of 78,044 utterances from 3,025 speakers saying six-digit strings of letters and digits over the telephone for a total of approximately 82 hours of speech. Each speech file has corresponding orthographic and phonemic transcriptions. This corpus was created by the Center for Spoken Language Understanding (CSLU), Oregon Health & Science University, Beaverton, Oregon.

Participants received a list of 18-29 six-digit strings (e.g., "a 2 b 4 5 g"); 1102 different strings were used throughout the course of the data collection. The lists were set up to balance for phonetic context between all letter and digit pairs. The data were recorded directly from a digital phone line, without digital-to-analog or analog-to-digital conversion at the recording end, using the CSLU T1 digital data collection system. The sampling rate was 8 kHz and the files were stored in 8-bit mu-law format on a UNIX file system. The files have been converted to RIFF standard file format, 16-bit linearly encoded.
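The conversion from 8-bit mu-law to 16-bit linear PCM mentioned above is the standard G.711 expansion; as an illustrative sketch (not necessarily the tool CSLU used), the per-sample decode can be written as:

```python
def mulaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a 16-bit linear PCM sample."""
    code = ~code & 0xFF                  # mu-law bytes are stored complemented
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = ((mantissa << 3) + 0x84) << exponent  # 0x84 is the G.711 bias
    sample = magnitude - 0x84
    return -sample if sign else sample

# Silence (zero amplitude) is encoded as 0xFF / 0x7F in mu-law;
# 0x00 / 0x80 carry the largest magnitudes.
print(mulaw_to_linear(0xFF), mulaw_to_linear(0x00))
```

The full file conversion then wraps the decoded samples in a RIFF/WAVE header.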

All of the files included in this corpus have corresponding non-time-aligned word-level transcriptions and time-aligned phoneme-level transcriptions (automatic forced alignment) that comply with the conventions in the CSLU Labeling Guide. CSLU: Alphadigit Version 1.3 is distributed on two DVD-ROMs.

2008 Subscription Members will automatically receive two copies of this corpus, provided that they have submitted a signed copy of the LDC User Agreement for CSLU Corpora. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$150.

*

(2) GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 contains transcripts and English translations of 22.9 hours of Chinese broadcast news programming from China Central TV (CCTV) and Phoenix TV. It does not contain the audio files from which the transcripts and translations were generated. GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 is the second of the three-part GALE Phase 1 Chinese Broadcast News Parallel Text, which, along with other corpora, was used as training data in year 1 (Phase 1) of the DARPA-funded GALE program. 

A total of 22.9 hours of Chinese broadcast news recordings were selected from two sources, CCTV (a broadcaster from Mainland China) and Phoenix TV (a Hong Kong based satellite TV station). The transcripts and translations represent recordings of five different programs.

A manual selection procedure was used to choose data appropriate for the GALE program, namely, news programs focusing on current events. Stories on topics such as sports, entertainment and stock markets were excluded from the data set.  Manual sentence unit/segment (SU) annotation was also performed on a subset of files following LDC's Quick Rich Transcription specification. Three types of end-of-sentence SU were identified: statement SU, question SU, and incomplete SU. After transcription and SU annotation, the files were reformatted into a human-readable translation format and assigned to professional translators for careful translation. Translators followed LDC's GALE Translation guidelines, which describe the makeup of the translation team, the source data format, the translation data format, best practices for translating certain linguistic features (such as names and speech disfluencies), and quality control procedures applied to completed translations.  GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 is distributed via web download.

2008 Subscription Members will automatically receive two copies of this corpus on disc. 2008 Standard Members may request a copy as part of their 16 free membership corpora. Nonmembers may license this data for US$1500.

Back to Top

5-2-2 . ELRA Resource catalogue updates

ELRA is happy to announce that one new Speech Resource, produced within
the Technolangue programme, is now available in its catalogue.

*ELRA-S0272 MEDIA speech database for French*

The MEDIA speech database for French was produced by ELDA within the
French national project MEDIA (automatic evaluation of man-machine
dialogue systems), as part of the Technolangue programme funded by the
French Ministry of Research and New Technologies (MRNT). It contains
1,258 transcribed dialogues from 250 adult speakers. The method chosen
for the corpus construction process is that of a 'Wizard of Oz' (WoZ)
system, which consists of simulating a natural language man-machine
dialogue. The scenario was built in the domain of tourism and hotel
reservation.

The semantic annotation of the corpus is available in this catalogue,
referenced as ELRA-E0024 (MEDIA Evaluation Package).

For more information, see:
http://catalog.elra.info/product_info.php?products_id=1057

For more information on the catalogue, please contact Valérie Mapelli
(mailto:mapelli@elda.org).

Visit our on-line catalogue: http://catalog.elra.info.
 
Back to Top

5-3 . MusicSpeech group

Music and speech share numerous aspects (linguistic, structural, acoustic, cognitive), in their production as well as in their representation and perception. The purpose of this list is to inform its subscribers of events dealing with the study of the links between music and speech. It thus aims to connect several communities, allowing each to benefit from a stimulating interaction.

As a member of the speech or music community, you are invited to
subscribe to musicspeech group. The group will be moderated and
maintained by IRCAM.

Group details:
* Name: musicspeech
* Home page: http://listes.ircam.fr/wws/info/musicspeech
* Email address: musicspeech@ircam.fr

Greg Beller, IRCAM,
moderator, musicspeech list

Back to Top

6 . Job openings

We invite all laboratories and industrial companies with job offers to send them to the ISCApad editor: they will appear in the newsletter and on our website free of charge. (Also have a look at http://www.isca-speech.org/jobs.html as well as the jobs section of http://www.elsnet.org/.)


Back to Top

6-1 . AT&T - Labs Research: Research Staff Positions - Florham Park, NJ

AT&T - Labs Research is seeking exceptional candidates for Research Staff positions. AT&T is the premier broadband, IP, entertainment, and wireless communications company in the U.S. and one of the largest in the world. Our researchers are dedicated to solving real problems in speech and language processing, and are involved in inventing, creating and deploying innovative services. We also explore fundamental research problems in these areas. Outstanding Ph.D.-level candidates at all levels of experience are encouraged to apply. Candidates must demonstrate excellence in research, a collaborative spirit, and strong communication and software skills. Areas of particular interest are:

  • Large-vocabulary automatic speech recognition
  • Acoustic and language modeling
  • Robust speech recognition
  • Signal processing
  • Speaker recognition
  • Speech data mining
  • Natural language understanding and dialog
  • Text and web mining
  • Voice and multimodal search

AT&T Companies are Equal Opportunity Employers. All qualified candidates will receive full and fair consideration for employment. More information and application instructions are available on our website at http://www.research.att.com/. Click on "Join us". For more information, contact Mazin Gilbert (mazin at research dot att dot com).

 

Back to Top

6-2 . Research Position in Speech Processing at Nagoya Institute of Technology, Japan

Nagoya Institute of Technology is seeking a researcher for a post-doctoral position in a new European Commission-funded project, EMIME ("Efficient multilingual interaction in mobile environment"), involving Nagoya Institute of Technology and five other European partners, starting in March 2008 (see the project summary below).

The earliest starting date of the position is March 2008. The initial duration of the contract will be one year, with a possibility of prolongation (on a year-by-year basis, for a maximum of three years). The position provides opportunities to collaborate with other researchers in a variety of national and international projects. The competitive salary is calculated according to qualifications, based on NIT scales.

The candidate should have a strong background in speech signal processing and some experience with speech synthesis and recognition. Desired skills include familiarity with the latest technology, including HTK, HTS, and Festival, at the source code level.

For more information, please contact Keiichi Tokuda (http://www.sp.nitech.ac.jp/~tokuda/).

About us

Nagoya Institute of Technology (NIT), founded in 1905, is situated in the world-class manufacturing area of Central Japan (about one hour and 40 minutes from Tokyo, and 36 minutes from Kyoto, by Shinkansen). NIT is a top-level educational institution of technology and one of the leaders among such institutions in Japan. EMIME will be carried out at the Speech Processing Laboratory (SPL) in the Department of Computer Science and Engineering of NIT. SPL is known for its outstanding, continuous contribution to the development of high-performance, high-quality open-source software: the HMM-based Speech Synthesis System "HTS" (http://hts.sp.nitech.ac.jp/), the large vocabulary continuous speech recognition engine "Julius" (http://julius.sourceforge.jp/), and the Speech Signal Processing Toolkit "SPTK" (http://sp-tk.sourceforge.net/). The laboratory is involved in numerous national and international collaborative projects. SPL also has close partnerships with many industrial companies, including Toyota, Nissan, Panasonic, Brother Inc., Funai, Asahi-Kasei and ATR, in order to transfer its research into commercial applications.

Project summary of EMIME

The EMIME project will help to overcome the language barrier by developing a mobile device that performs personalized speech-to-speech translation, such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. Personalization of systems for cross-lingual spoken communication is an important, but little explored, topic. It is essential for providing more natural interaction and making the computing device a less obtrusive element when assisting human-human interactions.

We will build on recent developments in speech synthesis using hidden Markov models, the same technology used for automatic speech recognition. Using a common statistical modeling framework for automatic speech recognition and speech synthesis will enable the use of common techniques for adaptation and multilinguality. Significant progress will be made towards a unified approach to speech recognition and speech synthesis: this is a very powerful concept, and will open up many new areas of research. In this project, we will explore the use of speaker adaptation across languages so that, by performing automatic speech recognition, we can learn the characteristics of an individual speaker, and then use those characteristics when producing output speech in another language.

Our objectives are to:

1. Personalize speech processing systems by learning individual characteristics of a user's speech and reproducing them in synthesized speech.

2. Introduce a cross-lingual capability such that personal characteristics can be reproduced in a second language not spoken by the user.

3. Develop and better understand the mathematical and theoretical relationship between speech recognition and synthesis.

4. Eliminate the need for human intervention in the process of cross-lingual personalization.

5. Evaluate our research against state-of-the-art techniques and in a practical mobile application.

 

Back to Top

6-3 . Speech and Natural Language Processing Engineer at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated speech and natural language processing engineer who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • You will be working with other members of the R&D team to continuously improve our speech and natural language understanding technologies.
  • You will participate in designing and implementing algorithms, tools and methodologies in the area of automatic speech recognition and natural language processing/understanding.
  • You will collaborate with other members of the R&D team to identify, analyze and resolve technical issues.

Requirements:

  • Solid background in speech recognition, natural language processing, machine learning and information extraction.
  • 2+ years of experience participating in software development projects
  • Proficient with Java, C++ and scripting (e.g. Python, Perl, ...)
  • Excellent analytical and problem-solving skills
  • Integrate and communicate well in small R&D teams
  • Master's degree in CS or related engineering fields
  • Experience in a healthcare-related field a plus

 

In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 

Back to Top

6-4 . Senior Research Scientist -- Speech and Natural Language Processing at M*Modal, Pittsburgh, PA, USA

M*Modal is a fast-moving speech technology company based in Pittsburgh, PA. Our portfolio of conversational speech recognition and natural language understanding technologies is widely recognized as the most advanced in the industry. We are a leading innovator in the field of conversational documentation services (CDS) - where speech recognition and natural language understanding are combined in a unique setup targeted to truly understand conversational speech and turn it directly into actionable and meaningful data. Our proprietary speech understanding technology - operating on M*Modal's computing grid hosted in our national data center - is already redefining the way clinical information is captured in healthcare.


We are seeking an experienced and dedicated senior research scientist who wants to push the frontiers of conversational speech understanding. Join our renowned research and development team, and add to our unique blend of scientific and engineering excellence.

Responsibilities:

  • Plan and perform research and development tasks to continuously improve a state-of-the-art speech understanding system
  • Take a leading role in identifying solutions to challenging technical problems
  • Contribute original ideas and turn them into product-grade software implementations
  • Collaborate with other members of the R&D team to identify, analyze and resolve technical issues

Requirements:

  • Solid research & development background with 3+ years of experience in speech recognition research, covering at least two of the following topics: speech processing, acoustic modeling, language modeling, decoding, LVCSR, natural language processing/understanding, speaker verification/identification, audio mining
  • Working knowledge of Machine Learning, Information Extraction and Natural Language Processing algorithms
  • 3+ years of experience participating in large-scale software development projects using C++ and Java.
  • Excellent analytical, problem-solving and communication skills
  • PhD with focus on speech recognition, or Master's degree with 3+ years of industry experience working on automatic speech recognition
  • Experience and/or education in medical informatics a plus
  • Working experience in a healthcare related field a plus

 


In June 2007 M*Modal moved to a great new office space in the Squirrel Hill area of Pittsburgh.  We are excited to be growing and are looking for individuals who have a passion for the work they do and are interested in becoming a member of a dynamic work group of smart passionate drivers who also know how to have fun.

 

M*Modal offers a top-notch benefits package that includes medical, dental and vision coverage, short-term disability, matching 401K savings plan, holidays, paid-time-off and tuition refund.  If you would like to be considered for this opportunity, please send your resume and cover letter to Mary Ann Gamble at maryann.gamble@mmodal.com

 

Back to Top

6-5 . PhD positions at Supelec, France

Training Generative Bayesian Networks with Missing Data

Learning generative model parameters with missing data : application to user modelling for spoken dialogue systems optimization.

  

Description : 

Probabilistic models such as Bayesian Networks (BN) are widely used for reasoning under uncertainty in many domains. A BN is a graphical model that captures statistical properties of a data set in a parametric and compact representation. This representation can then be used to perform probabilistic inference about the domain from which the data were drawn. As with any Bayesian method, Bayesian networks allow a priori knowledge to be taken into account so as to enhance the performance of the model or speed up the parameter learning process. They are part of a wider class of models called generative models, because they also allow generating new data with statistical properties similar to those of the data used for training the model. The purpose of this thesis is to develop new training algorithms to learn BN parameters from incomplete datasets, that is, datasets where some data are missing. Since the resulting models will be used to expand the training data set with statistically consistent samples, this may influence the parameter learning process.
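The standard baseline for learning parameters with missing data is Expectation-Maximization (EM). As a minimal sketch (an illustration of the setting, not the thesis's proposed method), here is EM for a two-node binary network X -> Y where some values of X are unobserved:

```python
import random

def em_binary_bn(data, iters=50):
    """EM for a two-node network X -> Y (both binary). `data` is a list of
    (x, y) pairs where x may be None (missing). Returns the estimated
    parameters (P(X=1), [P(Y=1|X=0), P(Y=1|X=1)])."""
    p_x, p_y = 0.5, [0.5, 0.5]                    # arbitrary initialisation
    for _ in range(iters):
        n_x = [0.0, 0.0]                          # expected counts of X=x
        n_xy = [0.0, 0.0]                         # expected counts of (X=x, Y=1)
        for x, y in data:
            if x is None:
                # E-step: posterior P(X | y) under the current parameters
                lik = [(p_x if xv else 1.0 - p_x) *
                       (p_y[xv] if y else 1.0 - p_y[xv]) for xv in (0, 1)]
                post1 = lik[1] / (lik[0] + lik[1])
                weights = {0: 1.0 - post1, 1: post1}
            else:
                weights = {x: 1.0}                # observed value: hard count
            for xv, w in weights.items():
                n_x[xv] += w
                if y:
                    n_xy[xv] += w
        # M-step: re-estimate the CPTs from the expected counts
        p_x = n_x[1] / (n_x[0] + n_x[1])
        p_y = [n_xy[0] / n_x[0], n_xy[1] / n_x[1]]
    return p_x, p_y

# Generate synthetic data from known parameters and hide 30% of the X values.
random.seed(0)
data = []
for _ in range(2000):
    x = 1 if random.random() < 0.7 else 0
    y = 1 if random.random() < (0.9 if x else 0.1) else 0
    data.append((None if random.random() < 0.3 else x, y))

p_x_hat, p_y_hat = em_binary_bn(data)
print(round(p_x_hat, 2), [round(v, 2) for v in p_y_hat])
```

With a majority of X values observed, the estimates land close to the generating parameters (0.7 and [0.1, 0.9]); the thesis's harder case is when a variable such as the user's goal is never observed.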

 Application :

This thesis is proposed in the framework of a European project (CLASSiC) aiming at automatically optimising human-machine spoken interactions. Current learning methods applied to such a task require a large amount of spoken dialogue data, which is not easy to gather and, above all, to annotate. This is even more difficult if the spoken dialogue system is still in the design process. A widely adopted solution is to expand the existing datasets using probabilistic generative models that produce new samples of dialogues. Yet the training sets are most often annotated from recorded or transcribed dialogues without additional information coming from the users. Their actual goal when using the system is often missing and difficult to infer from transcriptions. Moreover, none of the current solutions has proven to generate realistic dialogues in terms of goal consistency, for instance. The objective will be to train models that treat the user's goal as missing data, so as to generate realistic dialogues.

 Context :

The PhD student will participate in a European project (CLASSiC) funded by the FP7 ICT programme of the European Commission. The CLASSiC consortium includes Supélec (a French engineering school) and the universities of Edinburgh, Cambridge and Geneva, as well as France Télécom (a French telecom operator). The selected candidate will be hosted on the Metz campus of Supélec and will join the IMS research group.

 Profile

The candidate should hold a Master's or Engineering degree in computer science or signal processing, with knowledge of machine learning and good C++ programming skills. Fluency in English is required; French would be a plus.

Contact: Olivier Pietquin (olivier.pietquin@supelec.fr)

 

 

Bayesian Methods for Generalization in Reinforcement Learning

   

Bayesian methods for generalization and direct policy search in reinforcement learning : application to spoken dialogue systems optimization.

  

Description : 

Reinforcement Learning (RL) is an on-line machine learning paradigm that aims at finding optimal policies to control complex stochastic dynamical systems. RL is typically a good candidate to replace heuristically-driven control policies because of its ability to learn continuously from experience so as to maximize a utility function. It has proven its efficiency at finding optimal control policies in the case of discrete systems (discrete state and action spaces, as well as discrete time). Yet most real-world problems are continuous or hybrid in states and actions, or their state space is large enough to be approximated by a continuous space. Designing realistic reinforcement learning algorithms for handling such problems is still an open research question. Policy generalization by means of supervised learning is promising. Yet the optimal policy, or any related function, cannot be known accurately while learning, and standard offline regression is therefore not suitable, since new information is gathered while interacting with the system. A critical issue is thus to build a generalization method, suitable for policy evaluation, that is able to update its parameters on-line from uncertain observations. In addition, uncertainty should be managed carefully, and thus estimated all along the learning process, so as to avoid generating hazardous policies while exploring the policy space optimally. Bayesian filtering is proposed as a possible framework to tackle this problem because of its inherent adequacy to learning under uncertainty. In particular, it is proposed to make use of Bayesian filters to search directly in the policy space.
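To ground the RL vocabulary above (states, actions, on-line updates from interaction), here is a minimal tabular Q-learning sketch on a toy chain MDP. It illustrates the basic on-line RL loop only; it is not the Bayesian policy-search method the thesis targets:

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a toy chain MDP. States are 0..n_states-1;
    action 0 moves left, action 1 moves right; reaching the rightmost
    state yields reward 1 and ends the episode."""
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # epsilon-greedy action selection (exploration vs. exploitation)
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # on-line temporal-difference update after each interaction
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

random.seed(1)
Q = q_learning_chain()
greedy = [0 if q0 > q1 else 1 for q0, q1 in Q[:-1]]
print(greedy)
```

After training, the greedy policy moves right in every non-terminal state. The thesis's point is that such tables do not scale to continuous or very large state spaces, which is where generalization and Bayesian uncertainty estimates come in.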

 Application :

This thesis is proposed in the framework of a European project (CLASSiC) aiming at automatically optimising human-machine spoken interactions. Current learning methods applied to such a task require a large amount of spoken dialogue data, which is not easy to gather and, above all, to annotate. This is even more difficult if the spoken dialogue system is still in the design process. Generalizing policies to handle interactions that cannot be found in the collected database is therefore necessary. In addition, call centres are used by millions of people every year. New information will therefore become available after the system has been released and should be used to enhance its performance. This is why on-line learning is crucial.

 Context

The PhD student will participate in a European project (CLASSiC) funded by the FP7 ICT programme of the European Commission. The CLASSiC consortium includes Supélec (a French engineering school) and the universities of Edinburgh, Cambridge and Geneva, as well as France Télécom (a French telecom operator). The selected candidate will be hosted on the Metz campus of Supélec and will join the IMS research group.

 Profile

The candidate should hold a Master's or Engineering degree in computer science or signal processing, with knowledge of machine learning and good C++ programming skills. Fluency in English is required; French would be a plus.

Contact:

Hervé Frezza-Buet (herve.frezza-buet@supelec.fr)  
Back to Top

6-6 . PhD position at Orange Lab

* Position : PhD, 3 years
* Research Area : speech synthesis, prosody modelling
* Location : Orange Labs, Lannion, France
* Start date: immediate
* Summary:
The emergence of corpus-based technologies allowed major improvements in 
Text-to-Speech (TTS) during the last decade. Such systems can produce 
very natural synthetic sentences, almost indistinguishable from natural 
speech. Synthetic prompts can now replace human recordings in some 
commercial applications, like IVR services. However their use remains 
delicate due to the lack of prosody control (intonation, rhythm...). The 
aim of the project is to provide the user with a support tool for easily 
specifying the prosody of the synthesized speech.
 
The work will focus on characterising essential prosodic elements needed 
for expressive speech synthesis, possibly restricted to a specific 
application domain. The chosen typology will have to match the prosody 
of the TTS corpora as accurately as possible, through a relevant set of 
prosodic primitives. The robustness of the topology is critical for 
automatic annotation of the databases.
The work will also address ergonomics (how to offer the user a 
convenient way to specify prosody) and will be closely related to the 
signal production techniques (signal processing and/or unit selection).
 
 
* Research Lab:
The PhD will be hosted in the Speech Synthesis team at Orange Labs. 
Orange Labs develops a state-of-the-art corpus-based speech synthesizer 
(demonstrator available at http://tts.elibel.tm.fr).
 
 
* Requirements:
The candidate holds a (research) master's degree in Computer Science or 
Electrical Engineering, has a strong interest in doing research, 
excellent writing skills in French or English, and good programming 
skills. Knowledge of speech processing or automatic classification is a 
plus.
 
 
* Contacts:
For more information please contact:
- Cedric Boidin, cedric.boidin@orange-ftgroup.com, +33 2 96 05 33 53
- Thierry Moudenc, thierry.moudenc@orange-ftgroup.com, +33 2 96 05 16 59
 
Back to Top

6-7 . Professor position at PHELMA, Grenoble INP (originally in French)

A full professor position (Professeur des universités, section 61) is
open for competition at the PHELMA school of Grenoble INP, to be filled
at the start of the 2008 academic year. The teaching and research
profiles are described in the attached position description.
   The research profile was defined by the "Speech and Cognition"
department of GIPSA-Lab. The department's "Talking Machines,
Conversational Agents & Face-to-Face Interaction" team is particularly
targeted by the integration project, although the project may concern
other teams. A description of the research themes of GIPSA-Lab, of the
department and of its teams, as well as the appropriate contacts, can be
found at http://www.gipsa-lab.inpg.fr. Please contact the department
management for any further information.
 
   Gerard BAILLY, deputy director of GIPSA-Lab
 
Back to Top

6-8 . PhD positions at GIPSA (formerly ICP) Grenoble France

Laboratory: GIPSA-lab, Speech & Cognition Dept.
Address : ENSIEG, Domaine Universitaire - BP46, 38402 Saint Martin d'Hères
Thesis supervisor: Pierre Badin
e-mail address: Pierre.Badin@gipsa-lab.inpg.fr
Co-supervisor(s): Gérard Bailly
Title: Control of talking heads by multimodal inversion – Application to language learning
and rehabilitation
Context and problem:
Speech production requires fairly precise control of the various orofacial articulators (jaw, lips, tongue, velum, cheeks, etc.). Regulating these gestures implies that fairly precise feedback about his or her vocal production is available to the speaker. Auditory feedback is essential, and its degradation can degrade, if not entirely destroy, speech production capabilities. Indeed, the perception of the acoustic consequences of articulatory gestures can be degraded in different ways: either peripherally, through the partial or complete loss of this feedback (deaf and hearing-impaired people, whether implanted or not), or more centrally, through the loss of sensitivity to phonological contrasts caused by phonological deafness (contrasts not exploited in the mother tongue: e.g. Japanese speakers have extreme difficulty producing the /l/ vs. /r/ contrast, which their mother tongue does not exploit).
The aim of this doctoral work is to explore speakers' abilities to exploit virtual multisensory
feedback that complements, if not substitutes for, the failing auditory feedback. The virtual
feedback designed and studied in this framework will be provided by a talking head (in 2D or 3D) that reproduces, in an augmented-reality mode and in real time or offline, the articulation of a sound for which only the acoustic and/or visual signal is available.
The challenge of the thesis is to design and assess a robust system that can estimate articulation from its sensory consequences, in particular one that deals with the normalisation problem (establishing the correspondence between the audiovisual spaces of the talking head and of the speaker), and then to quantify the benefit that a hearing-impaired person or a second-language learner can gain from a restored sensorimotor feedback loop.
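As a toy illustration of the inversion idea (estimating articulation from its acoustic consequences), the sketch below learns a linear acoustic-to-articulatory map from parallel data. Everything here, including the data, is synthetic and purely didactic: real inversion systems use far richer models (GMMs, neural networks) and real articulatory recordings (EMA, MRI).

```python
# Minimal sketch of acoustic-to-articulatory inversion as a regression
# problem: learn a linear map from acoustic feature vectors to articulator
# positions from parallel data, then recover articulation for new audio.
# Synthetic data only; a didactic example, not the project's actual method.
import numpy as np

def fit_inversion(acoustic, articulatory, ridge=1e-3):
    """Ridge-regularised least-squares linear map: articulatory ~ acoustic @ W."""
    X, Y = acoustic, articulatory
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_map = rng.standard_normal((20, 6))   # 20-dim "acoustics" -> 6 articulators
    X_train = rng.standard_normal((500, 20))
    Y_train = X_train @ true_map + 0.01 * rng.standard_normal((500, 6))
    W = fit_inversion(X_train, Y_train)
    X_test = rng.standard_normal((10, 20))
    err = np.abs(X_test @ W - X_test @ true_map).max()
    print(err < 0.1)
```

With clean parallel data the linear map is recovered almost exactly; the hard parts the thesis targets (normalisation across speakers, nonlinearity, missing modalities) are precisely what this sketch leaves out.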
 
---------------------------------------------------------------------------------------------------------------------------------
Multimodality for face-to-face interaction between an embodied conversational agent and a human
partner: experimental software platform
Thesis financed by a research grant from the Rhône-Alpes region (€1750 gross/month)
Selected in 2008 by the research Cluster ISLE (http://www.grenoble-universites.fr/isle)
The research work aims at developing multimodal systems enabling an
embodied conversational agent and a human partner to engage into a
situated face-to-face conversation notably involving objects of the
environment. These interactive multimodal systems involve numerous
software sensors and actuators such as recognizing/synthesizing speech,
facial expressions, gaze or gestures of the interlocutors. The environment
in which this interaction occurs should also be analyzed so as to
maintain or attract attention towards objects of interest in the dialog.
Perception-action loops of these multimodal systems should finally take into account the mutual
conditioning of the cognitive states of the interlocutors as well as the psychophysical, linguistic and social
dimensions of these multimodal turns.
In this context, and given the complexity of the signal and information processing involved, the
first objective of this work is to design and implement a Wizard-of-Oz software platform for
exploring the design space, in which parts of the interactive system are simulated by a human
accomplice while other parts are handled by automatic behavior. The first goal is to study the
impact of this faked versus automatic behavior on the interactions in terms of cognitive load,
subject satisfaction and task performance. The final goal is, of course, to progressively
substitute an autonomous context-sensitive and context-aware interactive system for the human's
intelligence and comprehension of the scene.
The software platform should guarantee real-time processing of perceived and generated multimodal
events and should provide the Wizard of Oz with adequate, intuitive tools for controlling the
simulated part of the system's behavior.
This thesis will be conducted within the framework of the OpenInterface European project
(FP6-IST-35182, on multimodal interaction) and the ANR project Amorces (human-robot
collaboration for manipulating objects).
Expected results
Experimental:
• Prototype of the Wizard-of-Oz platform
• Recordings of multimodal conversations between an embodied conversational agent and a human
partner using the prototype
Theoretical:
• Taxonomy of Wizard-of-Oz platforms
• Design of real-time Wizard-of-Oz platforms
• Highly modular software model of multimodal systems
• Multi-layered model of face-to-face conversation
Keywords
Interaction model, multimodality, multimodal dialog, interaction engineering, software architecture,
Wizard-of-Oz platform
Thesis proposed by
Gérard BAILLY, GIPSA-Lab, MPACIF team Gerard.Bailly@gipsa-lab.inpg.fr
Laurence NIGAY, LIG, IIHM team Laurence.Nigay@imag.fr
Doctoral program: EEATS GRENOBLE – FRANCE http://www.edeeats.inpg.fr/
Back to Top

6-9 . PhD in speech signal processing at Infineon Sophia Antipolis

Open position: PhD in speech signal processing

 

 

Title: Solutions for non-linear acoustic echo.

 

Background:

Acoustic echo is an annoying disturbance caused by the sound feedback between the loudspeaker and the microphone of a terminal. Acoustic echo cancellation and residual echo suppression are widely used to reduce the echo signal. The performance of existing echo reduction systems relies strongly on the assumption that the echo path between the transducers is linear. However, today's competitive audio consumer market may favour sacrificing linear performance for the integration of low-cost analogue components. The assumption of linearity then no longer holds, due to the nonlinear distortions introduced by the loudspeakers and the small housings in which the transducers are placed.
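For illustration, the core of a conventional linear acoustic echo canceller is an adaptive filter, classically a normalized LMS (NLMS) filter, that tracks the echo path. The sketch below assumes a purely linear FIR echo path, which is exactly the assumption that nonlinear transducers break; it is a didactic example, not Infineon's implementation.

```python
# Sketch of a linear acoustic echo canceller based on the NLMS adaptive
# filter. The echo path is modelled as a linear FIR filter -- the very
# assumption this PhD topic questions for low-cost transducers.
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=64, mu=0.5, eps=1e-8):
    """Return the residual (echo-cancelled) microphone signal."""
    w = np.zeros(taps)                 # estimated echo-path impulse response
    residual = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]  # most recent far-end samples
        echo_hat = w @ x               # predicted echo
        e = mic[n] - echo_hat          # error = residual after cancellation
        w += mu * e * x / (x @ x + eps)  # normalised LMS update
        residual[n] = e
    return residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    far = rng.standard_normal(20000)
    true_path = rng.standard_normal(32) * np.exp(-0.2 * np.arange(32))
    echo = np.convolve(far, true_path)[:len(far)]
    res = nlms_echo_canceller(far, echo)
    # With a truly linear echo path, NLMS removes most of the echo energy.
    print(np.mean(res[5000:] ** 2) < 1e-2 * np.mean(echo[5000:] ** 2))
```

With a nonlinear loudspeaker (e.g. clipping or harmonic distortion inserted between `far_end` and `mic`), the same filter leaves substantial residual echo, which motivates the nonlinear models to be studied in the thesis.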

 

Task:

The PhD student will conduct research in the field of non-linear systems applied to acoustic echo reduction. The first task is the proper modelling of mobile phone transducers exhibiting non-linearity, to better understand the environment in which echo reduction operates. Using this model as a basis, a study of the performance of linear systems will give a good understanding of the problems caused by non-linearity. In a further step, the PhD student will develop and test non-linear algorithms for echo cancellation in non-linear environments.

About the Company:

The Sophia-Antipolis site is one of the main Infineon Technologies research and development centers worldwide. Located in the high-tech valley of Sophia-Antipolis, near Nice in the south of France, it hosts a team of 140 experienced research and development engineers specialized in Mobile Solutions, Embedded SRAM and Design-Flow Software. The PhD will take place within the Mobile Solutions group, which is responsible for specifying and designing baseband integrated circuits for cellular phones. The team is specialized in innovative algorithm development, especially in audio, system specification and validation, circuit design and embedded software. Its work makes a significant contribution to the Infineon Technologies wireless chipset portfolio.

Required skills:

-        Master's degree

-        Strong background in signal processing.

-        Background in speech signal processing or non-linear systems is a plus.

-        Programming: Matlab, C.

-        Knowledge of fixed-point C / DSP implementation is a plus.

-        Language: English

Length of the PhD: 3 years

Place: Infineon Technologies France, Sophia-Antipolis

Contact:

Christophe Beaugeant

Phone: +33 (0)4 92 38 36 30

E-mail: christophe.beaugeant@infineon.com

Back to Top

6-10 . PhD position at Institut Eurecom Sophia Antipolis France

Institut Eurécom, Sophia Antipolis, France
Doctoral Position
 
Title: Speaker Diarisation for Internet-based Content Processing
 
Department: Multimedia Communications
URL: http://www.eurecom.fr/research/
Start Date: Immediate vacancy
Duration: 3 years
 
Description: Also known as the “who spoke when?” task, speaker diarization aims to detect the 
number of speakers within an audio document and to identify when each speaker is active. Speaker
diarization is an important problem with applications in speaker indexing, document retrieval,
rich transcription, speech and speaker recognition/biometrics and video conferencing, among
others. Research to date has focused on narrow application domains, namely telephone
speech, broadcast news and meeting recordings. In line with recent shifts in the field, this
research project will explore exciting new applications of speaker diarization in the area of
Internet-based content processing, especially user-generated content. The diversity of such
content presents a number of new challenges. Some areas in which the candidate will be
expected to work involve speech enhancement / noise compensation, beam-forming, speech
activity detection, channel compensation and statistical speaker modelling. The successful
candidate will have the opportunity for international travel and to become involved in
national and European projects and internationally competitive speaker diarization trials.
This position offers a unique opportunity to develop broad knowledge in cutting edge speech
and audio processing research.
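At the heart of most speaker diarization systems is a clustering step: cut the audio into short segments, represent each by a feature vector, and group segments so that each cluster corresponds to one speaker. The sketch below is a didactic two-speaker toy (simple k-means over synthetic vectors standing in for per-segment MFCC statistics), not Eurécom's system.

```python
# Toy illustration of the clustering step behind speaker diarization:
# each row of `features` is one audio segment's feature vector; clustering
# assigns segments to speakers. Synthetic data only; real diarizers add
# speech activity detection, richer models (GMM/HMM) and resegmentation.
import numpy as np

def diarize(features, iters=20):
    """Cluster segment features into two speakers (k-means, k=2,
    initialised with the two mutually farthest segments)."""
    d0 = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    i, j = np.unravel_index(d0.argmax(), d0.shape)
    centres = features[[i, j]].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(features[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)            # nearest-centre speaker label
        for k in range(2):
            if np.any(labels == k):
                centres[k] = features[labels == k].mean(axis=0)
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spk_a = rng.normal(0.0, 1.0, size=(30, 12))  # simulated speaker A segments
    spk_b = rng.normal(4.0, 1.0, size=(30, 12))  # simulated speaker B segments
    labels = diarize(np.vstack([spk_a, spk_b]))
    print(len(set(labels[:30].tolist())) == 1 and labels[0] != labels[30])
```

User-generated Internet content makes exactly this step hard: noise, channel variation and overlapping speech blur the clusters, which motivates the enhancement, beam-forming and compensation work listed above.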
 
Requirements: The successful candidate will have a Master’s degree in engineering, mathematics,
computing, physics or a related relevant discipline. You will have strong mathematical,
programming and communication skills and be highly motivated to undertake challenging
research. Good English language speaking and writing skills are essential.
 
Applications: Please send to the address below (i) a one page statement of research interests and
motivation, (ii) your CV and (iii) three letters of reference (2 academic, 1 personal).
Contact: Nicholas Evans
Postal Address: 2229 Route des Crêtes BP 193, F-06904 Sophia Antipolis cedex, France
Email address: nicholas.evans@eurecom.fr
Web address: http://www.eurecom.fr/main/institute/job.en.htm
Phone: +33/0 4 93 00 81 14
Fax: +33/0 4 93 00 82 00
 
Institut Eurécom is located in Sophia Antipolis, a vibrant science park on the French Riviera. It 
is in close proximity to a large number of research units of leading multi-national corporations 
in the telecommunications, semiconductor and biotechnology sectors, as well as other outstanding 
research and teaching institutions. A freethinking, multinational population and the unique 
geographic location provide a quality of life without equal.
 
Institut Eurécom, 2229 Route des Crêtes BP 193, F-06904 Sophia Antipolis cedex, France
www.eurecom.fr
 
Back to Top

6-11 . Two PhD's positions at the University of Karlsruhe Germany

At the Institut für Theoretische Informatik, Lehrstuhl Prof. Waibel, Universität Karlsruhe (TH), a

Ph.D. position

in the field of

 

Software System Integration of Automatic Speech Recognition and Machine Translation for Speech based Multimedia Indexing

 

is to be filled immediately with a salary according to TV-L, E13.

 

The responsibilities include the integration, fusion and development of core technologies in the areas of automatic speech recognition and simultaneous machine translation, in the context of speech-based indexing of multimedia documents, within application-targeted research projects in the area of multimodal human-machine interaction. Set in a framework of internationally and industry funded research programs, the successful candidate is expected to contribute to showcases of the state of the art in modern recognition and translation systems.

 

We are an internationally renowned research group with an excellent infrastructure. Examples of our projects for improving Human-Machine and Human-to-Human interaction are: JANUS - one of the first speech translation systems proposed, simultaneous translation of lectures, portable speech translators, meeting browser and lecture tracker.

 

Within the framework of the International Center for Advanced Communication Technology (interACT), our institute operates in two locations, Universität Karlsruhe (TH), Germany and at Carnegie Mellon University, Pittsburgh, USA.  International joint and collaborative research at and between our centers is common and encouraged, and offers great international exposure and activity. 

 

Applicants are expected to have:

  • an excellent university degree (M.S., Diploma or Ph.D.) in Computer Science, Electrical Engineering, Mathematics, or related fields
  • excellent programming skills 
  • advanced knowledge in at least one of the fields of Machine Learning, Pattern Recognition, Statistics, or System Integration

 

For candidates with Bachelor or Master’s degrees, the position offers the opportunity to work toward a Ph.D. degree.

 

In line with the university's policy of equal opportunities, applications from qualified women are particularly encouraged. Disabled applicants will be given preference in the case of equal qualifications.

 

Questions may be directed to: Sebastian Stüker, Tel. +49 721 608 6284, E-Mail: stueker@ira.uka.de,  http://isl.ira.uka.de

 

The application should be sent to Professor Waibel, Institut für Theoretische Informatik, Universität Karlsruhe (TH), Adenauerring 4, 76131 Karlsruhe, Germany

 

----------------------------------------------------------------------------------------------------------------------------------

 

 

 

At the Institut für Theoretische Informatik, Lehrstuhl Prof. Waibel, Universität Karlsruhe (TH), a

 

 

Ph.D. position

in the field of

Multimodal Dialog Systems

 

is to be filled immediately with a salary according to TV-L, E13.

 

The responsibilities include basic research in the area of multimodal dialog systems, especially multimodal human-robot interaction and learning robots, within application-targeted research projects in the area of multimodal human-machine interaction. Set in a framework of internationally and industry funded research programs, the successful candidate(s) are expected to contribute to the state of the art in modern spoken dialog systems, improving natural interaction with robots.

 

We are an internationally renowned research group with an excellent infrastructure. Current research projects for improving human-machine and human-to-human interaction focus on dialog management for human-robot interaction.

 

Within the framework of the International Center for Advanced Communication Technology (interACT), our institute operates in two locations, Universität Karlsruhe (TH), Germany and at Carnegie Mellon University, Pittsburgh, USA.  International joint and collaborative research at and between our centers is common and encouraged, and offers great international exposure and activity. 

 

Applicants are expected to have:

  • an excellent university degree (M.S., Diploma or Ph.D.) in Computer Science, Computational Linguistics, or related fields
  • excellent programming skills 
  • advanced knowledge in at least one of the fields of Speech and Language Processing, Pattern Recognition, or Machine Learning

 

For candidates with Bachelor or Master’s degrees, the position offers the opportunity to work toward a Ph.D. degree.

 

In line with the university's policy of equal opportunities, applications from qualified women are particularly encouraged. Disabled applicants will be given preference in the case of equal qualifications.

 

Questions may be directed to: Hartwig Holzapfel, Tel. +49 721 608 4057, E-Mail: hartwig@ira.uka.de,  http://isl.ira.uka.de

 

The application should be sent to Professor Waibel, Institut für Theoretische Informatik, Universität Karlsruhe (TH), Adenauerring 4, 76131 Karlsruhe, Germany

Back to Top

6-12 . Job opening at TFH Berlin University of Applied Sciences, Department of Computer Sciences and Media, Germany

Job opening at TFH Berlin University of Applied Sciences, Department of Computer Sciences and Media, Germany: Post-graduate position (part-time) for a computer scientist or engineer with a background in ASR and/or TTS in a three-year project in Computer-Aided Language Learning funded by the German Ministry of Education and Research. Start: 1 July 2008. The task will be the development and evaluation of a software system for teaching Mandarin pronunciation to Germans, as well as administrative duties with the funding body. Knowledge of E-Learning applications, German and/or Mandarin are welcome, good English skills mandatory. Candidates will have the opportunity to pursue a PhD degree and should be preferably EU citizens. The position is paid according to BAT 2a/2 (German pay scale for federal employees), about €28.000/year depending on age and marital status. Please direct further enquiries to Prof. Dr. Hansjörg Mixdorff at mixdorff@tfh-berlin.de. 
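One plausible building block of such a pronunciation-teaching system (sketched here as an assumption, not the project's actual design) is comparing a learner's pitch contour to a native reference with dynamic time warping (DTW). This matters especially for Mandarin, where lexical tone is carried by the F0 contour. All contours below are synthetic.

```python
# Didactic sketch: score a learner's pitch (F0) contour against reference
# tone contours using dynamic time warping, the classic alignment tool
# for comparing sequences of different lengths. A real CALL system would
# extract F0 from audio and normalise for the speaker's pitch range.
import math

def dtw(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

if __name__ == "__main__":
    rising = [100 + 5 * t for t in range(20)]    # tone-2-like rising contour
    falling = [200 - 5 * t for t in range(20)]   # tone-4-like falling contour
    learner = [102 + 5 * t for t in range(18)]   # slightly-off rising attempt
    # The learner's attempt should be much closer to the rising reference.
    print(dtw(learner, rising) < dtw(learner, falling))
```

The resulting distance can be thresholded or mapped to a score to give the learner corrective feedback on each tone.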

Back to Top

6-13 . Doctoral research grant offer - academic year 2008

Doctoral Research Grant Offer - Academic Year 2008

Sensorimotor maps of speech: neuroanatomical correlates of the perception and production systems for French vowels and consonants.

Marc Sato, CNRS Research Scientist (Chargé de Recherche)

Jean-Luc Schwartz, CNRS Research Director (Directeur de Recherche)

GIPSA-Lab, UMR CNRS 5216, Speech and Cognition Department, "Perception, Multimodality, Development" team, Grenoble, France (http://gipsa-lab.inpg.fr).

For the 2008 academic year, we offer a doctoral research grant within the doctoral school "Ingénierie pour la Santé, la Cognition et l'Environnement" (EDISCE - ED216, http://www-sante.ujf-grenoble.fr/edisce/), accredited by the universities Pierre Mendès France and Joseph Fourier and the Institut National Polytechnique de Grenoble.

Within the theoretical framework of a possible functional coupling between speech perception and production systems, this research project aims to test the existence of specific dynamic functional connectivity between sensory and motor regions during the perception and production of French vowels and consonants. To this end, a series of functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) experiments should provide a precise spatial and temporal description of the brain activity involved in the production and perception of French phonemes, as well as of the dynamic connectivity between these regions. The experiments should lead to a deeper understanding of the processes of analysis and construction of verbal representations by demonstrating a co-structuring and interdependence of sensory and motor regions.

 

Besides an in-depth survey of the multidisciplinary literature in phonetics and phonology, neuropsycholinguistics and cognitive neuroscience, the project will involve the design of fMRI and EEG experimental protocols, running the subjects and collecting the data, and finally the statistical analysis of the data and their interpretation.

 

The candidate will preferably hold a research Master's (M2R) degree in neuroscience, cognitive science, cognitive psychology or neuropsychology. The candidate should be familiar with standard behavioural experimentation as well as with the statistical tests and analyses used in cognitive psychology. Proficiency in English and prior experience with fMRI and/or EEG techniques are desirable.

 

This thesis will be part of a Grenoble research project on new techniques for non-linear analysis and modelling of brain connectivity measures. It will also rely on national and international collaborations, notably with the Centre for Research on Language, Mind and Brain at McGill University (Canada). Funding is for a period of three years, starting in October 2008.

 

Contact for this announcement:

 

Marc Sato

GIPSA-Lab, UMR CNRS 5216

Université Stendhal

BP 25 - 38040 Grenoble cedex 9

Tel: (+33) (0)476 827 784

Fax: (+33) (0)476 824 335

E-mail: marc.sato [at] gipsa-lab.inpg.fr

 

The deadline for sending a detailed CV is 18 June. Please also send, as soon as possible, your final M2R grades and ranking, and optionally a letter of recommendation.

 

Back to Top

6-14 . PhD theses at the doctoral school MITT, Universite Paul Sabatier Toulouse III

Note: funding will be awarded to the doctoral school's best-ranked candidates, so applicants must make contact before 10 June to be included in this ranking. Full announcement: http://www.irit.fr/recherches/SAMOVA/these-caracterisation-et-lidentification-automatique-de-dialectes.html

 

PhD thesis, doctoral school MITT, Université Paul Sabatier Toulouse III

DeadLine: 10/06/2008

Contacts: jerome.farinas@irit.fr

http://www.irit.fr/-Equipe-SAMoVA-

 

SUBJECT DESCRIPTION:

Research in automatic speech processing is increasingly concerned with the processing of large data collections containing spontaneous and conversational speech. Performance depends on all sources of variability in speech. One of these sources is the speaker's dialect, which induces variability in phonetic pronunciation as well as in word morphology and prosody. We propose a research topic on the automatic dialectal characterization of speakers, with the goal of driving the adaptation of speech recognition systems: selecting suitable acoustic and prosodic models will improve performance under speaker-independent recognition conditions. Such a system should build on recent advances in language identification, both in acoustic modelling through the exploration of phoneme lattices and in fine-grained modelling based on micro- and macro-prosody. The databases available within the project on the phonology of contemporary French (http://www.projet-pfc.net/) provide a wide range of data on pronunciation variation. The final system will be evaluated in the international language verification campaigns organized by NIST, which now take dialectal variation into account (Mandarin, English, Spanish and Hindi): http://www.nist.gov/speech/tests/lre/.


REQUIRED SKILLS AND KNOWLEDGE

 

    • skills in computer science (in particular automatic speech processing)

    • skills in linguistics (phonology, prosody)
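The phonotactic route to dialect and language identification mentioned in the description (phone sequences scored by language-specific models) can be sketched in miniature: a phone recognizer, assumed to exist, turns speech into phone strings; one smoothed bigram model is trained per dialect; and a test utterance is assigned to the best-scoring model. All phone strings and dialect names below are invented for illustration.

```python
# PRLM-style phonotactic identification sketch: per-dialect bigram models
# over phone symbols, scored by log-likelihood. Didactic only -- real
# systems decode lattices, use higher-order n-grams and add prosody.
import math
from collections import Counter

def train_bigram(phone_strings, alpha=1.0):
    """Return a log-probability scorer using an add-alpha smoothed bigram model."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for s in phone_strings:
        phones = s.split()
        vocab.update(phones)
        unigrams.update(phones[:-1])
        bigrams.update(zip(phones, phones[1:]))
    V = len(vocab)

    def logprob(s):
        phones = s.split()
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
            for a, b in zip(phones, phones[1:])
        )
    return logprob

def identify(test, models):
    """Return the dialect whose bigram model best explains the test string."""
    return max(models, key=lambda d: models[d](test))

if __name__ == "__main__":
    models = {
        "dialect_A": train_bigram(["p a t a k a", "t a p a k a"]),
        "dialect_B": train_bigram(["p i t i k i", "t i p i k i"]),
    }
    print(identify("k a t a p a", models))
```

The NIST LRE campaigns cited above evaluate exactly this kind of decision, at scale and with far more sophisticated models.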

Back to Top

6-15 . Cambridge University Research Position in Speech processing

Cambridge University: Research Position in Speech Synthesis and Recognition / Machine Translation


A position exists for a Research Associate to work on the EMIME ("Efficient multilingual interaction in mobile environment") project. This project is funded by the European Commission within the FP7 programme. The project aims to develop a mobile device that performs personalized speech-to-speech translation such that a user's spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user's voice. We will build on recent developments in speech synthesis using hidden Markov models, which is the same technology used for automatic speech recognition. Using a common statistical modelling framework for automatic speech recognition and speech synthesis will enable the use of common techniques for adaptation and multilinguality. The project objectives are to

1. Personalise speech processing systems by learning individual characteristics of a user's speech and reproducing them in synthesised speech.
2. Introduce a cross-lingual capability such that personal characteristics can be reproduced in a second language not spoken by the user.
3. Develop and better understand the mathematical and theoretical relationship between speech recognition and synthesis.
4. Eliminate the need for human intervention in the process of cross-lingual personalisation.
5. Evaluate our research against state-of-the-art techniques and in a practical mobile application.
See the EMIME website for more information: http://www.emime.org/
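The shared machinery behind the objectives above is the hidden Markov model itself. As a minimal, purely illustrative sketch (not EMIME, HTK or HTS code), the forward algorithm below computes the likelihood of an observation sequence under a discrete HMM: recognition scores hypotheses with exactly this quantity, while HMM-based synthesis runs related recursions over the same models to generate parameter trajectories.

```python
# Forward algorithm for a discrete HMM -- the core likelihood computation
# common to HMM-based recognition and (via related recursions) synthesis.
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(obs | HMM) via the forward recursion.
    pi: initial state probs (N,); A: transition matrix (N, N);
    B: emission probs (N, n_symbols); obs: list of observed symbol indices."""
    alpha = pi * B[:, obs[0]]          # initialise with first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and absorb next observation
    return alpha.sum()

if __name__ == "__main__":
    # A two-state left-to-right toy model with two output symbols.
    pi = np.array([1.0, 0.0])
    A = np.array([[0.7, 0.3], [0.0, 1.0]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(forward_likelihood(pi, A, B, [0, 0, 1]))
```

Because the same model family serves both directions, adaptation techniques (e.g. speaker adaptation of the model parameters) can personalise recognition and synthesis together, which is the premise of the project.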

This is an opportunity to work in a research group with a world-leading reputation in speech recognition and statistical machine translation research. There are excellent opportunities for publications, travel and conference visits. The group has outstanding research facilities. For suitably qualified candidates there may also be the chance to contribute to the MPhil in Computer Speech, Text and Internet Technology (http://mi.eng.cam.ac.uk/cstit/).

The successful candidate must have a very good first degree in a relevant discipline and will preferably also hold a higher degree, as well as experience in acoustic modeling for speech synthesis and/or recognition. Expertise in one or more of the following technical areas is also a distinct advantage:
- speech recognition with the HTK toolkit (http://htk.eng.cam.ac.uk)
- speech synthesis with the HTS HMM-based Speech Synthesis System (http://hts.sp.nitech.ac.jp)
- weighted finite state transducers for speech and language processing
The project focus is acoustic modeling but experience in statistical machine translation is also an advantage.

The cover sheet for applications (PD18) is available from http://www.admin.cam.ac.uk/offices/personnel/forms/pd18/. Part I and Part III only should be sent, with a letter and CV, to Dr Bill Byrne, Department of Engineering, Trumpington Street, Cambridge, CB2 1PZ (fax +44 01223 332662, email wjb31@cam.ac.uk).
Quote Reference: NA03547, Closing Date: 30 June 2008

The University values diversity and is committed to equality of opportunity.

Back to Top

6-16 . Head of NLP at Voxid UK

Head of NLP:

 

 We are now looking for a very experienced computational linguist to lead our efforts in the natural language processing area. This is a hugely challenging but also very rewarding role; the opportunities for applying linguistic techniques are virtually limitless, and even small improvements in the algorithms for detecting and correcting potential conversion errors translate into serious cost savings for the company. This is a senior position, leading a team and having the autonomy to build a strategic way forward for the department.

 

 Experience Needed:

 

 Grammars and parsing for spontaneous speech.

 Statistical methods.

 At least basic programming ability (shell scripts, Perl, awk).

 Spell-checkers, grammar checkers, auto-correcting tools and predictive typing.

 Experience with Automatic Speech Recognition technology.

 Probabilistic language modelling.

 Phonetics.

 Multilingual skills.

info@voxid.co.uk 

Back to Top

6-17 . (2008-07-01) Nuance: Junior Research Engineer for Embedded Automatic Speech Recognition

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, controlling their mobile phone or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about. To strengthen our international team we are currently looking for a

 

 

Junior Research Engineer for Embedded Automatic Speech Recognition

 

 

Work Environment

·         You will work in the Embedded ASR research and production team in Merelbeke, Belgium or Aachen, Germany, working with state-of-the-art speech technology, tools and runtime software. Both Gent and Aachen are nice, historical European university cities.

·         You will work in an international company and cooperate with people and research teams on various locations across the globe. You may occasionally be asked to travel.

·         You will work with our natural language understanding and dialogue research teams, as well as support our professional services teams.

·         You will work on the development of cutting edge speech recognition products for automotive platforms and mobile devices. You will help the engine cope with multi-lingual speech in various noise conditions, and this while respecting strong limitations on the usage of memory and processing power.

 

Key Responsibilities

·         Design, implementation, evaluation, optimization and testing of new algorithms and tools, with a strong focus on speech signal processing and acoustic modeling in adverse, noisy environments.

·         Activities are targeted at the creation of commercial products for resource limited platforms.

·         Focus on creating efficient production and development processes to bring the technology to marketable products in a wide range of languages.

·         Occasional application of the developed algorithms and tools for producing systems for a specific language.

·         Specification and follow-up of projects to make the system work with third party components, such as beam formers, echo cancellers or content data providers.

 

Your Profile

  • You have a University degree in engineering, mathematics or physics.
  • A PhD degree in speech processing or equivalent relevant experience is a strong asset.
  • Experience in speech recognition research, especially acoustic modeling or signal processing, is required.
  • Experience in speech processing, machine learning techniques or statistical modeling is required.
  • Knowledge about small platforms and experience in developing software for them is a plus.
  • Strong software skills are required, especially C/C++ and a scripting language like Perl or Python in a Linux/Unix environment. Knowledge of Matlab is a plus.
  • Additional background in computational linguistics is a plus.
  • You are a team player, willing to take initiative, and are goal oriented.
  • You have a strong desire to make things “really work” in practice, on hardware platforms with limited memory and processing power.
  • You are fluent in English and at least one other language, and you can write high quality English documentation.  

 

Interested?

 

Please send your CV to Deanna Roe at deanna.roe@nuance.com. If you have any questions, please contact her at +44 207 922 5757.

 

We are looking forward to receiving your application!

 

Back to Top

6-18 . (2008-07-01) Nuance SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

In order to strengthen our Embedded ASR Research team, we are looking for a:

 

       SOFTWARE ENGINEER SPEECH DIALOGUE TOOLS

 

As part of our team, you will be creating solutions for voice user interfaces for embedded applications on mobile and automotive platforms.

 

 

OVERVIEW:

 

- You will work in Nuance's Embedded ASR (automatic speech recognition) research and development team, developing technology, tools, and run-time software to enable our customers to develop and test embedded speech applications. Together with our team of speech and language experts, you will work on natural language dialogue systems for our customers in the Automotive and Mobile sector.

- You will work on fascinating technology that has now reached the maturity to enable new generations of powerful and natural user interfaces. Your code is crucial to the research in speech and language technology that defines the state of the art in this field. It is equally important for the products that you will find in the market, in speech-enabled cars, navigation devices, and cell phones.

- You will work in a large international software company that is the leading provider of speech and imaging solutions for businesses and consumers around the world. You will cooperate with people on various locations including in Europe, America and Asia. You may occasionally be asked to travel.

 

 

RESPONSIBILITIES:

 

- You will work on the development of tools and solutions for cutting edge speech and language understanding technologies for automotive and mobile devices.

- You will work on enhancing various aspects of our advanced natural language dialogue system, such as the layer of connected applications, the configuration setup, inter-module communication, etc.

- In particular, you will be responsible for the design, implementation, evaluation, optimization and testing, and documentation of tools such as GUI and XML applications that are used to develop, configure, and fine-tune advanced dialogue systems.

 

 

 

QUALIFICATIONS:

 

- You have a university degree in computer science, engineering, mathematics, physics, computational linguistics, or a related field.

- You have very strong software and programming skills, especially in C/C++, ideally also for embedded applications.

- You have experience with Python or other scripting languages.

- GUI programming experience is an asset.

 

The following skills are a plus:

- Understanding of communication protocols

- Understanding of databases

- A background in (computational) linguistics, dialogue systems, speech processing, grammars, and parsing techniques, statistics, pattern recognition, and machine learning, especially as related to natural language processing, dialogue, and representation of information

- Understanding of computational agents and related frameworks (such as OAA).

- You can work both as a team player and as goal-oriented independent software engineer.

- You can work in a multi-national team and communicate effectively with people of different cultures.

- You have a strong desire to make things really work in practice, on hardware platforms with limited memory and processing power.

- You are fluent in English and you can write high quality documentation.

- Knowledge of other languages is a plus.

 

 

 

CONTACT:

 

Please send your applications, including cover letter, CV, and related documents (maximum 5MB total for all documents, please) to

 

Benjamin Campued       Benjamin.Campued@nuance.com

 

Please make sure your application demonstrates your excellent software engineering skills.

 

 

 

ABOUT US:

 

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched.  With more than 3500 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about.

 

Back to Top

6-19 . (2008-07-01) Nuance-Speech Scientist for Embedded Automatic Speech Recognition

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, controlling their mobile phone or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about. To strengthen our international team we are currently looking for a

 

 Speech Scientist for Embedded Automatic Speech Recognition

 

 

Work Environment

·          You will work in the Embedded ASR research and production team in Merelbeke, Belgium or Aachen, Germany, working with state-of-the-art speech technology, tools and runtime software. Both Gent and Aachen are nice, historical European university cities.

·          You will work in an international company and cooperate with people on various locations, from the USA up to Japan. You may occasionally be asked to travel.

·          You will work on the localization and production of language variants for our cutting edge speech recognition products targeted at automotive platforms and mobile devices. You will help the engine cope with multi-lingual speech in various noise conditions.

·          Initially, you will work on the production of language variants of our acoustic models, later extending your knowledge towards production of statistical language models and natural language dialogue systems.

 

Key Responsibilities

·          Training of  acoustic models or statistical language models for new languages.

·          Localizing natural language dialogue systems towards a specific market.

·          Contributing to the improvement, design, implementation, evaluation, optimization and testing of new algorithms, tools and processes.

·          Supporting our professional services teams to contribute to customer project success.

·          Assisting senior team members in research tasks.

 

Your Profile

  • You have a University degree in linguistics, engineering, mathematics or physics.
  • A PhD or similar experience in a relevant field is a plus.
  • Experience in acoustic modeling, NLU or statistical language modeling is recommended.
  • Additional background in computational linguistics is a plus.
  • Working in Windows and Linux environments comes naturally to you. Experience with computing farms and grid software is welcome.
  • You are knowledgeable about small, embedded platforms and the requirements of software applications designed for them.
  • Good software skills are required, especially a scripting language like Perl or Python in a Linux/Unix environment, and knowledge of C/C++.
  • Experience in speech processing or machine learning techniques is an asset.
  • You are a team player, willing to take initiative, and are goal oriented.
  • You have a strong sense of precision and quality in your daily job.
  • You are fluent in English and you can write high quality documentation.  
  • You illustrate your interest in languages by speaking at least two other languages.

 

Interested?

 

Please send your CV to Deanna Roe at deanna.roe@nuance.com. If you have any questions, please contact her at +44 207 922 5757.

 

We are looking forward to receiving your application!

 

The experience speaks for itself™

 

Back to Top

6-20 . (2008-07-01) Nuance-Senior Research Engineer for Embedded Automatic Speech Recognition

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world.  Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, requesting account information, dictating patient records, telling a navigation system their destination, controlling their mobile phone or digitally reproducing documents that can be shared and searched.  With more than 2000 employees worldwide, we are committed to make the user experience more enjoyable by transforming the way people interact with information and how they create, share and use documents. Making each of those experiences productive and compelling is what Nuance is about. To strengthen our international team we are currently looking for a

 

 

Senior Research Engineer for Embedded Automatic Speech Recognition

 

 

Work Environment

·          You will work in the Embedded ASR research and production team in Merelbeke, Belgium or Aachen, Germany, working with state-of-the-art speech technology, tools and runtime software. Both Gent and Aachen are nice, historical European university cities.

·          You will work in an international company and cooperate with people and research teams on various locations across the globe. You may occasionally be asked to travel.

·          You will work with our natural language understanding and dialogue research teams, as well as support our professional services teams.

·          You will work on the development of cutting edge speech recognition products for automotive platforms and mobile devices. You will help the engine cope with multi-lingual speech in various noise conditions, and this while respecting strong limitations on the usage of memory and processing power.

 

Key Responsibilities

·          Design, implementation, evaluation, optimization and testing of new algorithms and tools, with a strong focus on speech signal processing and acoustic modeling in adverse, noisy environments.

·          Activities are targeted at the creation of commercial products for resource limited platforms.

·          Focus on creating efficient production and development processes to bring the technology to marketable products in a wide range of languages.

·          Occasional application of the developed algorithms and tools for producing systems for a specific language.

·          Specification and follow-up of projects to make the system work with third party components, such as beam formers, echo cancellers or content data providers.

 

Your Profile

  • You have a University degree in engineering, mathematics or physics.
  • A PhD degree in speech processing or equivalent relevant experience is a strong asset.
  • Experience in speech recognition research, especially acoustic modeling or signal processing, is required.
  • Experience in speech processing, machine learning techniques or statistical modeling is required.
  • Knowledge about small platforms and experience in developing software for them is a plus.
  • Strong software skills are required, especially C/C++ and a scripting language like Perl or Python in a Linux/Unix environment. Knowledge of Matlab is a plus.
  • Additional background in computational linguistics is a plus.
  • You are a team player, willing to take initiative, and are goal oriented.
  • You have a strong desire to make things “really work” in practice, on hardware platforms with limited memory and processing power.
  • You are fluent in English and at least one other language, and you can write high quality English documentation. 

 

Interested?

 

Please send your CV to Deanna Roe at deanna.roe@nuance.com. If you have any questions, please contact her at +44 207 922 5757.

 

We are looking forward to receiving your application!

 

The experience speaks for itself™

Back to Top

6-21 . (2008-07-01) Nuance- jr. Speech Scientist

Title: jr. Speech Scientist

 

Location: Aachen, Germany

 

Type: Permanent

 

Job:  

 

Overview:

 

Nuance is the leading provider of speech and imaging solutions for businesses and consumers around the world. Our technologies, applications and services make the user experience more compelling by transforming the way people interact with information and how they create, share and use documents. Every day, millions of users and thousands of businesses experience Nuance by calling directory assistance, getting account information, dictating patient records, telling a navigation system their destination, or digitally reproducing documents that can be shared and searched. Making each of those experiences productive and compelling is what Nuance is all about.

 

Responsibilities:

 

Nuance is seeking a jr. Speech Scientist who possesses a solid background in natural language technology and computational linguistics.

Candidates should enjoy working in a fast-paced, collaborative atmosphere that applies speech science in a commercial, result driven and customer oriented setting.

 

As a jr. Speech Scientist in the Embedded Professional Services group, you will work on speech recognition grammars, statistical language models, prompts and custom voice development for leading edge automotive applications across the world, covering a broad range of activities in all project phases, including the design, development, and optimization of the system.

 

Representative duties include:

  • Develop rule based grammars, train statistical language models for speech recognition and natural language understanding in commercial products in a variety of languages, according to UI Design specifications
  • Identify or gather suitable text for training language models and custom voices
  • Design, develop, and test semantic classifier rules and models
  • Develop custom voices for use with Nuance’s leading text to speech products
  • Direct voice talents for prompt recordings
  • Organize and conduct usability tests
  • Localization of speech resources for embedded speech applications
  • Optimize accuracy of applications by analyzing performance and tuning statistical language models, grammars, and pronunciations within CPU and memory constraints of embedded platforms
  • Contribute to the generation and presentation of client-facing reports

 

Qualifications:

  • University degree in computational linguistics or Software design or similar degree
  • Strong analytical and problem solving skills and ability to troubleshoot issues
  • Good judgment and quick-thinking
  • Strong programming skills, preferably Perl or Python
  • Excellent written and verbal communications skills
  • Ability to scope work taking technical, business and time-frame constraints into consideration
  • Ability and willingness to travel abroad
  • Works well independently and collaboratively in team settings in fast-paced environment
  • Mastering Office applications

 

Beneficial Skills

  • Additional language skills, e.g. French, German, Spanish or others
  • Strong programming skills in either Perl, Python, C, VB
  • Speech recognition knowledge
  • Pattern recognition, linguistics, signal processing, or acoustics knowledge
Back to Top

6-22 . (2008-07-02) Microsoft: Danish Linguist (M/F)

Open positions/internships at Microsoft: Danish Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Danish language, for a 3-4 month contract, to work in speech technology related development projects. The successful candidate should have the following requirements:

·         Be native or near native Danish speaker

·         Have a university degree in Linguistics or related field (preferably in Danish Linguistics)

·         Have an advanced level of English

·         Have some experience in working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Willing to start in September 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions. 

Back to Top

6-23 . (2008-07-02) Microsoft: Catalan Linguist (M/F)

Open positions/internships at Microsoft: Catalan Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Catalan language, for a 3-4 month contract, to work in speech technology related development projects. The successful candidate should have the following requirements:

·         Be native or near native Catalan speaker

·         Have a university degree in Linguistics or related field (preferably in Catalan Linguistics)

·         Have an advanced level of English

·         Have some experience in working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Willing to start in September 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions. 

Back to Top

6-24 . (2008-07-10) Bourse de these IRISA Lannion (in french)



Thesis topic: High-quality speech synthesis

Introduction
=========

Recent years have seen the emergence of speech synthesis systems built around large speech databases, most often corresponding to several hours of recorded speech. To varying degrees, these systems assume that quality speech can be produced by retrieving sound fragments from a database previously recorded by a speaker. This type of approach pushes to its extreme the functional hypothesis of systems based on the concatenation of acoustic units: with a sufficiently large database, it should be possible to cover statistically the most frequent cases of sound coarticulation.

Recent systems such as Festival (Black 1995), CHATR (Campbell 1996), Whistler (Huang 1996), XIMERA (Toda 2006) and IBM's system (Eide 2006) show that this methodological approach can yield synthesis systems of very high quality.

Following this methodology, models are no longer used to produce parameter values from which a speech signal is generated. They are instead used to search a database of sound examples for a speech extract that is as close as possible to the modelled parameters and consistent with human elocution. For the problem of finding a sequence of acoustic units, several solutions are possible. The best known apply best-path search (Sagisaka 1992) (Hunt 1996) under a dynamic programming hypothesis. Other work (Donovan 1995) has defined acoustic models to guide the choice of a unit sequence.

The selection process has a twofold objective (Iwahashi 1992). First, a correspondence must be found between a subsequence of the phonemic string to be synthesized and a plausible exemplar in the reference corpus; this is known as discrimination by target criteria (Hunt 1996). A match to the target is not sufficient, since this decision is taken unit by unit. An additional mechanism is needed to guarantee that the proposed sequence satisfies criteria of acoustic continuity (of a segmental or supra-segmental nature), known as concatenation criteria. The difficulty of the problem lies in the fact that the two criteria are combined: the choice of a subsequence matching a unit of the corpus depends on its past context (left context of the sequence) and its future context (right context). This is, once again, a combinatorial problem that can formally be stated as a best-path search in a graph.

The vast majority of synthesis systems apply a Viterbi algorithm. This algorithm, efficient in both space and time complexity, is justified by the hypothesis that the global cost of a unit sequence can be written as an additive recurrence. This justification is widely accepted across the community for the expression of concatenation costs and target-proximity costs. For costs of a prosodic nature, however, a recurrent formulation is more delicate and hard to justify, since these phenomena occur at the scale of the intonation group and the sentence.
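This additive cost decomposition (a per-unit target cost plus a pairwise concatenation cost, minimized by dynamic programming) can be sketched as follows. This is a minimal illustration only: the cost functions, candidate lists, and toy integer "units" are hypothetical and do not represent any actual synthesis system.

```python
# Minimal sketch of Viterbi-based unit selection. Global cost:
#   C(u_1..u_n) = sum_i C_target(t_i, u_i) + sum_i C_concat(u_{i-1}, u_i)
# Real systems compute these costs from acoustic/prosodic features.

def viterbi_unit_selection(targets, candidates, c_target, c_concat):
    """targets: list of target specs; candidates[i]: candidate units for position i."""
    # best[u] = (cumulative cost, path) for each candidate u at the current position
    best = {u: (c_target(targets[0], u), [u]) for u in candidates[0]}
    for i in range(1, len(targets)):
        new_best = {}
        for u in candidates[i]:
            # extend the cheapest path ending in any previous-position unit v
            cost, path = min(
                (best[v][0] + c_concat(v, u) + c_target(targets[i], u),
                 best[v][1] + [u])
                for v in candidates[i - 1]
            )
            new_best[u] = (cost, path)
        best = new_best
    return min(best.values())  # (total cost, selected unit sequence)

# Toy example: units are integers; target cost = |t - u|, concat cost = |v - u|.
cost, path = viterbi_unit_selection(
    targets=[1, 5, 3],
    candidates=[[0, 2], [4, 6], [3, 7]],
    c_target=lambda t, u: abs(t - u),
    c_concat=lambda v, u: abs(v - u),
)
# cost == 5, path == [2, 4, 3]
```

Because both cost terms depend only on the current unit and its immediate predecessor, each position can be solved from the previous one; prosodic costs spanning a whole intonation group break exactly this locality, which is the obstacle the thesis addresses.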

We believe it is possible to surpass the quality of current synthesis systems by taking prosodic criteria into account when searching for the optimal sequence of units. Taking these prosodic criteria into account is not straightforward, since new models must be defined to describe the acoustic and prosodic costs of a sequence. These new selection techniques should be able to produce voices with more relief or expressiveness while maintaining very high sound quality.

(Sagisaka 1992): Sagisaka, Y., Kaiki, N., Iwahashi, N. and Mimura, K., ATR mu-TALK speech synthesis system, Proceedings of the International Conference on Spoken Language Processing (ICSLP'92), 1992, pp. 483-486.
(Hunt 1996): Hunt, A. and Black, A.W., Unit selection in a concatenative speech synthesis system using a large speech database, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'96), 1996, pp. 373-376.
(Donovan 1995): Donovan, R. and Woodland, P., Automatic speech synthesizer parameter estimation using HMMs, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'95), 1995, pp. 640-643.
(Iwahashi 1992): Iwahashi, N., Kaiki, N. and Sagisaka, Y., Concatenative speech synthesis by minimum distortion criteria, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'92), 1992, pp. 65-68.
(Black 1995): Black, A.W. and Campbell, N., Optimizing selection of units from speech databases for concatenative synthesis, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'95), vol. 1, pp. 581-584.
(Campbell 1996): Campbell, N. and Black, A., CHATR: a high-definition speech re-sequencing system, in Progress in Speech Synthesis, eds. van Santen, J., Sproat, R., Olive, J. and Hirschberg, J., 1996, pp. 365-381.
(Huang 1996): Huang, X., Acero, A., Adcock, J., Hon, H.-W., Goldsmith, J., Liu, J. and Plumpe, M., Whistler: a trainable text-to-speech system, Proceedings of the International Conference on Spoken Language Processing (ICSLP'96), 1996, pp. 2387-2390.
(Toda 2006): Toda, T., Kawai, H., Hirai, T., Ni, J., Nishizawa, N., Yamagishi, J., Tsuzaki, M., Tokuda, K. and Nakamura, S., Developing a test bed of English text-to-speech system XIMERA for the Blizzard Challenge 2006, Blizzard Challenge, 2006.
(Eide 2006): Eide, E., Fernandez, R., Hoory, R., Hamza, W., Kons, Z., Picheny, M., Sagi, A., Shechtman, S. and Shuang, Z.W., The IBM submission to the 2006 Blizzard text-to-speech challenge, Blizzard Challenge, 2006.

Proposed thesis work
======================

We propose to investigate new methodologies for acoustic unit selection in text-to-speech synthesis. The thesis proposal has two strands: a scientific strand, with proposals to overcome certain obstacles, notably in the formulation of the cost of a unit sequence; and an experimental strand, proposing an evolution of the Cordial group's synthesis system that will allow perceptual evaluations to validate or invalidate the chosen working hypotheses. The experimental work will be carried out on French. We also wish to duplicate the experiments on English and thereby take part in the Blizzard Challenge, an international speech synthesis competition.

The thesis work will build on the following programme:
  * Implementation and evaluation of a first system based on the current state of the art in synthesis from continuous-speech corpora, taking the acoustic levels into account and using the expressive speech database "chronic" from the ANR Vivos project.
  * Proposal of selection models of a prosodic nature.
  * Algorithmic proposals and heuristics yielding a solution acceptable in computation time.
  * Integration of the prosodic proposals into the reference synthesis system, and evaluation.

Context of the thesis work
===================

The student will be hosted in the Cordial team at IRISA (http://www.irisa.fr/cordial), whose main work concerns speech processing: synthesis, voice transformation, and corpus annotation. The research team is located on the premises of the Ecole Nationale Supérieure des Sciences Appliquées et de Technologie (http://www.enssat.fr) in Lannion. The thesis is funded for three years by a grant from the Conseil Général des Côtes d'Armor.

__________________________________________________________________________________________________________

Olivier BOEFFARD
IRISA/ENSSAT - Université de Rennes 1
6 rue de Kerampont - BP 80518
F-22305 Lannion Cedex, France
Tel: +33 2 96 46 90 91
Fax: +33 2 96 37 01 99
e-mail: olivier.boeffard@irisa.fr, Olivier.Boeffard@univ-rennes1.fr
web: http://www.irisa.fr/cordial, http://www.enssat.fr
Back to Top

6-25 . (2008-07-24) Microsoft: Norwegian Linguist (M/F)

Open positions/internships at Microsoft: Norwegian Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, situated in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Norwegian language (Bokmal), for a 4-6 month contract, to work in speech technology related development projects. The successful candidate should have the following requirements:

·         Be a native or near-native Norwegian Bokmal speaker

·         Have a university degree in Linguistics or a related field (preferably in Norwegian Linguistics)

·         Have an advanced level of English

·         Have some experience working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Be willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Be willing to start in October 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions.

Deadline for submissions: August 10, 2008 

Back to Top

6-26 . (2008-07-24) Microsoft: Finnish Linguist (M/F)

Open positions/internships at Microsoft: Finnish Linguist (M/F)

MLDC – Microsoft Language Development Center, a branch of the Microsoft Product Group that develops Speech Recognition and Synthesis Technologies, located in Porto Salvo, Portugal (http://www.microsoft.com/portugal/mldc), is seeking a full-time temporary language expert in the Finnish language for a 6 month contract, to work on speech technology related development projects. The successful candidate should meet the following requirements:

·         Be a native or near-native Finnish speaker

·         Have a university degree in Linguistics or a related field (preferably in Finnish Linguistics)

·         Have an advanced level of English (oral and written)

·         Have some experience working with Speech Technology/Natural Language Processing/Linguistics, either in academia or in industry

·         Have some computational ability – no programming is required, but he/she should be comfortable working with MS Windows and MS Office tools

·         Have team work experience

·         Be willing to work in Porto Salvo (near Lisbon) for the duration of the contract

·         Be willing to work in a multicultural and multinational team across the globe

·         Be willing to start on September 1, 2008

To apply, please submit your resume and a brief statement describing your experience and abilities to Daniela Braga: i-dbraga@microsoft.com

We will only consider electronic submissions.

Deadline for submissions: August 10, 2008

Back to Top

7 . Journals

Full text available at http://www.sciencedirect.com/ for Speech Communication subscribers and subscribing institutions. Free access for all to the titles and abstracts of all volumes, including Articles in Press and Selected Papers.

Back to Top

7-1 . IEEE Signal Processing Magazine: Special Issue on Digital Forensics

Guest Editors:
Edward Delp, Purdue University (ace@ecn.purdue.edu)
Nasir Memon, Polytechnic University (memon@poly.edu)
Min Wu, University of Maryland, (minwu@eng.umd.edu)

We find ourselves today in a "digital world" where most information
is created, captured, transmitted, stored, and processed in digital 
form. Although representing information in digital form has many 
compelling technical and economic advantages, it has led to new 
issues and significant challenges when performing forensics analysis 
of digital evidence.  There has been a slowly growing body of 
scientific techniques for recovering evidence from digital data.  
These techniques have come to be loosely coupled under the umbrella 
of "Digital Forensics." Digital Forensics can be defined as "The 
collection of scientific techniques for the preservation, collection, 
validation, identification, analysis, interpretation, documentation 
and presentation of digital evidence derived from digital sources for 
the purpose of facilitating or furthering the reconstruction of 
events, usually of a criminal nature."

This call for papers invites tutorial articles covering all aspects 
of digital forensics with an emphasis on forensic methodologies and 
techniques that employ signal processing and information theoretic 
analysis. Thus, focused tutorial and survey contributions are 
solicited from topics, including but not limited to, the following:

 . Computer Forensics - File system and memory analysis. File carving.
 . Media source identification - camera, printer, scanner, microphone
identification.
 . Differentiating synthetic and sensor media, for example camera vs.
computer graphics images.
 . Detecting and localizing media tampering and processing.
 . Voiceprint analysis and speaker identification for forensics.
 . Speech transcription for forensics. Analysis of deceptive speech.
 . Acoustic processing for forensic analysis - e.g. acoustical gunshot
analysis, accident reconstruction.
 . Forensic musicology and copyright infringement detection.
 . Enhancement and recognition techniques from surveillance video/images.
Image matching techniques for automatic visual evidence
extraction/recognition.
 . Steganalysis - Detection of hidden data in images, audio, video. 
Steganalysis techniques for natural language steganography. Detection of covert
channels.
 . Data Mining techniques for large scale forensics.
 . Privacy and social issues related to forensics.
 . Anti-forensics. Robustness of media forensics methods against counter
measures.
 . Case studies and trend reports.

White paper submission: Prospective authors should submit white 
papers to the web-based submission system at 
http://www.ee.columbia.edu/spm/ according to the timetable given below.  
White papers, limited to 3 single-column double-spaced pages, should 
summarize the motivation, the significance of the topic, a brief 
history, and an outline of the content.  In all cases, prospective 
contributors should make sure to emphasize the signal processing in 
their submission.

Schedule
 . White Paper Due: April 7, 2008
 . Notification of White paper Review Results: April 30, 2008
 . Full Paper Submission: July 15, 2008
 . Acceptance Notification: October 15, 2008
 . Final Manuscript Due: November 15, 2008
 . Publication Date: March 2009.


Back to Top

7-2 . Special Issue on Integration of Context and Content for Multimedia Management

IEEE Transactions on Multimedia            
 Special Issue on Integration of Context and Content for Multimedia Management
=====================================================================

Guest Editors:

Alan Hanjalic, Delft University of Technology, The Netherlands
Alejandro Jaimes, IDIAP Research Institute, Switzerland
Jiebo Luo, Kodak Research Laboratories, USA
        Qi Tian, University of Texas at San Antonio, USA

---------------------------------------------------
URL: http://www.cs.utsa.edu/~qitian/cfp-TMM-SI.htm
---------------------------------------------------
Important dates:

Manuscript Submission Deadline:       April 1, 2008
        Notification of Acceptance/Rejection: July 1, 2008
        Final Manuscript Due to IEEE:         September 1, 2008
        Expected Publication Date:            January 2009

---------------------
Submission Procedure
---------------------
Submissions should follow the guidelines set out by IEEE Transactions on Multimedia.
Prospective authors should submit high-quality, original manuscripts that have neither
appeared in, nor are under consideration by, any other journal.

-------
Summary
-------
Lower cost hardware and growing communications infrastructure (e.g., Web, cell phones,
etc.) have led to an explosion in the availability of ubiquitous devices to produce,
store, view and exchange multimedia (images, videos, music, text). Almost everyone is
a producer and a consumer of multimedia in a world in which, for the first time, a
tremendous amount of contextual information is being automatically recorded by the
various devices we use (e.g., cell ID for the mobile phone location, GPS integrated in
a digital camera, camera parameters, time information, and identity of the producer).

In recent years, researchers have started making progress in effectively integrating
context and content for multimedia mining and management. Integration of content and
context is crucial to human-human communication and human understanding of multimedia:
without context it is difficult for a human to recognize various objects, and we
become easily confused if the audio-visual signals we perceive are mismatched. For the
same reasons, integration of content and context is likely to enable  (semi)automatic
content analysis and indexing methods to become more powerful in managing multimedia
data. It can help narrow part of the semantic and sensory gap that is difficult or
even impossible to bridge using approaches that do not explicitly consider context for
(semi)automatic content-based analysis and indexing.
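As a toy illustration of the kind of integration the issue targets, the sketch below performs simple weighted late fusion of a content-based classifier's label scores with a contextual prior (e.g., derived from GPS metadata). The labels, scores, and weight are invented for illustration and do not come from any system discussed here.

```python
# Toy late fusion of content-based scores with contextual priors.
# All labels, scores, and the weight alpha are illustrative assumptions.

def fuse(content_scores, context_prior, alpha=0.7):
    """Weighted late fusion: alpha * content + (1 - alpha) * context, renormalized."""
    fused = {lab: alpha * content_scores[lab] + (1 - alpha) * context_prior.get(lab, 0.0)
             for lab in content_scores}
    total = sum(fused.values())
    return {lab: s / total for lab, s in fused.items()}

# The content model alone cannot decide between "beach" and "desert"...
content = {"beach": 0.5, "desert": 0.5}
# ...but context (say, coastal GPS coordinates) strongly favors "beach".
context = {"beach": 0.9, "desert": 0.1}

fused = fuse(content, context)
best = max(fused, key=fused.get)
```

Real systems use far richer fusion models, but the sketch shows why mismatched context (a strong prior against the content hypothesis) can flip a decision that content alone cannot make.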

The goal of this special issue is to collect cutting-edge research work in integrating
content and context to make multimedia content management more effective. The special
issue will unravel the problems generally underlying these integration efforts,
elaborate on the true potential of contextual information to enrich the content
management tools and algorithms, discuss the dilemma of generic versus narrow-scope
solutions that may result from "too much" contextual information, and provide
vision and insight from leading experts and practitioners on how to best approach the
integration of context and content. The special issue will also present the state of
the art in context and content-based models, algorithms, and applications for
multimedia management.

-----
Scope
-----

The scope of this special issue is to cover all aspects of context and content for
multimedia management.

Topics of interest include (but are not limited to):
        - Contextual metadata extraction
        - Models for temporal context, spatial context, imaging context (e.g., camera
          metadata), social and cultural context, and so on
        - Web context for online multimedia annotation, browsing, sharing and reuse
        - Context tagging systems, e.g., geotagging, voice annotation
        - Context-aware inference algorithms
        - Context-aware multi-modal fusion systems (text, document, image, video,
          metadata, etc.)
        - Models for combining contextual and content information
        - Context-aware interfaces
        - Context-aware collaboration
        - Social networks in multimedia indexing
        - Novel methods to support and enhance social interaction, including
          innovative ideas integrating context in social, affective computing, and
          experience capture
        - Applications in security, biometrics, medicine, education, personal
          media management, and the arts, among others
        - Context-aware mobile media technology and applications
        - Context for browsing and navigating large media collections
        - Tools for culture-specific content creation, management, and analysis

------------
Organization
------------
Next to the standard open call for papers, we will also invite a limited number of
papers, which will be written by prominent authors and authorities in the field
covered by this Special Issue. While the papers collected through the open call are
expected to sample the research efforts currently invested within the community on
effectively combining contextual and content information for optimal analysis,
indexing and retrieval of multimedia data, the invited papers will be selected to
highlight the main problems and approaches generally underlying these efforts.

All papers will be reviewed by at least 3 independent reviewers. Invited papers will
be solicited first through white papers to ensure the quality and relevance to the
special issue. The accepted invited papers will be reviewed by the guest editors and
are expected to account for about one fourth of the papers in the special issue.

---------
Contacts
---------
Please address all correspondence regarding this special issue to the Guest Editors
Dr. Alan Hanjalic (A.Hanjalic@ewi.tudelft.nl), Dr. Alejandro Jaimes
(alex.jaimes@idiap.ch), Dr. Jiebo Luo (jiebo.luo@kodak.com), and Dr. Qi Tian
(qitian@cs.utsa.edu).
-------------------------------------------------------------------------------------

Guest Editors:
Alan Hanjalic, Alejandro Jaimes, Jiebo Luo, and Qi Tian


Back to Top

7-3 . Speech Communication: Special Issue On Spoken Language Technology for Education

CALL FOR PAPERS
Special Issue of Speech Communication

on *Spoken Language Technology for Education*


*Guest-editors:*
Maxine Eskenazi, Associate Teaching Professor, Carnegie Mellon University
Abeer Alwan, Professor, University of California at Los Angeles
Helmer Strik, Assistant Professor, University of Nijmegen
 

Language technologies have evolved to the stage where they are reliable
enough, if their strong and weak points are properly dealt with, to be
used for education. The creation of an application for education
presents several challenges: making the language technology sufficiently
reliable (and thus advancing our knowledge in the language
technologies), creating an application that actually enables students to
learn, and engaging the student. Papers in this special issue should
deal with several of these issues. Although language learning is the
primary target of research at present, papers on the use of language
technologies for other education applications are encouraged. The scope
of acceptable topic interests includes but is not limited to:

 

- Use of speech technology for education

- Use of spoken language dialogue for education

- Applications using speech and natural language processing for education

- Intelligent tutoring systems using speech and natural language

- Pedagogical issues in using speech and natural language technologies
for education

- Assessment of tutoring software

- Assessment of student performance

 

*Tentative schedule for paper submissions, review, and revision:*

Deadline for submissions: June 1, 2008.

Deadline for decisions and feedback from reviewers and editors: August
31, 2008.

Deadline for revisions of papers: November 30, 2008.

 

*Submission instructions:*

Authors should consult the "Guide for Authors", available online, at
http://www.elsevier.com/locate/specom for information about the
preparation of their manuscripts. Authors, please submit your paper via
_http://ees.elsevier.com/specom_, choosing *Spoken Language Tech.* as
the Article Type, and Dr. Gauvain as the handling E-i-C.

Back to Top

7-4 . Special Issue on Processing Morphologically Rich Languages IEEE Trans ASL

Call for Papers for a Special Issue on
                Processing Morphologically Rich Languages 
          IEEE Transactions on Audio, Speech and Language Processing
 
This is a call for papers for a special issue on Processing Morphologically
Rich Languages, to be published in early 2009 in the IEEE Transactions on 
Audio, Speech and Language Processing. 
 
Morphologically-rich languages like Arabic, Turkish, Finnish, Korean, etc.,
present significant challenges for speech processing, natural language 
processing (NLP), as well as speech and text translation. These languages are 
characterized by highly productive morphological processes (inflection, 
agglutination, compounding) that may produce a very large number of word 
forms for a given root form.  Modeling each form as a separate word leads 
to a number of problems for speech and NLP applications, including: 1) large 
vocabulary growth, 2) poor language model (LM) probability estimation, 
3) higher out-of-vocabulary (OOV) rate, 4) inflection gap for machine 
translation:  multiple different forms of  the same underlying baseform 
are often treated as unrelated items, with negative effects on word alignment 
and translation accuracy.  
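To make the vocabulary-growth problem concrete, the toy sketch below counts distinct surface forms versus shared morphemes for a handful of Turkish-like inflected words. The segmentation is hand-written for illustration only, standing in for an automatic morphological analyzer of the kind this issue solicits.

```python
# Toy illustration of vocabulary growth in a morphologically rich language.
# Word forms and their segmentation are hand-crafted assumptions.

# Surface forms built on the (Turkish-like) root "ev" ("house") plus suffixes.
corpus = ["ev", "evler", "evlerim", "evlerimde", "ev", "evde"]

# Whole-word vocabulary: every distinct surface form is a separate unit.
word_vocab = set(corpus)

# Hand-crafted decomposition into stem + suffix morphemes.
decomp = {
    "ev": ["ev"],
    "evler": ["ev", "+ler"],
    "evlerim": ["ev", "+ler", "+im"],
    "evlerimde": ["ev", "+ler", "+im", "+de"],
    "evde": ["ev", "+de"],
}

# Morpheme vocabulary: shared sub-word units across all forms.
morph_vocab = {m for w in corpus for m in decomp[w]}
```

Even in this tiny example the morpheme inventory is smaller than the word inventory, and the gap widens rapidly as more suffix combinations appear, which is exactly the data-sparseness pressure on language model estimation described above.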
 
Large-scale speech and language processing systems require advanced modeling 
techniques to address these problems. Morphology also plays an important 
role in addressing specific issues of “under-studied” languages such as data 
sparsity, coverage and robust modeling. We invite papers describing 
previously unpublished work in the following broad areas: using morphology for speech recognition and understanding, speech and text translation, 
speech synthesis, information extraction and retrieval, as well as 
summarization. Specific topics of interest include:
- methods addressing data sparseness issue for morphologically rich 
  languages with application to speech recognition, text and speech 
  translation, information extraction and retrieval, speech   
  synthesis, and summarization
- automatic decomposition of complex word forms into smaller units 
- methods for optimizing the selection of units at different levels of 
  processing
- pronunciation modeling for morphologically-rich languages
- language modeling for morphologically-rich languages
- morphologically-rich languages in speech synthesis
- novel probability estimation techniques that avoid data sparseness 
  problems
- creating data resources and annotation tools for morphologically-rich 
  languages
 
Submission procedure:  Prospective authors should prepare manuscripts 
according to the information available at 
http://www.signalprocessingsociety.org/periodicals/journals/taslp-author-information/. 
Note that all rules will apply with regard to submission lengths, 
mandatory overlength page charges, and color charges. Manuscripts should 
be submitted electronically through the online IEEE manuscript submission 
system at http://sps-ieee.manuscriptcentral.com/. When selecting a 
manuscript type, authors must click on "Special Issue of TASLP on 
Processing Morphologically Rich Languages". 
 
Important Dates:
Submission deadline:  August 1, 2008               
Notification of acceptance: December 31, 2008
Final manuscript due:  January 15, 2009    
Tentative publication date: March 2009
 
Editors
Ruhi Sarikaya (IBM T.J. Watson Research Center) sarikaya@us.ibm.com
Katrin Kirchhoff (University of Washington) katrin@ee.washington.edu
Tanja Schultz (University of Karlsruhe) tanja@ira.uka.de
Dilek Hakkani-Tur (ICSI) dilek@icsi.berkeley.edu
Back to Top

7-5 . CfP: Special Issue on Analysis and Signal Processing of Oesophageal and Pathological Voices

Special Issue of EURASIP Journal on Advances in Signal Processing
on Analysis and Signal Processing of Oesophageal and Pathological Voices
 
 
Call for Papers
Speech is the most important means of communication among humans. Speech, however,
is not limited only to the process of communication, but is also very important for
transferring emotions, expressing our personality, and reflecting situations of
stress. Modern lifestyles have increased the risk of experiencing some kind of voice
alteration. It is estimated that around 19% of the population suffer or have suffered
from dysphonic voicing. This motivates new and objective ways to evaluate speech, its
quality, and its connection with other phenomena.

Speech research to date has favored areas such as synthesis, recognition, and speaker
verification. The last few years have witnessed the emerging topic of processing and
evaluation of disordered speech. Acoustic analysis is a noninvasive technique
providing an efficient tool for the objective diagnosis, the screening of voice
diseases, the objective determination of vocal function alterations, and the
evaluation of surgical treatment and rehabilitation. Its application extends beyond
medicine, and now includes forensic analysis as well as voice quality control for
voice professionals. Acoustic analysis may also be seen as complementary to other
methods of evaluation based on the direct observation of the vocal folds using
videoendoscopy.

This special issue aims to foster an interdisciplinary forum for presenting new work
in the analysis and modeling of voice signals and videoendoscopic images, with
applications in pathological and oesophageal voices.

Topics of interest include (but are not limited to):
• Automatic detection of voice disorders
• Automatic assessment and classification of voice quality
• New strategies for the parameterization and modeling of normal and pathological
voices (biomechanical-based parameters, chaos modeling, etc.)
• Databases of vocal disorders
• Inverse filtering
• Signal processing for remote diagnosis
• Speech enhancement for pathological and oesophageal voices
• Objective parameter extraction from vocal fold images using videolaryngoscopy,
videokymography, fMRI, and other emerging techniques
• Multimodal analysis of disordered speech
• Robust pitch extraction algorithms for pathological and oesophageal voices

Since speech communication is fundamental to human interaction, we are moving towards
a new scenario where speech is gaining greater importance in our daily lives, and
many common speech disorders and dysfunctions would be treated using computer-based
or physical prosthetics.
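Among the listed topics, pitch extraction is simple enough to sketch. The fragment below is a minimal, generic autocorrelation-based F0 estimator run on a synthetic tone; it is a textbook baseline, not one of the robust algorithms the issue solicits for pathological or oesophageal voices, and all parameter values are illustrative.

```python
# Minimal autocorrelation-based F0 estimation on a synthetic signal.
# A textbook baseline for illustration; real disordered-voice pitch
# trackers need far more robustness to irregular excitation.
import math

def estimate_f0(signal, sr, fmin=50.0, fmax=500.0):
    """Estimate F0 as the lag maximizing the autocorrelation in [fmin, fmax]."""
    lo = int(sr / fmax)          # smallest candidate lag (highest F0)
    hi = int(sr / fmin)          # largest candidate lag (lowest F0)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, hi + 1):
        r = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return sr / best_lag

sr = 8000                        # sampling rate in Hz (illustrative)
f0 = 120.0                       # true frequency of the synthetic tone
x = [math.sin(2 * math.pi * f0 * n / sr) for n in range(2000)]
est = estimate_f0(x, sr)
```

On a clean sinusoid the estimate lands within a sample-quantization error of the true F0; on pathological voices, jitter, shimmer, and aperiodicity break this simple peak-picking, which is precisely why robust variants are solicited above.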
Authors should follow the EURASIP Journal on Advances in Signal Processing
manuscript format described at http://www.hindawi.com/journals/asp/. Prospective
authors should submit an electronic copy of their complete manuscript through the
journal Manuscript Tracking System at http://mts.hindawi.com/ according to the
following tentative timetable:

Manuscript Due                 November 1, 2008
First Round of Reviews         February 1, 2009
Publication Date               May 1, 2009
Guest Editors
Juan I. Godino-Llorente, Department of Circuits and
Systems Engineering, Polytechnic University of Madrid
(UPM), Ctra. de Valencia, 28031 Madrid, Spain;
igodino@ics.upm.es
Pedro Gómez-Vilda, Department of Computer Science
and Engineering, Polytechnic University of Madrid (UPM),
Boadilla del Monte, 28660 Madrid, Spain;
pedro@pino.datsi.fi.upm.es
Tan Lee, Department of Electronic Engineering, The
Chinese University of Hong Kong, Shatin, New Territories,
Hong Kong; tanlee@ee.cuhk.edu.hk
Hindawi Publishing Corporation
http://www.hindawi.com
Back to Top

7-6 . CfP: Special Issue of Speech Communication on "Silent Speech" Interfaces

Special Issue on "Silent Speech" Interfaces
Guest Editors
Professor Bruce Denby (denby@ieee.org)
Prof. Dr. Ing. Tanja Schultz (tanja@ira.uka.de)
Dr. Kiyoshi Honda, MD, DMsc. (honda@atr.jp)
A "Silent Speech" Interface (SSI) allows processing of a speech signal that a user produces without actually vocalizing any
sound. Based on sensors of different types, such systems would provide normal speech to those for whom vocalization is
difficult or impossible due to age or illness, and as such are a complement to surgical solutions, vocal prostheses, and touch-screen
synthesis systems. More recently, the advent of the cellular telephone has created interest in SSIs from quite a
different perspective. The electronic representation of the speech signal created by an SSI can be injected directly into a
digital transmission system, leaving synthesis to be carried out only at the distant user's handset. This opens the way to
telecommunications systems operating in total silence, thus assuring the privacy and security of users' communications,
while at the same time protecting the acoustic environment of those not participating in the exchange. As a further benefit,
since SSIs do not use standard acoustic capture techniques, they will also be very interesting in terms of speech processing
in noisy environments. Quite naturally, the "silent communication" and high-noise environment capabilities of SSIs have
attracted the interest of the defense and security communities, as well.
Prototype SSI systems have already appeared in the research literature, including: imaging-based solutions such as
ultrasound and standard video capture; inertial approaches which translate articulator movement directly into electrical
signals, for example electromagnetic articulography; electromyographic techniques, which capture the minute electrical
signals associated with articulator movement; systems exploiting non-audible acoustic signals produced by articulator
movement, such as "non-acoustic murmur" microphones; all the way to "brain computer interfaces" in which neural
speech command signals are captured before they reach the articulators, thus obviating the need for movement of any kind
on the part of the speaker.
The goal of the special issue on "Silent Speech" Interfaces is to provide the speech community with an introduction to this
exciting, emergent field. Contributions should therefore cover as broad an area as possible, but at the same time, be of
sufficient depth to encourage the critical evaluations and reflections that will lead to further advances in the field, and
hopefully to new collaborations. To obtain the necessary quality, breadth, and balance, a limited number of invited articles
will be complemented by a call for submission of 1-page paper proposals. The final issue will be compiled from the invited
contributions and the follow-up full articles from accepted 1-page proposals. There will also be a comprehensive review
article, to which some article authors may be asked to contribute. All papers, both invited and submitted, will undergo the
usual Speech Communication peer review process.
Proposals for contributions (1-page only, in .pdf format), outlining the originality of the approach, current status of
the research work, as well as benefits and potential drawbacks of the method, should be sent to denby@ieee.org by
9 September 2008. A list of important dates is given below.
Important dates
Invited articles: Invitations are sent concurrently with the Call for Papers.
Deadline for submission of 1-page proposals: 9 September 2008 (submit .pdf directly to denby@ieee.org).
Notification of acceptance for 1-page proposals: 30 September 2008.
Deadline of submission for full papers, both proposed and invited: 30 November 2008. All authors are asked to prepare their
full papers according to the guidelines set in the Guide for Authors, located at http://www.elsevier.com/locate/specom, and
to submit their papers to the online submission and reviewing tool, at http://ees.elsevier.com/specom. They should select
Special Issue: "Silent Speech" Interfaces as the article type, and Professor Kuldip Paliwal as the handling Editor-in-Chief.
Journal publication: Second quarter 2009.
Back to Top

7-7 . CfP Special issue of EURASIP Journal of Advances in Signal Processing on Biometrics

Call for Papers

Recent Advances in Biometric Systems: A Signal Processing Perspective



Biometrics, a digital recognition technology that relies on highly distinctive physical and physiological characteristics of an individual, is potentially a powerful and reliable method for personal authentication. The increasing importance of biometrics is underscored by the rapidly growing number of educational and research activities devoted to this field, and by a large number of annually organized conferences and symposia exclusively devoted to biometrics. Biometrics is a multidisciplinary field, with researchers from signal processing, pattern recognition, computer vision, and statistics. Recently, a number of important new directions have been identified for biometric research, including processing and encoding of nonideal data, biometrics at a distance, and data quality assessment. Problems in nonideal biometric data include off-angle, occluded, blurred, and noisy images. Biometrics at a distance is concerned with recognition from video or snapshots of biometric samples captured from a noncooperative moving individual. The goal of this special issue is to focus on recent advances in signal processing of biometric data that allow improved recognition performance through novel restoration, processing, and encoding; matching techniques capable of dealing with complexity and distortions in data acquired from a distance; recognition from biometric data acquired in unconstrained environments or complex experimental setups; and the characterization of quality and its relationship with performance.

Topics of interest include, but are not limited to:

Biometric-based recognition under unconstrained presentation and/or complex environment using the following:
    o Face
    o Iris
    o Fingerprint
    o Voice
    o Hand
    o Soft biometrics

Multimodal biometric recognition using nonideal data

Biometric image/signal quality assessment:
    o Face
    o Iris
    o Fingerprint
    o Voice
    o Hand
    o Soft biometrics

Biometric security and privacy
    o Liveness detection
    o Encryption
    o Cancelable biometrics

The special issue will focus both on the development and comparison of novel signal/image processing approaches and on their expanding range of applications.

Authors should follow the EURASIP Journal on Advances in Signal Processing manuscript format described at the journal site http://www.hindawi.com/journals/asp/. Prospective authors should submit an electronic copy of their complete manuscript through the journal Manuscript Tracking System at http://mts.hindawi.com/, according to the following timetable:

Manuscript Due                 October 1, 2008
First Round of Reviews         January 1, 2009
Publication Date               April 1, 2009

Guest Editors

o Natalia A. Schmid, Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506, USA; natalia.schmid@mail.wvu.edu
o Stephanie Schuckers, Electrical &amp; Computer Engineering, Clarkson University, Potsdam, NY 13699, USA; sschucke@clarkson.edu
o Jonathon Phillips, National Institute of Standard and Technology, Gaithersburg, MD 20899, USA; jonathon@nist.gov
o Kevin Bowyer, University of Notre Dame, Notre Dame, IN 46556, USA; kwb@cse.nd.edu 

Back to Top

8 . Forthcoming events supported (but not organized) by ISCA

8-1 . (2008-09-22) LIPS 2008 Visual Speech Synthesis Challenge

LIPS 2008 is the first visual speech synthesis challenge. It will be held as a special
session at INTERSPEECH 2008 in Brisbane, Australia (http://www.interspeech2008.org).
The aim of this challenge is to stimulate discussion about subjective quality
assessment of synthesised visual speech with a view to developing standardised
evaluation procedures.

In association with this challenge, a training corpus of audiovisual speech and
accompanying phoneme labels and timings will be provided to all entrants, who should
then train their systems using this data. (As this is the first year the challenge
will run, and to promote wider participation, proposed entrants are free to use a
pre-trained model.) Prior to the session, a set of test sentences (provided as audio,
video and phonetic labels) must be synthesised on-site in a supervised room. A series
of double-blind subjective tests will then be conducted to compare each competing
system against all others. The overall winner will be announced and presented with
their prize at the closing ceremony of the conference.

All entrants will submit a 4/6 (TBC) page paper describing their system to
INTERSPEECH, indicating that the paper is addressed to the LIPS special session. A
special edition of the Eurasip Journal on Speech, Audio and Music Processing in
conjunction with the challenge is also scheduled.

To receive updated information as it becomes available, you can join the mailing list
by visiting https://mail.icp.inpg.fr/mailman/listinfo/lips_challenge. Further details
will be mailed to you in due course.

Please invite colleagues to join, and forward this email widely to your academic and
industrial partners. Besides broad participation of research groups in audiovisual
speech synthesis and talking faces, we particularly welcome participation of the
computer game industry.

Please confirm your willingness to participate in the challenge, submit a paper
describing your work, and join us in Brisbane by sending an email to
sascha.fagel@tu-berlin.de, b.theobald@uea.ac.uk, or gerard.bailly@gipsa-lab.inpg.fr.

Organising Committee

Sascha Fagel, University of Technology, Berlin - Germany
Barry-John Theobald, University of East Anglia, Norwich - UK
Gerard Bailly, GIPSA-Lab, Dpt. Speech & Cognition, Grenoble - France

 

Back to Top

8-2 . (2008-09-22) Human-Machine Comparisons of consonant recognition in quiet and noise

Consonant Challenge:
  Human-machine comparisons of consonant recognition in quiet and noise

                   Interspeech, 22-26 September 2008
                         Brisbane, Australia

* Update:
All information concerning the native listener experiments and baseline 
recognisers
including their results can now be found and downloaded from the Consonant 
Challenge website:
http://www.odettes.dds.nl/challenge_IS08/

* Deadline for submissions:
The deadline for full paper submission (4 pages) is April 7th, 2008. Paper
submission is done exclusively via the Interspeech 2008 conference website.
Participants of this Challenge are asked to indicate the correct Special
Session during submission. More information on the Interspeech conference
can be found here: http://www.interspeech2008.org/

* Topic of the Consonant Challenge:
Listeners outperform automatic speech recognition systems at every level 
of speech
recognition, including the very basic level of consonant recognition. What 
is not clear is
where the human advantage originates. Does the fault lie in the acoustic 
representations of
speech or in the recogniser architecture, or in a lack of compatibility 
between the two?
There have been relatively few studies comparing human and automatic 
speech recognition on
the same task, and, of these, overall identification performance is the 
dominant metric.
However, there are many insights which might be gained by carrying out a 
far more detailed
comparison.

The purpose of this Special Session is to make focused human-computer 
comparisons on a task
involving consonant identification in noise, with all participants using 
the same training
and test data. Training and test data and native listener and baseline 
recogniser results
will be provided by the organisers, but participants are encouraged to 
also contribute
listener responses.

* Call for papers:
Contributions are sought in (but not limited to) the following areas:

- Psychological models of human consonant recognition
- Comparisons of front-end ASR representations
- Comparisons of back-end recognisers
- Exemplar vs statistical recognition strategies
- Native/Non-native listener/model comparisons

* Organisers:
Odette Scharenborg (Radboud University Nijmegen, The Netherlands -- 
O.Scharenborg@let.ru.nl)
Martin Cooke (University of Sheffield, UK -- M.Cooke@dcs.shef.ac.uk)

Back to Top

9 . Future Speech Science and Technology Events

9-1 . Call for Workshop proposals EACL 2009, NAACL HLT 2009, ACL-IJCNLP 2009

CALL FOR WORKSHOP PROPOSALS EACL 2009, NAACL HLT 2009, AND ACL-IJCNLP 2009

Joint site:  http://www.eacl2009.gr/conference/callforworkshops
The Association for Computational Linguistics invites proposals for
workshops to be held in conjunction with one of the three flagship
conferences sponsored in 2009 by the Association for Computational
Linguistics: ACL-IJCNLP 2009, EACL 2009, and NAACL HLT 2009.  We solicit
proposals on any topic of interest to the ACL community. Workshops will
be held at one of the following conference venues:

EACL 2009 is the annual meeting of the European chapter of the ACL. The
conference will be held in Athens, Greece, March 30-April 3 2009;
workshops March 30-31.

NAACL HLT 2009 is the annual meeting of the North American chapter of
the ACL.  It continues the inclusive tradition of encompassing relevant
work from the natural language processing, speech and information
retrieval communities.  The conference will be held in Boulder,
Colorado, USA, from May 31-June 5 2009; workshops will be held June 4-5.

ACL-IJCNLP 2009 combines the 47th Annual Meeting of the Association for
Computational Linguistics (ACL 2009) with the 4th International Joint
Conference on Natural Language Processing (IJCNLP).  The conference will
be held in Singapore, August 2-7 2009; workshops will be held August 6-7.


    SUBMISSION INFORMATION

In a departure from previous years, ACL-IJCNLP, EACL and NAACL HLT will
coordinate the submission and reviewing of workshop proposals for all
three ACL 2009 conferences.

Proposals for workshops should contain:

    * A title and brief (2-page max) description of the workshop topic
      and content.
    * The desired workshop length (one or two days), and an estimate
      of the audience size.
    * The names, postal addresses, phone numbers, and email addresses
      of the organizers, with one-paragraph statements of their
      research interests and areas of expertise.
    * A budget.
    * A list of potential members of the program committee, with an
      indication of which members have already agreed.
    * A description of any shared tasks associated with the workshop.
    * A description of special requirements for technical needs.
    * A venue preference specification.

The venue preference specification should list the venues at which the
organizers would be willing to present the workshop (EACL, NAACL HLT, or
ACL-IJCNLP).  A proposal may specify one, two, or three acceptable
workshop venues; if more than one venue is acceptable, the venues should
be preference-ordered.  There will be a single workshop committee,
coordinated by the three sets of workshop chairs.  This single committee
will review the quality of the workshop proposals.  Once the reviews are
complete, the workshop chairs will work together to assign workshops to
each of the three conferences, taking into account the location
preferences given by the proposers.

The ACL has a set of policies on workshops. You can find general
information on policies regarding attendance, publication, financing,
and sponsorship, as well as on financial support of SIG workshops, at
the following URL:
http://www.cis.udel.edu/~carberry/ACL/index-policies.html

Please submit proposals by electronic mail no later than September 1
2008, to acl09-workshops at acl09-workshops@uni-konstanz.de with the
subject line: "ACL 2009 WORKSHOP PROPOSAL."


    PRACTICAL ARRANGEMENTS

Notification of acceptance of workshop proposals will occur no later
than September 23, 2008.  Since the three ACL conferences will occur at
different times, the timescales for the submission and reviewing of
workshop papers, and the preparation of camera-ready copies, will be
different for the three conferences. Suggested timescales for each of
the conferences are given below.

ALL CONFERENCES
Sep 1, 2008     Workshop proposal deadline
Sep 23, 2008    Notification of acceptance of workshops

EACL 2009
Sep 30, 2008    Call for papers issued by this date
Dec 12, 2008    Deadline for paper submission
Jan 23, 2009    Notification of acceptance of papers
Feb  6, 2009    Camera-ready copies due
Mar 30-31, 2009 EACL 2009 workshops

NAACL HLT 2009
Dec 10, 2008    Call for papers issued by this date
Mar 6, 2009     Deadline for paper submissions
Mar 30, 2009    Notification of paper acceptances
Apr 12, 2009    Camera-ready copies due
June 4-5, 2009  NAACL HLT 2009 workshops

ACL-IJCNLP 2009
Feb 6, 2009     Call for papers issued by this date
May 1, 2009     Deadline for paper submissions
Jun 1, 2009     Notification of acceptances
Jun 14, 2009    Camera-ready copies due
Aug 6-7, 2009   ACL-IJCNLP 2009 Workshops

Workshop Co-Chairs:

    * Miriam Butt, EACL, University of Konstanz
    * Stephen Clark, EACL, Oxford University
    * Nizar Habash, NAACL HLT, Columbia University
    * Mark Hasegawa-Johnson, NAACL HLT, University of Illinois at
Urbana-Champaign
    * Jimmy Lin, ACL-IJCNLP, University of Maryland
    * Yuji Matsumoto, ACL-IJCNLP, Nara Institute of Science and Technology

For inquiries, send email to: acl09-workshops at
acl09-workshops@uni-konstanz.de

Back to Top

9-2 . (2008-08-25) EUSIPCO-2008 - 16th European Signal Processing Conference - Lausanne Switzerland

EUSIPCO-2008 - 16th European Signal Processing Conference - August 25-29, 2008, Lausanne, Switzerland

- http://www.eusipco2008.org/

 

DEADLINE FOR SUBMISSION: February 8, 2008

 

The 2008 European Signal Processing Conference (EUSIPCO-2008) is the sixteenth in a series of conferences promoted by EURASIP, the European Association for Signal Processing (www.eurasip.org). This edition will take place in Lausanne, Switzerland, organized by the Swiss Federal Institute of Technology, Lausanne (EPFL).

 

EUSIPCO-2008 will focus on the key aspects of signal processing theory and applications. Exploration of new avenues and methodologies of signal processing will also be encouraged. Accepted papers will be published in the Proceedings of EUSIPCO-2008. Acceptance will be based on quality, relevance and originality. Proposals for tutorials are also invited.

 

*** This year will feature some exciting events and novelties: ***

 

- We are preparing a very attractive tutorial program and for the first time, access to the tutorials will be free to all registered participants! Some famous speakers have already been confirmed, but we also hereby call for new proposals for tutorials.

- We will also have top plenary speakers, including Stéphane Mallat (Polytechnique, France), Jeffrey A. Fessler (The University of Michigan, Ann Arbor, Michigan, USA), Phil Woodland (Cambridge, UK) and Bernhard Schölkopf (Max Planck Institute, Tübingen, Germany).

- The Conference will include 12 very interesting special sessions on some of the hottest topics in signal processing. See http://www.eusipco2008.org/11.html for the complete list of those special sessions.

- The list of 22 area chairs has been confirmed: see details at http://www.eusipco2008.org/7.html

- The social program will also be very exciting, with a welcome reception at the fantastic Olympic Museum in Lausanne, facing Lake Geneva and the Alps (http://www.olympic.org/uk/passion/museum/index_uk.asp), and with the conference banquet starting with a cruise on Lake Geneva on a historical boat, followed by a dinner at the Casino of Montreux (http://www.casinodemontreux.ch/).

Therefore I invite you to submit your work to EUSIPCO-2008 by the deadline and to attend the Conference in August in Lausanne.


 

 

IMPORTANT DATES:

Submission deadline of full papers (5 pages A4): February 8, 2008

Submission deadline of proposals for tutorials: February 8, 2008

Notification of Acceptance: April 30, 2008

Conference: August 25-29, 2008

 

 

More details on how to submit papers and proposals for tutorials can be found on the conference web site http://www.eusipco2008.org/

Back to Top

9-3 . (2008-09-02) Third Workshop on Speech in Mobile and Pervasive Environments

Call for papers
      Third Workshop on Speech in Mobile and Pervasive Environments
                  (in conjunction with ACM MobileHCI '08)
                        Amsterdam, The Netherlands
                             September 2, 2008
                      http://research.ihost.com/SiMPE



In the past, voice-based applications were accessed from simple
telephone devices through voice browsers residing on the server. With
the proliferation of pervasive devices and the increase in their
processing capabilities, client-side speech processing is emerging as a
viable alternative. In SiMPE 2008, the third in the series, we will
continue to explore the possibilities and issues that arise when
enabling speech processing on resource-constrained, possibly mobile
devices.



Topics of Interest:

All areas that enable, optimise or enhance Speech in mobile and pervasive
environments and devices. Possible areas include, but are not restricted
to:
      * Robust Speech Recognition in Noisy and Resource-constrained
Environments
      * Memory/Energy Efficient Algorithms
      * Multimodal User Interfaces for Mobile Devices
      * Protocols and Standards for Speech Applications
      * Distributed Speech Processing
      * Mobile Application Adaptation and Learning
      * Prototypical System Architectures
      * User Modelling
      * HCI issues in SiMPE applications
      * Design and cultural issues in SiMPE
      * Speech interfaces/applications for Developing Regions



Submissions:

We seek original, unpublished papers in the following three categories: (a)
Position papers that describe novel ideas that can lead to interesting
research directions, (b) Early results or work-in-progress that has
significant promise, or (c) Full papers. Papers should be 4-8 pages in
length in the MobileHCI publication format. The LaTeX and Microsoft Word
templates are available at the workshop website. All submissions should be
in the PDF format and should be submitted electronically through the
workshop submission web site,
http://www.easychair.org/conferences/?conf=simpe08. Since the submission
deadlines are dependent on the MobileHCI conference, we will not be able to
grant any extensions under any circumstances.

For any comments regarding submissions and participation, contact:
simpe08@easychair.org



Key Dates:

 * Paper Submission Deadline: May 05, 2008 (11:59 PM CET)
 * Notification of Acceptance: May 19, 2008
 * Early Registration Deadline: June 02, 2008
 * Workshop: September 02, 2008.



Organising Committee:

Amit A. Nanavati, IBM India Research Laboratory.
Nitendra Rajput, IBM India Research Laboratory.
Alexander I. Rudnicky, Carnegie Mellon University.
Markku Turunen, University of Tampere, Finland.



Programme Committee:

Lou Boves, University of Nijmegen, The Netherlands
Matt Jones, Swansea University, UK
Yoon Kim, Novauris Technologies, USA
Lars Bo Larsen, Aalborg University, Denmark
Gary Marsden, University of Cape Town, South Africa
Michael McTear, University of Ulster, Ireland
Shrikanth S. Narayanan, University of Southern California, USA
Tim Paek, Microsoft, USA
David Pearce, Motorola, UK.
Mike Phillips, Vlingo, USA
Markku Turunen, University of Tampere, Finland
Yaxin Zhang, Motorola, China
(More to be updated)



Websites:

 * SiMPE Workshop: http://research.ihost.com/SiMPE
 * ACM MobileHCI '08: http://www.mobilehci.org/
 * SiMPE 2007: http://research.ihost.com/SiMPE/2007
 * SiMPE 2006: http://research.ihost.com/SiMPE/2006

Back to Top

9-4 . (2008-09-08) 5th Joint Workshop on Machine Learning and Multimodal Interaction MLMI 2008

8-10 September 2008
                     Utrecht, The Netherlands

                       http://www.mlmi.info/


The fifth MLMI workshop will be held in Utrecht, The Netherlands,
following successful workshops in Martigny (2004), Edinburgh (2005),
Washington (2006) and Brno (2007).  MLMI brings together researchers
from the different communities working on the common theme of advanced
machine learning algorithms applied to multimodal human-human and
human-computer interaction.  The motivation for creating this joint
multi-disciplinary workshop arose from the actual needs of several large
collaborative projects, in Europe and the United States.


* Important dates

Submission of papers/posters: Monday, 31 March 2008
Acceptance notifications: Monday, 12 May 2008
Camera-ready versions of papers: Monday, 16 June 2008
Workshop: 8-10 September 2008


* Workshop topics

MLMI 2008 will feature talks (including a number of invited speakers),
posters and demonstrations.  Prospective authors are invited to submit
proposals in the following areas of interest, related to machine
learning and multimodal interaction:
 - human-human communication modeling
 - audio-visual perception of humans
 - human-computer interaction modeling
 - speech processing
 - image and video processing
 - multimodal processing, fusion and fission
 - multimodal discourse and dialogue modeling
 - multimodal indexing, structuring and summarization
 - annotation and browsing of multimodal data
 - machine learning algorithms and their applications to the topics above


* Satellite events

MLMI'08 will feature special sessions and satellite events, as during
the previous editions of MLMI (see http://www.mlmi.info/ for examples).  To
propose special sessions or satellite events, please contact the special
session chair.

MLMI 2008 is broadly colocated with a number of events in related
domains: Mobile HCI 2008, 2-5 September, in Amsterdam; FG 2008, 17-19
September, in Amsterdam; and ECML 2008, 15-19 September, in Antwerp.


* Guidelines for submission

The workshop proceedings will be published in Springer's Lecture Notes
in Computer Science series (pending approval).  The first four editions
of MLMI were published as LNCS 3361, 3869, 4299, and 4892.  However,
unlike previous MLMIs, the proceedings of MLMI 2008 will be printed
before the workshop and will already be available on-site to MLMI 2008
participants.

Submissions are invited either as long papers (12 pages) or as short
papers (6 pages), and may include a demonstration proposal.  Upon
acceptance of a paper, the Program Committee will also assign to it a
presentation format, oral or poster, taking into account: (a) the most
suitable format given the content of the paper; (b) the length of the
paper (long papers are more likely to be presented orally); (c) the
preferences expressed by the authors.

Please submit PDF files using the submission website at
http://groups.inf.ed.ac.uk/mlmi08/, following the Springer LNCS format
for proceedings and other multiauthor volumes
(http://www.springer.com/east/home/computer/lncs?SGWID=5-164-7-72376-0).
 Camera-ready versions of accepted papers, both long and short, are
required to follow these guidelines and to take into account the
reviewers' comments.  Authors of accepted short papers are encouraged to
turn them into long papers for the proceedings.


* Venue

Utrecht is the fourth largest city in the Netherlands, with historic
roots back to the Roman Empire.  Utrecht hosts one of the larger
universities in the country, and with its historic centre and its many
students it provides an excellent atmosphere for social activities in
or outside the workshop community.  Utrecht is centrally located in the
Netherlands, and has direct train connections to the major cities and
Schiphol International Airport.

TNO, organizer of MLMI 2008, is a not-for-profit research organization.
 TNO's speech technology research is carried out in Soesterberg, at
TNO Human Factors, with research areas in ASR, speaker and language
recognition, and word and event spotting.

The workshop will be held in "Ottone", a beautiful old building near the
"Singel", the canal which encircles the city center.  The conference
hall combines a spacious setting with a warm and friendly ambiance.


* Organizing Committee

David van Leeuwen, TNO (Organization Chair)
Anton Nijholt, University of Twente (Special Sessions Chair)
Andrei Popescu-Belis, IDIAP Research Institute (Programme Co-chair)
Rainer Stiefelhagen, University of Karlsruhe (Programme Co-chair)


Back to Top

9-5 . (2008-09-08) TSD 2008 11th Int.Conf. on Text, Speech and Dialogue IMPORTANT MSG

Eleventh International Conference on TEXT, SPEECH and DIALOGUE (TSD 2008)

Brno, Czech Republic, 8-12 September 2008

http://www.tsdconference.org/

The conference is organized by the Faculty of Informatics, Masaryk

University, Brno, and the Faculty of Applied Sciences, University of

West Bohemia, Pilsen. The conference is supported by the International

Speech Communication Association.

Venue: Brno, Czech Republic

 

SUBMISSION OF DEMONSTRATION ABSTRACTS 

 

Authors are invited to present actual projects, developed software and hardware, or other material relevant to the topics of the conference. The authors of demonstrations should provide an abstract not exceeding one page as plain text. The submission must be made using an online form available at the conference web pages. The accepted demonstrations will be presented during a special Demonstration Session (see the Demo Instructions at www.tsdconference.org).

 

Demonstrators can present their contribution on their own notebook, with an Internet connection provided by the organisers, or the organisers can prepare a PC with multimedia support for demonstrators.

 

IMPORTANT DATES

 

July 31 2008 ............. Submission of demonstration abstracts

 

August 7 2008 ............ Notification of acceptance for demonstrations sent to the authors

 

September 8-12 2008 ...... Conference date

 

The demonstration abstracts will not appear in the Proceedings of TSD 2008 but they will be published electronically at the conference website.

 

TSD SERIES

The TSD series has evolved as a prime forum for interaction between

researchers in both spoken and written language processing from the

former East Bloc countries and their Western colleagues. Proceedings

of TSD form a book published by Springer-Verlag in their Lecture Notes

in Artificial Intelligence (LNAI) series.

 

TOPICS

Topics of the conference will include (but are not limited to):

text corpora and tagging

transcription problems in spoken corpora

sense disambiguation

links between text and speech oriented systems

parsing issues

parsing problems in spoken texts

multi-lingual issues

multi-lingual dialogue systems

information retrieval and information extraction

text/topic summarization

machine translation

semantic networks and ontologies

semantic web

speech modeling

speech segmentation

speech recognition

search in speech for IR and IE

text-to-speech synthesis

dialogue systems

development of dialogue strategies

prosody in dialogues

emotions and personality modeling

user modeling

knowledge representation in relation to dialogue systems

assistive technologies based on speech and dialogue

applied systems and software

facial animation

visual speech synthesis

Papers on processing of languages other than English are strongly

encouraged.

 

PROGRAM COMMITTEE

Frederick Jelinek, USA (general chair)

Hynek Hermansky, Switzerland (executive chair)

FORMAT OF THE CONFERENCE

The conference program will include presentation of invited papers,

oral presentations, and a poster/demonstration sessions. Papers will

be presented in plenary or topic oriented sessions.

Social events including a trip in the vicinity of Brno will allow

for additional informal interactions.

 

CONFERENCE PROGRAM

The conference program will include oral presentations and

poster/demonstration sessions with sufficient time for discussions of

the issues raised.

 

IMPORTANT DATES

March 15 2008 ............ Submission of abstract

March 22 2008 ............ Submission of full papers

May 15 2008 .............. Notification of acceptance

May 31 2008 .............. Final papers (camera ready) and registration

July 23 2008 ............. Submission of demonstration abstracts

July 30 2008 ............. Notification of acceptance for

demonstrations sent to the authors

September 8-12 2008 ...... Conference date

The contributions to the conference will be published in proceedings

that will be made available to participants at the time of the

conference.

 

OFFICIAL LANGUAGE

The official language of the conference will be English.

 

ADDRESS

All correspondence regarding the conference should be

addressed to

 

Dana Hlavackova, TSD 2008

Faculty of Informatics, Masaryk University

Botanicka 68a, 602 00 Brno, Czech Republic

phone: +420-5-49 49 33 29

fax: +420-5-49 49 18 20

email: tsd2008@tsdconference.org

 

LOCATION

Brno is the second largest city in the Czech Republic, with a

population of almost 400,000, and is the country's judiciary and

trade-fair center. Brno is the capital of Moravia, which is in the

south-east part of the Czech Republic. It has been a royal city since

1347, and with its six universities it forms a cultural center of the

region.

Brno can be reached easily by direct flights from London, Moscow, Barcelona

and Prague and by trains or buses from Prague (200 km) or Vienna (130 km).

Back to Top

9-6 . (2008-09-10) 50th International Symposium ELMAR-2008

10-13 September 2008, Zadar, Croatia

http://www.elmar-zadar.org/

 

TECHNICAL CO-SPONSORS

IEEE Region 8

EURASIP - European Assoc. Signal, Speech and Image Processing

IEEE Croatia Section

IEEE Croatia Section Chapter of the Signal Processing Society

IEEE Croatia Section Joint Chapter of the AP/MTT Societies

TOPICS

 

--> Image and Video Processing

--> Multimedia Communications

--> Speech and Audio Processing

--> Wireless Communications

--> Telecommunications

--> Antennas and Propagation

--> e-Learning and m-Learning

--> Navigation Systems

--> Ship Electronic Systems

--> Power Electronics and Automation

--> Naval Architecture

--> Sea Ecology

--> Special Session Proposals - A special session consists

of 5-6 papers which should present a unifying theme

from a diversity of viewpoints; the deadline for proposals

is February 04, 2008.

KEYNOTE TALKS

* Professor Sanjit K. Mitra, University of Southern California, Los Angeles, California, USA:

Image Processing using Quadratic Volterra Filters

* Univ.Prof.Dr.techn. Markus Rupp, Vienna University

of Technology, AUSTRIA:

Testbeds and Rapid Prototyping in Wireless Systems

* Professor Paul Cross, University College London, UK:

GNSS Data Modeling: The Key to Increasing Safety and

Legally Critical Applications of GNSS

* Dr.-Ing. Malte Kob, RWTH Aachen University, GERMANY:

The Role of Resonators in the Generation of Voice

Signals

SPECIAL SESSIONS

SS1: "VISNET II - Networked Audiovisual Systems"

Organizer: Dr. Marta Mrak, I-lab, Centre for Communication

Systems Research, University of Surrey, UNITED KINGDOM

Contact: http://www.ee.surrey.ac.uk/CCSR/profiles?s_id=3937

SS2: "Computer Vision in Art"

Organizer: Asst.Prof. Peter Peer and Dr. Borut Batagelj,

University of Ljubljana, Faculty of Computer and Information

Science, Computer Vision Laboratory, SLOVENIA

Contact: http://www.lrv.fri.uni-lj.si/~peterp/ or

http://www.fri.uni-lj.si/en/personnel/298/oseba.html

SUBMISSION

Papers accepted by two reviewers will be published in

symposium proceedings available at the symposium and

abstracted/indexed in the INSPEC and IEEExplore database.

More info is available here: http://www.elmar-zadar.org/

IMPORTANT: Web-based (online) paper submission of papers in

PDF format is required for all authors. No e-mail, fax, or

postal submissions will be accepted. Authors should prepare

their papers according to ELMAR-2008 paper sample, convert

them to PDF based on IEEE requirements, and submit them using

web-based submission system by March 03, 2008.

SCHEDULE OF IMPORTANT DATES

Deadline for submission of full papers: March 03, 2008

Notification of acceptance mailed out by: April 21, 2008

Submission of (final) camera-ready papers : May 05, 2008

Preliminary program available online by: May 12, 2008

Registration forms and payment deadline: May 19, 2008

Accommodation deadline: June 02, 2008

GENERAL CO-CHAIRS

Ive Mustac, Tankerska plovidba, Zadar, Croatia

Branka Zovko-Cihlar, University of Zagreb, Croatia

PROGRAM CHAIR

Mislav Grgic, University of Zagreb, Croatia

CONTACT INFORMATION

Assoc.Prof. Mislav Grgic, Ph.D.

FER, Unska 3/XII

HR-10000 Zagreb

CROATIA

Telephone: + 385 1 6129 851

Fax: + 385 1 6129 568

E-mail: elmar2008 (_) fer.hr

For further information please visit:

http://www.elmar-zadar.org/

Back to Top

9-7 . (2008-09-24) Ecole Recherche Multimodale d'Information techniques & sciences (in french)

 Ecole Recherche Multimodale d'Information:
techniques & sciences
http://glotin.univ-tln.fr/ERMITES
 
Special theme: Machine Learning for Information Retrieval
 
24-26 September 2008
at the Presqu'île de Giens, Var, France.
 
Supported by the CNRS, the UMR LSIS,
the Association Francophone de la Communication Parlée (AFCP),
and the Université du Sud Toulon-Var.
 
 
ERMITES 2008 focuses on machine learning for multimodal information retrieval, building on evaluation campaigns such as Technolangue (speech), Technovision, CLEF, NIST and TREC, in most of which the speakers are active participants. ERMITES 2008 presents the foundations common to these systems and builds bridges between the different disciplines involved. Some ten specialists in the joint analysis of text, images, audio and video will lecture over three days, with open discussions and demonstrations. One objective of ERMITES, through these theoretical and empirical talks, is to guide researchers in designing multimodal IR systems, now indispensable given the increasingly uncontrolled spread of information. The originality of ERMITES is its emphasis on the joint analysis of several modalities, demonstrating the value of stepping outside a single speciality.
 
ERMITES will be held at the superb VVF of La Badine, Presqu'île de Giens, Var (TGV access via Toulon); Internet connections are provided.
 
 
** SPEAKERS AND ABSTRACTS, by theme:
 
GRAVIER - CR CNRS, IRISA, http://www.irisa.fr/metiss
"Automatic speech transcription /
Analysis of spoken documents"
* We will present the fundamentals of automatic speech processing, in the context of analysing the speech contained in multimedia data. After presenting the components of an automatic speech transcription system, we will discuss evaluation, the levels of performance that can be expected from such systems, and the difficulties raised by the diversity of documents.
* Following the talk on automatic transcription, we will present work on natural language processing applied to automatic transcripts, aiming towards a semantic analysis of documents containing speech. We will cover in turn morphosyntactic analysis, topic segmentation, keyword extraction using classical information retrieval methods, and named entity detection. We will highlight the adaptations needed for NLP tools to take into account the specificities of automatic transcripts.
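The keyword extraction via classical information retrieval methods mentioned in the abstract typically amounts to TF-IDF weighting over the transcript collection. As a minimal pure-Python sketch (an illustration only, not material from the talk; the function name `tfidf_keywords` and the toy documents are invented):

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, k=2):
    """Rank the words of docs[doc_index] by TF-IDF against the collection
    and return the k highest-scoring ones."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # document frequency: in how many documents does each word occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    tf = Counter(tokenized[doc_index])  # term frequency in the target document
    scores = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

# Toy "transcripts": words common to all documents ("the", "mentions") score zero.
docs = [
    "the speech transcript mentions speech recognition",
    "the weather report mentions rain",
    "the match report mentions football",
]
print(tfidf_keywords(docs, 0))  # 'speech' ranks first
```

Real systems add stemming, stop lists, and confidence weighting for ASR errors, but the ranking principle is the same.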
 
 
BESACIER L. - MC, LIG, http://www.liglab.fr
"Speech recognition and machine translation
for interaction and the processing of multilingual content"
One of the challenges in the field of interaction is multilingualism, for communication between humans or between human and machine. I will present an overview of the current state of multilingual automatic speech recognition and statistical machine translation technologies, which also have interesting potential for processing audio content. Examples of recent academic and industrial projects on this theme (IBM MASTOR, the GALE and TC-STAR projects) will also be presented.
 
 
FARINAS J. - MC, IRIT, http://www.irit.fr/recherches/SAMOVA
"Automatic language verification /
Automatic structuring of audiovisual documents"
* Automatic structuring of audiovisual documents: from the detection of salient events and the characterisation of the environment to the structuring of the document, with an example on the characterisation of the acoustic environment through the ANR EPAC project.
* Automatic language verification: an automatic speech classification system within the MISTRAL biometric platform. The NIST evaluation campaigns will also be discussed in this context.
 
 
MARCEL S. - Senior researcher, IDIAP / EPF Lausanne, http://www.idiap.ch
"A tutorial on face detection and recognition:
application to information retrieval"
In this tutorial, we will present state-of-the-art and advanced techniques in face detection and face recognition, with a particular emphasis on applications such as information retrieval.
 
 
FERTIL - DR CNRS, LSIS, http://www.lsis.org
"An example of supervised compression of high-dimensional visual data: predicting people's age from face photographs"
I will present a study of the signs of ageing and their impact on apparent age, carried out in order to build an algorithm capable of determining individuals' age from their photographs. First, the anatomical transformations that alter the face from adulthood onwards (beyond 20 years) are identified and analysed. The signs on which one relies to predict a person's age are then examined. Building on these observations, a predictive model of age is finally constructed and validated. The study was carried out with a supervised linear data-compression method, PLS (partial least squares) regression, whose power can be appreciated on this occasion. A 'kernelised' version of the algorithm will also be presented, for use when the relations between the variable to be predicted and the predictor variables go beyond the linear setting.
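The PLS regression used in this study can be illustrated in its simplest form. Below is a minimal one-component PLS1 sketch in plain Python (an illustration under simplifying assumptions, not the authors' implementation): with a single component, PLS1 reduces to projecting the centred predictors onto the covariance direction between X and y, then regressing y on that score.

```python
def pls1_fit(X, y):
    """One-component PLS1 on mean-centred data: returns (x_means, y_mean,
    direction w, regression scalar q) so that a new sample x is predicted
    as y_mean + q * dot(x - x_means, w)."""
    n, p = len(X), len(X[0])
    xm = [sum(row[j] for row in X) / n for j in range(p)]
    ym = sum(y) / n
    Xc = [[row[j] - xm[j] for j in range(p)] for row in X]
    yc = [v - ym for v in y]
    # weight vector: covariance direction between X and y, normalised
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # scores t = Xc w, then regress y on t
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return xm, ym, w, q

def pls1_predict(model, x):
    xm, ym, w, q = model
    return ym + q * sum((xj - mj) * wj for xj, mj, wj in zip(x, xm, w))

# Toy data: y depends only on the first feature; the second is orthogonal noise.
X = [[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]]
y = [-2.0, 0.0, 2.0, 0.0]
model = pls1_fit(X, y)
print(round(pls1_predict(model, [2.0, 5.0]), 6))  # 4.0
```

The study's setting (thousands of pixel features, several components) needs the full deflation loop of PLS, but the single-component case above shows the core idea of supervised compression.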
 
 
KERMORVANT C. - R&D Engineer, A2iA http://www.a2ia.com/Web_Bao/ACCUEIL-fr.aspx
"Enterprise Content Management: data extraction from digitised documents"
Despite the growing use of digital documents, companies still have to process large volumes of paper documents: cheques, invoices, faxes, customer letters, case files, etc. Even when these paper documents are digitised, processing them requires complex techniques: document analysis, character recognition (printed or handwritten), classification and information extraction. In this talk, I give an overview of the techniques used in the products offered by A2iA for processing digitised documents, together with example applications.
 
 
MERIALDO B. - Prof., Eurecom Sophia Antipolis http://www.eurecom.fr
"Information retrieval and indexing in TRECVID"
This presentation will review recent techniques in multimedia indexing, in particular for digital video. We will also consider evaluation issues and describe the TRECVID evaluation campaigns.
 
 
QUENOT G. - CNRS Research Scientist (CR), LIG http://clips.imag.fr/mrim/
"Active learning and information retrieval in TRECVID"
Most content-based image and video indexing methods rely on supervised learning. System performance depends on the quality of the learning and classification algorithms, but also on the quantity and quality of the available annotations, which are costly to obtain because of the human labour they require. Active learning consists in using a classification system to select the most informative samples for training that same system.
This course has two parts. The introduction describes the principles, history and main applications of active learning. We then give a detailed analysis of an application of active learning to corpus annotation and concept indexing in videos within TRECVID.
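The selection principle described above can be sketched as a toy uncertainty-sampling loop. Everything below is invented for illustration: the nearest-centroid scorer and the 1-D "features" stand in for a real concept classifier and video descriptors.

```python
def train(labeled):
    """Nearest-centroid scorer on 1-D features; returns (margin, predict)."""
    cent = {lab: sum(x for x, l in labeled if l == lab) /
                 len([x for x, l in labeled if l == lab]) for lab in (0, 1)}
    def margin(x):                      # small margin = uncertain sample
        return abs(abs(x - cent[0]) - abs(x - cent[1]))
    def predict(x):
        return 0 if abs(x - cent[0]) <= abs(x - cent[1]) else 1
    return margin, predict

def active_learning(pool, oracle, seed, rounds):
    """Repeatedly label the sample the current model is least sure about."""
    labeled = [(x, oracle(x)) for x in seed]
    pool = [x for x in pool if x not in seed]
    for _ in range(rounds):
        margin, _ = train(labeled)
        x = min(pool, key=margin)       # most informative sample
        pool.remove(x)
        labeled.append((x, oracle(x)))  # ask the human annotator
    return train(labeled)[1]

# Toy run: the "annotator" labels values >= 5 as class 1.
oracle = lambda v: 1 if v >= 5 else 0
predict = active_learning(list(range(11)), oracle, seed=[0, 10], rounds=4)
```

With only 6 labels (2 seeds + 4 queries) the queried samples cluster around the class boundary, which is exactly the economy of annotation the talk describes.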
 
 
QUAFAFOU M. - Prof., LSIS http://www.lsis.org
"Web Multimedia Mining"
The democratisation of the web and of the means of acquiring, storing and distributing multimedia data is giving rise to a rich and complex global universe. This world of multimedia data distributed over the web is in perpetual evolution. This heterogeneous, dynamic and inherently inconsistent mass of data offers new opportunities, distinct from those of web mining and multimedia mining. The aim of this presentation is to explore these new challenges, notably from the perspective of machine learning.
 
 
LE MAITRE - Prof., LSIS http://www.lsis.org
"Indexing web pages by their content and their visual rendering"
Web page designers organise the information a page contains so as to make it easy for users to consult. A web page can be viewed as a set of blocks containing multimedia information (text, images, video). The visual appearance of a block (font, background colour, ...) and its position in the page provide information about its importance. Moreover, a block can contribute information to another block (neighbouring, enclosing, etc.): for example, the text surrounding an image, or referring to it, can be used to index that image. Another advantage of splitting a page into blocks is the ability to localise the answers to a query: the most similar blocks are returned rather than entire pages. The precision and recall of answers to queries over web pages could therefore be significantly improved by taking into account the visual rendering of the pages in addition to their semantic content. This talk will present: the main techniques for segmenting a web page from its DOM tree, techniques for assessing the importance of a block within a page, and the web page indexing model designed in research work carried out within the INCOD team at LSIS. Initial results from applying this model to querying electronic newspapers will also be presented.
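To make the block idea concrete, here is a small sketch (not the LSIS/INCOD model itself) that segments an HTML fragment into blocks and ranks them for a query. The per-tag weights are invented stand-ins for the visual cues (font, position, ...) the talk mentions.

```python
from html.parser import HTMLParser

# Invented weights standing in for visual importance cues.
WEIGHTS = {"h1": 3.0, "h2": 2.0, "p": 1.0, "li": 0.8}

class BlockExtractor(HTMLParser):
    """Collect (tag, weight, text) blocks from a page."""
    def __init__(self):
        super().__init__()
        self.stack, self.blocks = [], []
    def handle_starttag(self, tag, attrs):
        if tag in WEIGHTS:
            self.stack.append((tag, []))
    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            t, parts = self.stack.pop()
            text = " ".join(p for p in parts if p)
            if text:
                self.blocks.append((t, WEIGHTS[t], text))
    def handle_data(self, data):
        if self.stack:
            self.stack[-1][1].append(data.strip())

def score(block, query):
    """Rank a block for a query: term overlap scaled by visual weight."""
    tag, weight, text = block
    return weight * len(set(text.lower().split()) & set(query.lower().split()))

parser = BlockExtractor()
parser.feed('<h1>Speech news</h1><p>A speech workshop in Brisbane</p>')
best = max(parser.blocks, key=lambda b: score(b, "speech"))  # the <h1> block
```

Returning `best` rather than the whole page is the "localised answer" benefit described above; a real system would derive the weights from the rendered page rather than from a fixed table.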
 
 
****************
** REGISTRATION:
****************
Pre-registration by e-mail to
 
glotin@univ-tln.fr (subject: ERMITES08)
 
Places are limited to 32; the first registrants will have priority.
Payment afterwards by cheque, purchase order or credit card to the AFCP.
 
* 3 GRANTS of 150 euros are offered by the AFCP *
Please apply for one when registering.
 
** 2008 rates
(including 2 nights, 7 meals, 2 breakfasts, 6 coffee/drink breaks, proceedings, ...):
 
Room with 2 separate single beds:
PhD student, postdoc, Master's student: 260 euros.
Others: 390 euros.
 
Room with 1 single bed:
PhD student, postdoc, Master's student: 300 euros.
Others: 420 euros.
 
Meals and proceedings only (no accommodation or breakfast): 150 euros.
 
===
The organisers
H. Glotin & J. Le Maitre
Back to Top

9-8 . (2008-10-08) 2008 International Workshop on Multimedia Signal Processing

October 8-10, 2008 
Shangri-la Hotel Cairns, Queensland, Australia 
http://www.mmsp08.org/  
MMSP-08 Call for Papers
 
MMSP-08 is the tenth international workshop on multimedia signal processing. The workshop is organized by the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. A new theme of this workshop is Bio-Inspired Multimedia Signal Processing in Life Science Research.
The main goal of MMSP-2008 is to further scientific research within the broad field of multimedia signal processing and its interaction with new emerging areas such as life science. The workshop will focus on major trends and challenges in this area, including brainstorming a roadmap for the success of future research and applications.
The MMSP-08 workshop features:
* A Student Paper Contest with awards sponsored by Canon. To enter the contest, a paper submission must have a student as the first author.
* A Best Paper award for oral presentations, sponsored by Microsoft.
* A Best Poster presentation award, sponsored by National ICT Australia (NICTA).
* A new session on Bio-Inspired Multimedia Signal Processing.
 
SCOPE
Papers are solicited in, but not limited to, the following general areas:
*Bio-inspired multimedia signal processing
*Multimedia processing techniques inspired by the study of signals/images derived from medical, biomedical and other life science disciplines, with applications to multimedia signal processing
*Fusion mechanisms of multimodal signals in the human information processing system and applications to multimodal multimedia data fusion/integration
*Comparison between bio-inspired methods and conventional methods. 
*Hybrid multimedia processing technology and systems incorporating bio-inspired and 
conventional methods. 
*Joint audio/visual processing, pattern recognition, sensor fusion, medical imaging, 
2-D and 3-D graphics/geometry coding and animation, pre/post-processing of digital video, 
joint source/channel coding, data streaming, speech/audio, image/video coding and 
processing 
*Multimedia databases (content analysis, representation, indexing, recognition and 
retrieval) 
*Human-machine interfaces and interaction using multiple modalities 
*Multimedia security (data hiding, authentication, and access control)   
*Multimedia networking (priority-based QoS control and scheduling, traffic engineering, 
soft IP multicast support, home networking technologies, position aware computing, 
wireless communications). 
*Multimedia Systems Design, Implementation and Application (design, distributed 
multimedia systems, real time and non-real-time systems; implementation; multimedia 
hardware and software) 
*Standards    
SCHEDULE  
* Special Sessions (contact the respective chair):  March 8, 2008  
* Papers (full paper, 4-6 pages, to be received by):  April 18, 2008  
* Notification of acceptance by:  June 18,  2008 
* Camera-ready paper submission by:  July 18, 2008  
 
General Co-Chairs 
Prof. David Feng,  University of Sydney, Australia, and Hong Kong 
Polytechnic University feng@it.usyd.edu.au  
Prof. Thomas Sikora,  Technical University Berlin Germany sikora@nue.tu-berlin.de  
Prof. W.C. Siu,  Hong Kong Polytechnic University enwcsiu@polyu.edu.hk  
Technical Program Co-Chairs 
Dr. Jian Zhang National ICT Australia jian.zhang@nicta.com.au  
Prof. Ling Guan Ryerson University, Canada  lguan@ee.ryerson.ca  
Prof. Jean-Luc Dugelay Institute EURECOM, Sophia Antipolis, France  Jean-Luc.Dugelay@eurecom.fr  
Special Session Co-Chairs: 
Prof. Wenjun Zeng University of Missouri, USA  zengw@missouri.edu  
Prof. Pascal Frossard EPFL, Switzerland pascal.frossard@epfl.ch  
Back to Top

9-9 . (2008-10-16) 4th IBM Watson "Emerging Leaders in Multimedia" workshop

The IBM Watson “Emerging Leaders in Multimedia” workshop series is an annual event organized to recognize outstanding student researchers in the multimedia area. We are currently inviting student applications for the fourth workshop in this series. This is a two day event that will be held on October 16 and 17, 2008 at the IBM T. J. Watson Research Center in Hawthorne, New York. The workshop will consist of student research presentations, demonstrations of multimedia projects currently underway at IBM, and several interactive sessions among students and researchers on open and emerging problems in the field and exciting directions for future research. Please visit the following website http://domino.research.ibm.com/comm/research.nsf/pages/r.multimedia.workshop2008.html for more information.

We plan to invite 8 exceptional graduate students working in these areas to visit our labs (expenses covered by IBM), present their research, and learn about state-of-the-art industrial media research at this workshop. We encourage mid- to senior-level PhD students from CS, EE, ECE, and all other relevant disciplines to apply. The application package should include a short (2-3 paragraphs) abstract describing the student's current research, an up-to-date resume with a list of publications, and a letter of support from the student's thesis advisor. Additional supporting material is optional.
Please submit your applications by August 24, 2008 to Gayathri Shaikh (g3@us.ibm.com) or Ying Li (yingli@us.ibm.com).




Back to Top

9-10 . (2008-10-16) 2008 IEEE Intl Workshop on MACHINE LEARNING FOR SIGNAL PROCESSING

2008 IEEE International Workshop on MACHINE LEARNING FOR SIGNAL PROCESSING
(Formerly the IEEE Workshop on Neural Networks for Signal Processing)

October 16-19, 2008 Cancun, Mexico
Fiesta Americana Condesa Cancun, www.fiestamericana.com

Deadlines:
Submission of full paper:                     May 5, 2008
Notification of acceptance:                     June 16, 2008
Camera-ready paper and author registration:     June 23, 2008
Advance registration before:                    July 1, 2008

http://mlsp2008.conwiz.dk/

The workshop will feature keynote addresses, technical presentations, special
sessions and tutorials organized in two themes that will be included in the
registration. Tutorials will take place on the afternoon of 16 October, and
the workshop will begin on 17 October. The two themes for MLSP 2008 are
Cognitive Sensing and Kernel Methods for Nonlinear Signal Processing. Papers
are solicited for, but not limited to, the following areas:

Algorithms and Architectures:
Artificial neural networks, kernel methods, committee models, Gaussian
processes, independent component analysis, advanced (adaptive, nonlinear)
signal processing, (hidden) Markov models, Bayesian modeling, parameter
estimation, generalization, optimization, design algorithms.

Applications:
Speech processing, image processing (computer vision, OCR) medical imaging,
multimodal interactions, multi-channel processing, intelligent multimedia and
web processing, robotics, sonar and radar, biomedical engineering, financial
analysis, time series prediction, blind source separation, data fusion, data
mining, adaptive filtering, communications, sensors, system identification,
and other signal processing and pattern recognition applications.

Implementations:
Parallel and distributed implementation, hardware design, and other general
implementation technologies.

For the fourth consecutive year, a Data Analysis and Signal Processing
Competition is being organized in conjunction with the workshop. The goal of
the competition is to advance the current state-of-the-art in theoretical and
practical aspects of signal processing domains. The problems are selected to
reflect current trends, evaluate existing approaches on common benchmarks, and
identify critical new areas of research. Previous competitions produced novel
and effective approaches to challenging problems, advancing the mission of the
MLSP community. A description of the competition, the submissions, and the
results, will be included in a paper which will be published in the
proceedings. Winners will be announced and awards given at the workshop.

Selected papers from MLSP 2008 will be considered for a special issue of The
Journal of Signal Processing Systems for Signal, Image, and Video Technology,
to appear in 2009. The MLSP technical committee may invite one or more winners
of the data analysis and signal processing competition to submit a paper
describing their methodology to the special issue.

Paper Submission Procedure
Prospective authors are invited to submit a double column paper of up to six
pages using the electronic submission procedure at http://mlsp2008.conwiz.dk.
Accepted papers will be published on a CDROM to be distributed at the
workshop.

MLSP 2008 webpage: http://mlsp2008.conwiz.dk/

MLSP 2008 ORGANIZING COMMITTEE:

General Chair
Jose Principe

Program Chair
Deniz Erdogmus

Technical Chair
Tulay Adali

Publicity Chairs
Ignacio Santamaria
Marc Van Hulle

Publication Chair
Jan Larsen

Data Competition
Ken Hild
Vince Calhoun

Local Arrangements
Juan Azuela

Back to Top

9-11 . (2008-10-01) Colloque Langage et Cognition (Paris) (in french)

COLLOQUIUM "LANGUAGE AND COGNITION"

October 1-2, 2008

Maison de la Chimie
28, rue Saint-Dominique
75007 PARIS

Organised by the European Thematic Network "Language and Cognition",
an outgrowth of the ACI COGNITIQUE programme (1999-2002)

Under the aegis of the Fondation Maison des Sciences de l'Homme
Website: http://www.msh-paris.fr
OCTOGONE E.A 4156 website:
http://w3.octogone.univ-tlse2.fr/articles.php?lng=fr&pg=20

Coordinators:

Jean-Luc Nespoulous
Université de Toulouse Le Mirail
Institut Universitaire de France
Unité de Recherche Interdisciplinaire OCTOGONE E.A 4156
Laboratoire Jacques-Lordat
Institut des Sciences du Cerveau de Toulouse IFR 96
nespoulo@univ-tlse2.fr

&

Michel Fayol
Université Blaise Pascal, Clermont-Ferrand
Laboratoire de Psychologie Sociale et Cognitive UMR CNRS 6024

Wednesday, October 1

8:30: Welcome
8:45: Introduction: Jean-Luc Nespoulous and Michel Fayol

Theme 1: "Language and Cognition: from sentence to discourse"

9:00: "From sentence to discourse: spatial framing adverbials as discourse markers"
Michel Charolles & Laure Sarda (Paris)
9:45: "Effect on understanding of preposed and postposed locative prepositional phrases"
Saveria Colonna (Paris)
"Effect of locative prepositional phrases on anaphor resolution and lexical disambiguation"
Joël Pynte (Paris)
10:30: Discussion
10:45: BREAK
11:15: "From Situation Models to Mental Simulations"
Rolf Zwaan (Rotterdam)
12:00: General discussion
12:30: Lunch

Theme 2: "Language Diversity and Cognition"

14:30: "Language diversity through classification phenomena in spoken and signed languages: methodological aspects"
Barbara Köpke (Toulouse) & Colette Grinevald (Lyon)
15:15: "Writing systems for sign languages and modality-related constraints: state of the art"
Brigitte Garcia (Paris)
16:00: Discussion
16:15: BREAK
16:45: "Linguistic forms and semantic categories: the case of the classifier hieroglyphs of ancient Egypt"
Orly Goldwasser (Jerusalem), Colette Grinevald (Lyon) & Danièle Dubois (Paris)
17:30: General discussion

Thursday, October 2

8:30: Invited address: "The roles of general cognitive capacities and domain-specific cognitive skills in language comprehension"
David Caplan (Cambridge) & Gloria Waters (Boston)

Theme 3: "Language, Communication, Pragmatics"

9:15: "What pragmatic constraints on language and communication?"
Michèle Guidetti (Toulouse) & Jean-Louis Dessalles (Paris)
10:00: "What link between pragmatic disorders and cognitive deficits? The case of schizophrenia and right-hemisphere lesions"
Maud Champagne (Montréal)
10:45: Discussion
11:00: BREAK
11:30: "Structure(s) and context(s) in a pragmatic theory"
Jef Verschueren (Antwerp)
12:15: General discussion
12:30: Lunch

Theme 4: "Perception & Comprehension"

14:30: "Speaking in the brain"
Jean-François Démonet (Toulouse) & Mireille Besson (Marseille)
15:15: "Abstract representations in speech perception: sounds, words and voices"
James McQueen (Nijmegen)
16:00: Discussion
16:15: BREAK
16:45: "Functional and structural neuroanatomy of language"
Angela Friederici (Leipzig)
17:30: General discussion
18:00: End of the sessions

 

 

Back to Top

9-12 . (2008-10-20) 10th International Conference on Multimodal Interfaces (ICMI 2008)

The Tenth International Conference on Multimodal Interfaces (ICMI 2008) will take place in Chania, Greece, on October 20-22, 2008. The main aim of ICMI 2008 is to further scientific research within the broad field of multimodal interaction and systems. The conference will focus on major trends and challenges in this area, including helping to identify a roadmap for future research and commercial success. ICMI 2008 will feature a main conference with keynote speakers, panel discussions, technical paper presentations and discussion (single track), poster sessions, and demonstrations of state-of-the-art multimodal concepts and systems. Organized on the island of Crete, ICMI-08 provides excellent conditions for brainstorming and sharing the latest advances in multimodal interaction and systems in an inspired setting full of history, mythology and art.

Paper Submission
There are two different submission categories: regular paper and short paper. The page limit is 8 pages for regular papers and 4 pages for short papers. The presentation style (oral or poster) will be decided based on suitable delivery of the content.

Demo Submission
Proposals for demonstrations shall be submitted to the demo chairs electronically. A 1-2 page description of the demonstration is required.

Doctoral Spotlight
Doctoral Student Travel Support and Spotlight Session. Funds are expected from NSF to support participation of doctoral candidates at ICMI 2008, and a spotlight session is planned to showcase ongoing thesis work. Students interested in travel support can submit a short or long paper as specified above.

 

Topics of interest include

* Multimodal and Multimedia processing

* Multimodal input and output interfaces

* Multimodal applications

* User Modeling and Adaptation

* Multimodal Architectures, Tools and Standards

* Evaluation of Multimodal Interfaces

 

*Important Dates:*

Paper submission: May 23, 2008

Author notification July 14, 2008

Camera ready deadline: August 15, 2008

Conference: October 20-22, 2008

 

Organizing Committee

 

General Co-Chairs

Vassilis Digalakis, TU Crete, Greece

Alex Potamianos, TU Crete, Greece

Matthew Turk, UC Santa Barbara, USA

 

Program Co-Chairs

Roberto Pieraccini, SpeechCycle, USA

Jian Wang, Microsoft Research, China

Yuri Ivanov, MERL 
Back to Top

9-13 . (2008-10-26) 9th International Conference on Signal Processing

Oct. 26-29, 2008 Beijing, CHINA
 
The 9th International Conference on Signal Processing will be held in Beijing,
China on Oct. 26-29, 2008. It will include sessions on all aspects of theory,
design and applications of signal processing. Prospective authors are invited
to propose papers in any of the following areas, but not limited to:
 
A. Digital Signal Processing (DSP)
B. Spectrum Estimation & Modeling
C. TF Spectrum Analysis & Wavelet
D. Higher Order Spectral Analysis
E. Adaptive Filtering & SP
F. Array Signal Processing
G. Hardware Implementation for SP
H  Speech and Audio Coding
I. Speech Synthesis & Recognition
J. Image Processing & Understanding
K. PDE for Image Processing
L. Video compression & Streaming
M. Computer Vision & VR
N. Multimedia & Human-computer Interaction
O. Statistical Learning, ML & Pattern Recognition
P. AI & Neural Networks
Q. Communication Signal Processing
R. SP for Internet, Wireless and Communications
S. Biometrics & Authentication
T. SP for Bio-medical & Cognitive Science
U. SP for Bio-informatics
V. Signal Processing for Security
W. Radar Signal Processing 
X. Sonar Signal Processing and Localization
Y. SP for Sensor Networks
Z. Application & Others
 
PAPER SUBMISSION GUIDELINE
Prospective authors are invited to submit full papers, including the paper title, authors' names, addresses, telephone and fax numbers, e-mail address and topic area, by uploading electronic submissions in .pdf format to
 
http://icsp08.bjtu.edu.cn
 
before June 15, 2008.
 
PROCEEDINGS
The proceedings with Catalog number of IEEE and Library of Congress will be
published prior to the conference in both hardcopy and CD-ROM, and distributed
to all registered participants at the conference. The proceedings will be
indexed by EI.
 
LANGUAGE
The working language is English.
 
TOURS
Activities and tours for accompanying persons will be arranged by the Organizing
Committee.
 
DEADLINES
Submission of papers               June 15, 2008
Notification of acceptance         July 15, 2008
Submission of Camera-ready papers  Aug. 15, 2008
Pre-registration                   Sept. 20, 2008
       
 
Please visit http://icsp08.bjtu.edu.cn for more details.
 
Sponsor 
IEEE Beijing Section
Technical Co-sponsor
IEEE Signal Processing Society   
Co-sponsors
The Chinese Institute of Electronics
IET
URSI
Nat. Natural Sci. Foundation of China
IEEE SP Society Beijing Chapter
IEEE Computer Society Beijing Chapter
Japan China Science and Technology 
Exchange Association
 
Organizers
Beijing Jiaotong University
CIE Signal Processing Society
 
Technical Program Committee
Prof. RUAN Qiuqi
Beijing Jiaotong University
Beijing 100044, CHINA
Tel.: (8610)5168-8616, 5168-8073
Email: bzyuan@bjtu.edu.cn
 
Organizing Committee
Mr. ZHOU Mengqi
P.O. Box 165, Beijing 100036,CHINA
Email: zhoumq@public3.bta.net.cn
 
Secretary
Ms. TANG Xiaofang
Email: bfxxstxf@bjtu.edu.cn
Back to Top

9-14 . (2008-10-27) Speech and Face to Face Communication: a Christian Benoit Memorial

Speech and Face to Face Communication
27-29 October 2008, Grenoble, France
 
A Workshop/Summer school dedicated to the memory of Christian Benoît
 
Associated to a special issue of the Speech Communication journal
Ten years after our colleague Christian Benoît passed away, the mark he left on
the international community is still very vivid. A workshop/summer school dedicated to
his memory will be organised in the spirit of his innovative and enthusiastic research
style. It will explore the topic of "Speech and Face to Face Communication"
from a pluridisciplinary perspective: neuroscience, cognitive psychology, phonetics,
linguistics and computer modelling. The "Speech and Face to Face Communication"
workshop will be organized around invited talks. All researchers in the field are
invited to participate through a call for papers, and students are encouraged to
attend the workshop and present their work.
A special session on all aspects of speech communication research will also be
organized during the workshop.
There is still time to submit a proposal.
Conference website
http://www.icp.inpg.fr/~dohen/face2face/
contact: jean-luc.schwartz@gipsa-lab.inpg.fr
Registration fees: 100 euros - Students: 50 euros
AFCP or ISCA members: 80 euros - Students: 40 euros
 
Info about the Speech Communication special issue
http://www.elsevier.com/wps/find/journaldescription.cws_home/505597/description#description
Back to Top

9-15 . (2008-11-12) V Jornadas en Tecnologia de Habla and Evaluation campaigns Bilbao Spain

 

VJTH’2008 – CALL FOR PAPERS

5th Workshop on Speech Technology                      V Jornadas en Tecnología del Habla

November 12-14, 2008, Bilbao, Spain

http://jth2008.ehu.es

Organized by the Aholab-Signal Processing Laboratory of the Dept. of Electronics and Telecommunications of the University of the Basque Country (UPV/EHU) and supported by the Spanish Thematic Network on Speech Technologies and ISCA.

The “V Jornadas en Tecnología del Habla” (http://jth2008.ehu.es) will be held on November 12-14, 2008 in Bilbao, Spain. Previous workshops were held in Sevilla (2000), Granada (2002), Valencia (2004) and Zaragoza (2006). The aim of the workshop is to present and discuss the wide range of speech technologies and applications related to Iberian languages. The workshop will feature technical presentations, special sessions and invited conferences, all of which will be included in the registration. During the workshop, the results of the ALBAYZIN 08 evaluation campaigns and the best-paper awards will be presented.

The main topics of the workshop are:


  • Speech recognition and understanding
  • Speech synthesis
  • Signal processing and feature extraction
  • Natural language processing
  • Dialogue systems
  • Automatic translation
  • Speech perception
  • Speech coding
  • Speaker and language identification
  • Speech and language resources
  • Information retrieval
  • Applications for handicapped persons
  • Applied systems for advanced interaction


 

Invited Speakers:

·         Nestor Becerra (Universidad de Santiago, Chile)

Applications of speech technologies in CALL (Computer-Assisted Language Learning) and CAPT (Computer-Assisted Pronunciation Training) systems

·         Giuseppe Riccardi (University of Trento, Italy)

Next Generation Spoken Language Interfaces

·         Björn Granstrom (KTH - Royal Institute of Technology, Sweden)

Embodied conversational agents in verbal and non-verbal communication

·         Yannis Stylianou (University of Crete, Greece)

Voice Conversion: State of the Art and Perspectives

Important dates:

·         Full paper submission: July 20, 2008

  • Notification of acceptance: October 1, 2008
  • Conference V JTH 2008: November 12-14, 2008

Contact information:

VJTH’2008

Dept. Electronics and Telecommunications

Faculty of Engineering

Alda. Urkijo s/n

48013 Bilbao

Tel.: +34 946 013 969

Fax.: +34 946 014 259

E-mail: 5jth@ehu.es           Web: http://jth2008.ehu.es

 

 

 

EVALUATION CAMPAIGNS

               ALBAYZIN-08 System Evaluation Proposal

The Speech Technologies Thematic Network ("Red Temática en Tecnologías del Habla") is a common forum where the researchers on Speech Technologies can work together and share experiences in order to: 

  • Promote Speech Technology research, attracting new young researchers by means of training courses, student exchanges, grants and awards.
  • Attract investment from companies for Speech Technology research, looking for new applications that can bring business opportunities. These applications must be shown in demonstrators that can attract companies' interest.
  • Make progress in creating collaboration ties among the Network members, strengthening Spain's leadership in speech technologies for Spanish as well as for the co-official languages, such as Catalan, Basque and Galician.

 

In order to promote Speech Technology research among new young researchers, the "Red Temática en Tecnologías del Habla" organises a system evaluation campaign in the following areas: 

 

Registration Form

 

http://gtts.ehu.es:8080/RTTH-LRE08/Formulario.jsp

 

 

Registration Form

 

http://jth2008.ehu.es/form_ALBAYZIN08_CTV_en.pdf

 

 

Registration Form

http://jth2008.ehu.es/form_ALBAYZIN08_TA_en.pdf

 

These are the conditions for the participants: 

 

The participants undertake to present the evaluation results in a special session during the V Jornadas en Tecnología del Habla. 

Participants can take part individually or as a team.

 

 

 

 

 

Back to Top

9-16 . (2008-12-08) 8th International Seminar on Speech Production - ISSP 2008

We are pleased to announce that the eighth International Seminar on Speech Production - ISSP 2008 will be held in Strasbourg, Alsace, France from the 8th to the 12th of December, 2008.

We are looking forward to continuing the tradition established at previous ISSP meetings in Grenoble, Leeds, Old Saybrook, Autrans, Kloster Seeon, Sydney, and Ubatuba of providing a congenial forum for presentation and discussion of current research in all aspects of speech production.

The following invited speakers have agreed to present their ongoing research:

Vincent Gracco
McGill University, Montreal, Canada
General topic Neural control of speech production and perception


Sadao HIROYA
Boston University, United States
General topic Speech production and perception, brain imaging and stochastic speech production modeling


Alexis Michaud
Phonetics and Phonology Laboratory of Université Paris III, Paris, France
General topic Prosody in tone languages


Marianne Pouplier
Institute for Phonetics and Speech Communication, Munich, Germany
General topic Articulatory speech errors


Gregor Schoener
Institute for Neuroinformatics Bochum, Germany
General topic Motor control of multi-degree of freedom movements

 

Topics covered

Topics of interest for ISSP'2008 include, but are not restricted to, the following:

  • Articulatory-acoustic relations
  • Perception-action control
  • Intra- and inter-speaker variability
  • Articulatory synthesis
  • Acoustic to articulatory inversion
  • Connected speech processes
  • Coarticulation
  • Prosody
  • Biomechanical modeling
  • Models of motor control
  • Audiovisual synthesis
  • Aerodynamic models and data
  • Cerebral organization and neural correlates of speech
  • Disorders of speech motor control
  • Instrumental techniques
  • Speech and language acquisition
  • Audio-visual speech perception
  • Plasticity of speech production and perception

In addition, the following special sessions are currently being planned:

1. Speech inversion (Yves Laprie)

2. Experimental techniques investigating speech (Susanne Fuchs)

For abstract submission, please include:

1) the name(s) of the author(s);

2) affiliations and a contact e-mail address;

3) whether you prefer an oral or a poster presentation, in the first lines of the body of the message.

All abstracts should be no longer than 2 pages (12-point Times font) and written in English.

Deadline for abstract submission is the 28th of March 2008. All details can be viewed at

http://issp2008.loria.fr/

Notification of acceptance will be given on the 21st of April, 2008.

The organizers:

Rudolph Sock

Yves Laprie

Susanne Fuchs

 

Back to Top

9-17 . (2008-12-15) CfP/Demos 2nd International Symposium on Universal Communication

Call for Papers/Demos
  
  2nd International Symposium on Universal Communication Dec 15 - 16,
  2008 Osaka International Convention Center, Osaka, Japan
  
  
  
  The development of information network systems enables us to
  communicate with people in remote places "anytime and anywhere",
  enriching human knowledge, affection and sensibility. However,
  various barriers still prevent us from using these systems freely
  and flexibly. To discuss how to overcome these barriers and create
  a more human-centered communication environment, the first
  International Symposium on Universal Communication was held in
  Kyoto, Japan in 2007, featuring discussions with well-known
  researchers from around the world. Following its success, the
  second International Symposium on Universal Communication will be
  held in Osaka, Japan.
  
  
  
  Topics of interest include, but are not limited to:
  
  
 - Information retrieval and information analysis
 - Information credibility
 - Knowledge processing
 - Language resources
 - Speech recognition and synthesis
 - Machine translation and speech translation
 - Natural language processing
 - Spoken language processing
 - Multilingual information processing
 - Super high-resolution image technology
 - 3D visualization, imaging and display technologies
 - 3D sound processing
 - Virtual reality, mixed reality and augmented reality
 - Multisensory (visual, acoustic, haptic, olfactory, etc.) interaction
 - Human factors
 - Human interface and interaction technologies
 - Real-world sensing technologies
  
  
  
  General Chair
  
  Yuichi Matsushima, National Institute of Information and
  Communications Technology (NICT), Japan
  
  
  
  General Vice Chairs
  
  Kazumasa Enami, NICT, Japan
  
  Hiromitsu Wakana, NICT, Japan
  
  
  
  Technical Program Committee
  
  Chair: Satoshi Nakamura, NICT/ATR, Japan Vice Chair: Naomi Inoue,
  NICT/ATR, Japan
  
  - Akio Ando, NHK Science & Technical Research Laboratories, Japan
  
  - Martin S. Banks, UC Berkeley, USA
  
  - Khalid Choukri, ELDA, France
  
  - Marcello Federico, FBK/IRST, Italy
  
  - Sidney Fels, The University of British Columbia, Canada
  
  - Jukka Häkkinen, University of Helsinki, Finland
  
  - Munpyo Hong, Sungkyunkwan University, Korea
  
  - Kentaro Inui, Nara Institute of Science and Technology, Japan
  
  - Hitoshi Isahara, NICT, Japan
  
  - Ken Kaneiwa, NICT, Japan
  
  - Takashi Kawai, Waseda University, Japan
  
  - Yutaka Kidawara, NICT, Japan
  
  - Kyeong Soo Kim, Swansea University, UK
  
  - Hisashi Miyamori, Kyoto Sangyo University, Japan
  
  - Makoto Okui, NICT, Japan
  
  - Tanja Schultz, Carnegie Mellon University, USA
  
  - Yasuyuki Sumi, Kyoto University, Japan
  
  - Eiichiro Sumita, NICT/ATR, Japan
  
  - Yoiti Suzuki, Tohoku University, Japan
  
  - Yasuhiro Takaki, Tokyo University of Agriculture and Technology, Japan
  
  - Kazuya Takeda, Nagoya University, Japan
  
  - Kentaro Torisawa, NICT, Japan
  
  - Chiu-yu Tseng, Institute of Linguistics Academia Sinica, Taiwan
  
  - Kiyotaka Uchimoto, NICT, Japan
  
  - Takehito Utsuro, University of Tsukuba, Japan
  
  - Andy Way, Dublin City University, Ireland
  
  - Andrew Woods, Curtin University of Technology, Australia
  
  - Wieslaw Woszczyk, McGill University, Canada
  
  - Xing Xie, Microsoft Research Asia, China
  
  - Tatsuya Yamazaki, NICT, Japan
  
  - Kwon Yongjin, Korea AeroSpace University, Korea
  
  - Daqing Zhang, Institut TELECOM & Management SudParis, France
  
  
  
  Demo Chair
  
  Kazuhiro Kimura, NICT, Japan
  
  
  
  Technical Advisors
  
  Takashi Matsuyama, Kyoto University, Japan
  
  Michitaka Hirose, The University of Tokyo, Japan
  
  
  
  Local Arrangement Co-Chairs
  
  Kazuhiro Kimura, NICT
  
  Yukio Takahashi, NICT
  
  Contact info: isuc2008@khn.nict.go.jp
  
  
  
  Website
  
  http://www.is-uc.org/2008/
  
  
  
  Venue
  
  Osaka International Convention Center
  
  3-51 Nakanoshima 5-chome, Kita-ku, Osaka, Japan
  http://www.gco.co.jp/english/english.html
  
  
  
  Submission Information
  
  All papers must be submitted through the ISUC 2008 homepage at
  http://www.is-uc.org/2008.
  
  
  
  Paper submission:
  
  The extended abstract must be written in English and is limited to 2
  pages in the IEEE two-column format. The abstract should also indicate
  whether it is intended as an oral paper or a poster, and must be
  submitted in PDF. Once accepted, the camera-ready paper should be
  between 4 and 8 pages.
  
  More detailed author guidelines are available at
  http://www.is-uc.org/2008/submission.
  
  
  
  All accepted papers will appear in the conference proceedings
  published by the IEEE Computer Society and will be included in the
  IEEE-Xplore and the IEEE Computer Society (CSDL) digital libraries as
  well as indexed through IET INSPEC, EI (Compendex) and Thomson ISI.
  
  
  
  Demo submission:
  
  Please submit a one-page description of your demo in PDF. This
  description must be written in English and should include: an abstract
  of what you will show, the space needed, and the facilities needed,
  including power supply and Internet access. A specified submission
  format will be available on the ISUC 2008 homepage.
  
  
  
  Important Dates
  
  Papers:
  
  Extended abstract due: July 25, 2008
  
  Notification of acceptance: August 29, 2008
  
  Camera-ready papers due: September 26, 2008
  
  
  
  Demos:
  
  Submission of description due: September 19, 2008
  
  Notification of acceptance: October 10, 2008
  
  
  
  
  

 

Back to Top

9-18 . (2008-12-16) 2008 International Symposium on Chinese Spoken Language Processing (ISCSLP 2008)

 2008 International Symposium on

         Chinese Spoken Language Processing (ISCSLP 2008)

 

                          December 16 - 19, 2008

                              Kunming, China

                       http://www.iscslp2008.org

 

ISCSLP’08 is the flagship conference of ISCA SIG-CSLP (Special Interest Group on Chinese Spoken Language Processing).

 

ISCSLP'08 will be held December 16-19, 2008 in Kunming, hosted by the University of Science and Technology of China and Yunnan University.

 

ISCSLP (International Symposium on Chinese Spoken Language Processing) is a conference for scientists, researchers, and practitioners to report and discuss the latest progress in all scientific and technological aspects of Chinese spoken language processing (CSLP). The idea of having a series of regular conferences devoted to CSLP was an outcome of a small-group meeting held in December 1997 in Singapore. The meeting was organized and chaired by Professor Chin-Hui Lee, then at Bell Laboratories, USA, and attended by Professors Tai-Yi Huang and Ren-Hua Wang from mainland China, Professors Chorkin Chan and Pak-Chung Ching from Hong Kong, Professor Kim-Teng Lua and Dr. Haizhou Li from Singapore, and Professors Lin-Shan Lee and Hsiao-Chuan Wang from Taiwan.

A Steering Committee, chaired by Professor Chin-Hui Lee and consisting of the nine members named above, was established to oversee the ISCSLP conferences. It was decided that a biennial symposium would be organized and hosted initially by research groups from the Asia-Pacific region. Since its inception, ISCSLP has become the world's largest and most comprehensive technical conference focused on Chinese spoken language processing and its applications. At ISCSLP 2002, a special interest group, SIG-CSLP, was formed within the International Speech Communication Association (ISCA). ISCSLP is now an ISCA- and IEEE-supported event.

 

We invite your participation in this premier conference, where the language of ancient civilizations embraces modern computing technology. ISCSLP'08 will feature world-renowned plenary speakers, tutorials, exhibits, and a number of lecture and poster sessions. The full call for papers is attached to this mail.

In response to popular requests from authors, the paper submission deadline is extended. The new deadline is Jul 29, 2008.

 

The keynote speakers of ISCSLP 2008 are as follows:

 

Qiang Huo

Microsoft Research Asia, Beijing, China

Research Area

Automatic speech & speaker recognition and related multidisciplinary research topics

Chinese character recognition

Biometric authentication

Document analysis and recognition

Machine learning, etc.

 

Shigeki Sagayama

Department of Information Physics and Computing Graduate School of Information Science and Technology, The University of Tokyo, Japan

Research Area

Speech and spoken language processing

Signal processing

Music signal/Information processing

Hand-written character recognition

Multimedia Information processing, etc.

 

Vincent Vanhoucke

Google, USA

Research Area

Software engineering

Text recognition

Speech recognition

Image processing

Face recognition, etc.

 

Yuqing Gao

IBM T. J. Watson Research Center, USA

Research Area

Speech recognition, understanding and translation

Large-vocabulary continuous speech dictation systems

Speech-to-speech translation research, etc.

 

Hideki Kawahara

Design Information Sciences Department, Faculty of Systems Engineering, Wakayama University, Japan

Research Area

Focus on the use of STRAIGHT in research on human speech perception

Signal processing models of hearing and neural networks

Interaction between speech perception and production, etc.

 

Yu Hu

Research Director of iFLYTEK, Hefei, China

Research Area

Speech pronunciation evaluation

Speech pronunciation defect detection, etc.

 

Paper Submission

Authors are invited to submit original, unpublished work in English.

Papers should be submitted via http://www.iscslp2008.org.

Each submission will be reviewed by two or more reviewers.

At least one author of each paper is required to register.

 

Schedule

Full paper submission: Jun. 29, 2008

Extended deadline for full paper submission: Jul. 29, 2008

Notification of acceptance: Aug. 24, 2008

Camera-ready papers: Sep. 3, 2008

Registration to cover an accepted paper: Sep. 19, 2008

Back to Top

9-19 . (2009-01-07) 1st CfP 15th International MultiMedia Modeling Conference (MMM2009)

FIRST CALL FOR PAPERS
The 15th International MultiMedia Modeling Conference (MMM2009)
7-9 January 2009,
Institut EURECOM, Sophia Antipolis, France.
 
http://mmm2009.eurecom.fr
 
===============================================================
 
The International MultiMedia Modeling (MMM) Conference is a 
leading international conference http://mmm2009.eurecom.fr for 
researchers and industry practitioners to share their new ideas,
original research results and practical development experiences 
from all MMM related areas. The conference calls for original 
high-quality papers in, but not limited to, the following areas 
related to multimedia modeling technologies and applications:
 
1. Multimedia Content Analysis
1.1 Multimodal Content Analysis
1.2 Media Assimilation and Fusion
1.3 Content-Based Multimedia Retrieval and Browsing
1.4 Multimedia Indexing
1.5 Multimedia Abstraction and Summarization
1.6 Semantic Analysis of Multimedia Data
1.7 Statistical Modeling of Multimedia Data
2. Multimedia Signal Processing and Communications
2.1 Media Representation and Algorithms
2.2 Audio, Image, Video Processing, Coding and Compression
2.3 Multimedia Database, Content Delivery and Transport
2.4 Multimedia Security and Content Protection
2.5 Wireless and Mobile Multimedia Networking
2.6 Multimedia Standards and Related Issues
3. Multimedia Applications and Services
3.1 Real-Time, Interactive Multimedia Applications
3.2 Ambiance Multimedia Applications
3.3 Multi-Modal Interaction
3.4 Virtual Environments
3.5 Personalization
3.6 Collaboration, Contextual Metadata, Collaborative Tagging
3.7 Web Applications
3.8 Multimedia Authoring
3.9 Multimedia-Enabled New Applications
(E-Learning, Entertainment, Health Care, Web2.0, SNS, etc.)
 
Paper Submission Guidelines
Papers should be no more than 12 pages in length, conforming
to the formatting instructions of Springer Verlag, LNCS series 
www.springer.com/lncs. Papers will be judged by an international 
program committee based on their originality, significance, 
correctness and clarity. All papers should be submitted 
electronically in PDF format at MMM2009 paper submission website: 
http://mmm2009.eurecom.fr
For a paper to be published in the proceedings, one of the authors 
must register for and present the paper at the conference.
Authors of selected papers will be invited to submit extended 
versions to "EURASIP Journal on Image and Video Processing" journal.
 
Important Dates
Submission of full papers: 6 Jul. 2008 (23:59 Central European Time (GMT+1))
Notification of acceptance: 15 Sep. 2008
Camera-ready Copy Due: 10 Oct. 2008
Author registration: 10 Oct. 2008
Conference: 7-9 Jan. 2009
 
General Chair
Benoit HUET, Institut EURECOM
 
Program Co-Chairs
Alan SMEATON, Dublin City University
Ketan MAYER-PATEL, UNC-Chapel Hill
Yannis AVRITHIS, National Technical University of Athens
 
Local Organizing Co-Chairs
Jean-Luc DUGELAY, Institut EURECOM
Bernard MERIALDO, Institut EURECOM
 
Demo Chair
Ana Cristina ANDRES DEL VALLE, Accenture Technology Labs
 
Finance Chair
Marc ANTONINI, University Nice Sophia-Antipolis
 
Publication Chairs
Thierry DECLERCK, DFKI GmbH
 
Publicity & Sponsorship Chair
Nick EVANS, Institut EURECOM
 
US Liaison
Ketan MAYER-PATEL, UNC-Chapel Hill
 
Asian Liaison
Liang Tien CHIA, Nanyang Technological University, Singapore
 
European Liaison
Suzanne BOLL, University of Oldenburg
 
Steering Committee
Yi-Ping Phoebe CHEN, Deakin University , Australia
Tat-Seng CHUA, National University of Singapore, Singapore
Tosiyasu L. KUNII, Kanazawa Institute of Technology, Japan
Wei-Ying MA, Microsoft Research Asia, Beijing, China
Nadia MAGNENAT-THALMANN, University of Geneva, Switzerland
Patrick SENAC, ENSICA, France
 
 
In cooperation with Institut EURECOM and ACM SIGMM
Back to Top

9-20 . (2009-01-14) Biosignals (Porto, Portugal)

BIOSIGNALS will be held in Porto (Portugal) on January 14-17, 2009. Technically co-sponsored by the IEEE Engineering in Medicine and Biology Society (EMBS) and in cooperation with the Association for Computing Machinery (ACM SIGART) and the Association for the Advancement of Artificial Intelligence (AAAI), BIOSIGNALS brings together top researchers and practitioners in several areas of Biomedical Engineering, from multiple areas of knowledge including biology, medicine, engineering and other physical sciences, interested in studying and using models and techniques inspired from or applied to biological systems.

A diversity of signal types can be found in this area, including image, audio and other biological sources of information. The analysis and use of these signals is a multidisciplinary area drawing on signal processing, pattern recognition and computational intelligence techniques, amongst others.

The proceedings will be indexed by several major international indexers, including INSPEC and DBLP. Additionally, a selection of the best papers of the conference will be published in a book by Springer-Verlag. Best paper awards will be distributed during the conference. Further details can be found at the BIOSIGNALS conference web site (http://www.biosignals.org).

This conference is co-located with and part of the Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC, www.biostec.org). Workshops and special sessions are also invited: if you wish to propose a workshop or a special session, for example based on the results of a specific research project, please contact the secretariat.

Marina Carvalho, BIOSIGNALS Secretariat, Av. D. Manuel I, 27A 2ºesq., 2910-595 Setúbal, Portugal. Tel.: +351 265 520 185. Fax: +44 203 014 5436. Email: secretariat@biosignals.org. Web site: http://www.biosignals.org

IMPORTANT DATES:
- Regular Paper Submission (EXTENDED): July 21, 2008
- Authors Notification: October 9, 2008
- Final Paper Submission and Registration: October 23, 2008

In cooperation with ACM SIGART and AAAI. Technically co-sponsored by IEEE EMBS. Proceedings indexed by INSPEC and DBLP; best papers published by Springer-Verlag.

CONFERENCE TOPICS:
- Medical Signal Acquisition, Analysis and Processing
- Wearable Sensors and Systems
- Real-time Systems
- Biometrics
- Pattern Recognition
- Computational Intelligence
- Evolutionary Systems
- Neural Networks
- Speech Recognition
- Acoustic Signal Processing
- Time and Frequency Response
- Wavelet Transform
- Medical Image Detection, Acquisition, Analysis and Processing
- Physiological Processes and Bio-signal Modeling, Non-linear Dynamics
- Bioinformatics
- Cybernetics and User Interface Technologies
- Electromagnetic Fields in Biology and Medicine

KEYNOTE SPEAKERS:
- Edward H. Shortliffe, Arizona State University, United States
- Vimla L. Patel, Arizona State University, United States
- Pier Luigi Emiliani, Institute of Applied Physics "Nello Carrara" (IFAC) of the Italian National Research Council (CNR), Italy
- Maciej Ogorzalek, Jagiellonian University, Poland

WORKSHOP (Regular Paper Submission: October 17, 2008):
Medical Image Analysis and Description for Diagnosis Systems - MIAD 2009
http://www.biostec.org/MIAD.htm

Back to Top

9-21 . (2009-04-02) CfP 3rd INT. CONF. ON LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS (LATA 2009)

Call for Papers

3rd INTERNATIONAL CONFERENCE ON LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS (LATA 2009)

Tarragona, Spain, April 2-8, 2009

http://grammars.grlmc.com/LATA2009/

*********************************************************************

AIMS:

LATA is a yearly conference in theoretical computer science and its applications. As linked to the International PhD School in Formal Languages and Applications that was developed at the host institute in the period 2002-2006, LATA 2009 will reserve significant room for young scholars at the beginning of their career. It will aim at attracting contributions from both classical theory fields and application areas (bioinformatics, systems biology, language technology, artificial intelligence, etc.).

SCOPE:

Topics of either theoretical or applied interest include, but are not limited to:

- algebraic language theory
- algorithms on automata and words
- automata and logic
- automata for system analysis and programme verification
- automata, concurrency and Petri nets
- biomolecular nanotechnology
- cellular automata
- circuits and networks
- combinatorics on words
- computability
- computational, descriptional, communication and parameterized complexity
- data and image compression
- decidability questions on words and languages
- digital libraries
- DNA and other models of bio-inspired computing
- document engineering
- extended automata
- foundations of finite state technology
- fuzzy and rough languages
- grammars (Chomsky hierarchy, contextual, multidimensional, unification, categorial, etc.)
- grammars and automata architectures
- grammatical inference and algorithmic learning
- graphs and graph transformation
- language varieties and semigroups
- language-based cryptography
- language-theoretic foundations of natural language processing, artificial intelligence and artificial life
- mathematical evolutionary genomics
- parsing
- patterns and codes
- power series
- quantum, chemical and optical computing
- regulated rewriting
- string and combinatorial issues in computational biology and bioinformatics
- symbolic dynamics
- symbolic neural networks
- term rewriting
- text algorithms
- text retrieval, pattern matching and pattern recognition
- transducers
- trees, tree languages and tree machines
- weighted machines

STRUCTURE:

LATA 2009 will consist of:

- 3 invited talks (to be announced in the second call for papers)
- 2 invited tutorials (to be announced in the second call for papers)
- refereed contributions
- open sessions for discussion in specific subfields or on professional issues (if requested by the participants)

PROGRAMME COMMITTEE:

Parosh Abdulla (Uppsala)
Stefania Bandini (Milano)
Stephen Bloom (Hoboken)
John Brzozowski (Waterloo)
Maxime Crochemore (London)
Juergen Dassow (Magdeburg)
Michael Domaratzki (Winnipeg)
Henning Fernau (Trier)
Rusins Freivalds (Riga)
Vesa Halava (Turku)
Juraj Hromkovic (Zurich)
Lucian Ilie (London, Canada)
Kazuo Iwama (Kyoto)
Aravind Joshi (Philadelphia)
Juhani Karhumaki (Turku)
Jarkko Kari (Turku)
Claude Kirchner (Bordeaux)
Maciej Koutny (Newcastle)
Kamala Krithivasan (Chennai)
Martin Kutrib (Giessen)
Andrzej Lingas (Lund)
Aldo de Luca (Napoli)
Rupak Majumdar (Los Angeles)
Carlos Martin-Vide (Tarragona & Brussels, chair)
Joachim Niehren (Villeneuve d'Ascq)
Antonio Restivo (Palermo)
Joerg Rothe (Duesseldorf)
Wojciech Rytter (Warsaw)
Philippe Schnoebelen (Cachan)
Thomas Schwentick (Dortmund)
Helmut Seidl (Muenchen)
Alan Selman (Buffalo)
Jeffrey Shallit (Waterloo)
Frank Stephan (Singapore)

ORGANIZING COMMITTEE:

Madalina Barbaiani
Gemma Bel-Enguix
Cristina Bibire
Adrian-Horia Dediu
Szilard-Zsolt Fazekas
Alexander Krassovitskiy
Guangwu Liu
Carlos Martin-Vide (chair)
Robert Mercas
Catalin-Ionut Tirnauca
Bianca Truthe
Sherzod Turaev
Florentina-Lilica Voicu

SUBMISSIONS:

Authors are invited to submit papers presenting original and unpublished research. Papers should not exceed 12 single-spaced pages and should be formatted according to the standard format for Springer Verlag's LNCS series (see http://www.springer.com/computer/lncs/lncs+authors?SGWID=0-40209-0-0-0). Submissions have to be uploaded at:

http://www.easychair.org/conferences/?conf=lata2009

PUBLICATION:

A volume of proceedings published by Springer in the LNCS series will be available by the time of the conference. A refereed volume of extended versions of selected papers will be published afterwards as a special issue of a major journal. (This was Information and Computation for LATA 2007 and LATA 2008.)

REGISTRATION:

The registration period will be open from September 1, 2008 to April 2, 2009. The registration form can be found at the website of the conference: http://grammars.grlmc.com/LATA2009/

Early registration fees: 450 euros
Early registration fees (PhD students): 225 euros
Registration fees: 540 euros
Registration fees (PhD students): 270 euros

At least one author per paper should register. Papers that do not have a registered author by December 31, 2008 will be excluded from the proceedings.

Fees comprise free access to all sessions, one copy of the proceedings volume, and coffee breaks. For participation in the full-day excursion and the conference lunch on Sunday, April 5, an additional 70 euros is to be added to the fees above; accompanying persons are welcome at the same rate.

PAYMENT:

Early registration fees must be paid by bank transfer before December 31, 2008 to the conference account at Open Bank (Plaza Manuel Gomez Moreno 2, 28020 Madrid, Spain): IBAN: ES1300730100510403506598 - Swift code: OPENESMMXXX (account holder: LATA 2009 - Carlos Martin-Vide).

(Non-early) registration fees can be paid either by bank transfer to the same account or in cash on site.

Besides paying the registration fees, it is required to fill in the registration form at the website of the conference. A receipt for the payment will be provided on site.

FUNDING:

Up to 20 grants covering partial-board accommodation will be available for non-local PhD students. To apply, candidates must e-mail their CV together with a copy of a document proving their present status as a PhD student.

IMPORTANT DATES:

Paper submission: October 22, 2008
Notification of paper acceptance or rejection: December 10, 2008
Application for funding (PhD students): December 15, 2008
Notification of funding acceptance or rejection: December 19, 2008
Final version of the paper for the proceedings: December 24, 2008
Early registration: December 31, 2008
Start of the conference: April 2, 2009
Submission to the journal special issue: June 22, 2009

FURTHER INFORMATION:

carlos.martin@urv.cat

ADDRESS:

LATA 2009
Research Group on Mathematical Linguistics
Rovira i Virgili University
Plaza Imperial Tarraco, 1
43005 Tarragona, Spain
Phone: +34-977-559543
Fax: +34-977-559597
Back to Top

9-22 . (2009-04-19) ICASSP 2009 Taipei, Taiwan

IEEE International Conference on Acoustics, Speech, and Signal Processing

http://icassp09.com

Sponsored by IEEE Signal Processing Society

April 19 - 24, 2009

Taipei International Convention Center

Taipei, Taiwan, R.O.C.

 

The 34th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) will be held at the Taipei International Convention Center in Taipei, Taiwan, April 19 - 24, 2009. The ICASSP meeting is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class speakers, tutorials, exhibits, and over 50 lecture and poster sessions on:

 

Audio and electroacoustics

 

Bio imaging and signal processing

 

Design and implementation of signal processing systems

 

Image and multidimensional signal processing

 

Industry technology tracks

 

Information forensics and security

 

Machine learning for signal processing

 

Multimedia signal processing

 

Sensor array and multichannel systems

 

Signal processing education

 

Signal processing for communications

 

Signal processing theory and methods

 

Speech and language processing

 

Taiwan: The Ideal Travel Destination. Taiwan, also referred to as Formosa – the Portuguese word for "graceful" – is situated on the western edge of the Pacific Ocean off the southeastern coast of mainland Asia, across the Taiwan Strait from Mainland China. To the north lie Okinawa and the main islands of Japan, and to the south is the Philippines. ICASSP 2009 will be held in Taipei, a city that blends traditional culture and cosmopolitan life. As the political, economic, educational, and recreational center of Taiwan, Taipei offers a dazzling array of cultural sights not seen elsewhere, including exquisite food from every corner of China and the world. You and your entire family will be able to fully experience and enjoy this unique city and island. Prepare yourself for the trip of your dreams, as Taiwan has it all: fantastic food, a beautiful ocean, stupendous mountains and lots of sunshine!

 

Submission of Papers: Prospective authors are invited to submit full-length, four-page papers, including figures and references, to the ICASSP Technical Committee. All ICASSP papers will be handled and reviewed electronically. The ICASSP 2009 website www.icassp09.com will provide you with further details. Please note that the submission dates for papers are strict deadlines.

 

Tutorial and Special Session Proposals: Tutorials will be held on April 19 and 20, 2009. Brief proposals should be submitted by August 4, 2008, to Tsuhan Chen at tutorials@icassp09.com and must include title, outline, contact information, biography and selected publications for the presenter, a description of the tutorial, and material to be distributed to participants. Special sessions proposals should be submitted by August 4, 2008, to Shih-Fu Chang at specialsessions@icassp09.com and must include a topical title, rationale, session outline, contact information, and a list of invited speakers. Tutorial and special session authors are referred to the ICASSP website for additional information regarding submissions.

 

Important Dates

Tutorial Proposals Due

August 4, 2008

Special Session Proposals Due

August 4, 2008

Notification of Special Session & Tutorial Acceptance

September 8, 2008

Submission of Regular Papers

September 29, 2008

Notification of Acceptance (by email)

December 15, 2008

Author’s Registration Deadline

February 2, 2009

 

 

 

Organizing Committee

 

 

General Chair

Lin-shan Lee

National Taiwan University

 

General Vice-Chair

Iee-Ray Wei

Chunghwa Telecom Co., Ltd.

 

Secretaries General

Tsungnan Lin

National Taiwan University

Fu-Hao Hsing

Chunghwa Telecom Co., Ltd.

 

Technical Program Chairs

Liang-Gee Chen

National Taiwan University

James R. Glass

Massachusetts Institute of Technology

 

Technical Program Members

Petar Djuric

Stony Brook University

Joern Ostermann

Leibniz University Hannover

Yoshinori Sagisaka

Waseda University

 

Plenary Sessions

Soo-Chang Pei (Chair)

National Taiwan University

Hermann Ney (Co-chair)

RWTH Aachen

 

Special Sessions

Shih-Fu Chang (Chair)

Columbia University

Lee Swindlehurst (Co-chair)

University of California, Irvine

 

Tutorial Chair

Tsuhan Chen

Carnegie Mellon University

 

Publications Chair

Homer Chen

National Taiwan University

 

Publicity Chair

Chin-Teng Lin

National Chiao Tung University

 

Finance Chair

Hsuan-Jung Su

National Taiwan University

 

Local Arrangements Chairs

Tzu-Han Huang

Chunghwa Telecom Co., Ltd.

Chong-Yung Chi

National Tsing Hua University

Jen-Tzung Chien

National Cheng Kung University

 

Conference Management

Conference Management Services

Back to Top