SCOOT: Phonology
Phonology
Tutorials:
See Odette Scharenborg's Winter School presentation, particularly lecture 2.
Many speech researchers use software toolkits that implement popular algorithms, e.g. for defining and training an automatic speech recognition (ASR) system or building a synthesiser. Toolkits often provide recipes for common tasks.
aims to 'provide straightforward access to the tools and techniques used by advanced researchers' by the use of virtual machines, which 'provide a consistent, end-to-end environment for experimentation, without the need to install other software or data, and cope with their incompatibilities and peculiarities.'
Linguistics is the scientific study of language and involves an analysis of language form, language meaning, and language in context.
Modern speech technology relies on databases (or corpora) to train applications based on machine learning.
Corpus linguistics uses databases as a resource for language studies.
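As a rough illustration of using a corpus as a resource, the sketch below counts word frequencies over a tiny in-memory text collection. The sentences, the function name word_frequencies and the simple regular-expression tokeniser are all assumptions made for this example; a real study would read files from a distributed corpus and use a proper tokeniser.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Crude tokeniser (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Toy stand-in for a corpus; these sentences are invented for illustration.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

freqs = word_frequencies(toy_corpus)
print(freqs.most_common(3))
```

Frequency lists like this are one of the simplest corpus-derived resources; the same counting pattern extends to phone or phoneme labels when the corpus provides transcriptions.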
The European Language Resources Association (ELRA) is a non-profit organisation whose main mission is to make Language Resources (LRs) for Human Language Technologies (HLT) available to the community at large.
To achieve this goal, ELRA carries out a wide variety of activities around LRs, including Identification & Distribution, Production & Validation, Technology Evaluation, and Information Dissemination on HLT.
The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories, based in the USA. LDC was formed in 1992 to address the critical data shortage then facing language technology research and development.
Corpora can be very expensive, but many of the classic ones are free or relatively cheap, e.g. TIMIT, the Wall Street Journal Corpus and Resource Management.