CFP: Challenges in the Management of Large Corpora + Big Data and Natural Language Processing

Rayson, Paul

Feb 28, 2017, 4:33:36 AM

to ml-...@googlegroups.com

Challenges in the Management of Large Corpora + Big Data and Natural Language Processing

A joint meeting of the workshops on "Big Data and Natural Language Processing" and "Challenges in the Management of Large Corpora" will take place on the 24th of July in Birmingham, as part of the Corpus Linguistics 2017 conference.

The workshop description, the lists of members of the Programme and Organizing committees, as well as the most current information, are accessible at the workshop home page:

http://corpora.ids-mannheim.de/cmlc-2017.html

Topics of interest

This year’s event focuses on the union of the standard topics of CMLC and Big NLP:

Technical issues:
- Storage and retrieval solutions for big textual data corpora: primary data, metadata, and annotation data
- Scalable and efficient NLP tooling for annotating and analysing large datasets: distributed and GPGPU computing; using big data analysis frameworks (Hadoop, Spark, etc.) for language processing
- Dealing with streaming data (e.g. Social Media) and rapidly changing corpora
Licensing, legal and privacy issues:
- Licensing models of open and closed data
- Coping with intellectual property restrictions
Linguistic content issues:
- Dealing with the variety of language: multilinguality, historical texts, user-generated content, etc.
- Integration of human computation (crowdsourcing) and automatic annotation
- Quality management of annotations
Exploitation issues:
- Query languages
- Innovative approaches for aggregation and visualisation of text analytics

Submission categories

We invite anonymised extended abstracts for oral presentations on the topics listed above (PDF, 1000-1500 words excluding references, font preferably 11 pt, line spacing 1.5).

CMLC has always reserved a track for national corpus project reports, and to this end, we invite poster proposals of 500-750 words. National project reports need not be anonymised. The number of poster slots is limited. If there is spare capacity in the poster session, we reserve the right to change the presentation format of accepted papers from oral presentation to poster. Such a change will not affect how the paper is presented in the proceedings.

Submissions are accepted exclusively through the EasyAbs submission system, at http://linguistlist.org/easyabs/cmlc+bignlp.

Please note that an open-access (CC BY-NC-ND) electronic volume of proceedings is planned.

Important dates

Submission deadline: 12th of March, midnight UTC
Notification of acceptance: 18th of April
Camera-ready papers due: 18th of June
Workshop date: 24th of July 2017, afternoon session

Workshop home page: http://corpora.ids-mannheim.de/cmlc-2017.html

Dr. Paul Rayson

Director of UCREL and Reader in Natural Language Processing

School of Computing and Communications, InfoLab21, Lancaster University, Lancaster, LA1 4WA, UK.

Web: http://www.lancaster.ac.uk/staff/rayson/

Tel: +44 1524 510357 Fax: +44 1524 510492
