PAN@FIRE-2013: CLiNSS Cross Language Indian News Story Search

Parth Gupta

May 9, 2013, 10:34:53 AM

to fire...@googlegroups.com, cl...@googlegroups.com, cli...@dsic.upv.es

Apologies for cross-posting

-------------------------------------------------------------------------------
Call for Participation
-------------------------------------------------------------------------------

PAN Track on
Cross-Language !ndian News Story Search

held in conjunction with the FIRE 2013 Forum for Information Retrieval Evaluation
4 - 6 December 2013, New Delhi, India
http://www.dsic.upv.es/grupos/nle/clinss.html

-------------------------------------------------------------------------------

This edition of CL!NSS focuses on journalistic text re-use, as in the previous year. News agencies are a prolific source of text on the Web and a valuable source of text in multiple languages. News stories generated by different authors, whether independently or derived from another story, typically exist as separate entities, and consequently there is a need to link them.

Linking news stories that cover the same events in different languages offers a number of benefits. For example, in a multilingual environment such as India, where the same news story is covered in multiple languages, a reader might want to refer to the local-language version of a story. News stories covering the same event(s) in different languages may also be rich sources of both parallel and comparable text: parallel fragments such as direct quotes or translation equivalents, and comparable fragments such as paraphrases. Identifying similar news stories written in multiple languages therefore yields a valuable multilingual resource. This matters particularly for Indian languages, for which language resources for NLP and IR tasks are limited: identifying comparable and parallel documents on the Web would offer a potential (and abundant) source for deriving bilingual dictionaries and training statistical MT systems (Munteanu & Marcu, 2005; Barker & Gaizauskas, 2012).

In this edition, the aim is to identify the same story written across languages (English and Hindi), a problem of cross-language news story detection. The task involves identifying and linking the Hindi news stories that cover the same event as a given English news story.

We invite researchers and practitioners from all fields to participate.

References
1. Dragos Munteanu and Daniel Marcu (2005). Improving Machine Translation Performance by Exploiting Comparable Corpora. Computational Linguistics, 31(4), pp. 477-504, December.
2. Emma Barker and Robert Gaizauskas (2012). Assessing the Comparability of News Texts. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12).

-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
6 May, 2013    Release of training corpus (training period starts)
1 Sept, 2013    Release of test corpus
20 Sept, 2013    Submission of runs
1 Nov, 2013    Release of qrels (result notification)
15 Nov, 2013    Working notes due

-------------------------------------------------------------------------------
Task Coordinators
-------------------------------------------------------------------------------

Parth Gupta, Paolo Rosso
NLE Lab @ Universitat Politècnica de València, Spain

Paul Clough, Mark Stevenson
IR & NLP Groups @ University of Sheffield, UK

Rafael E. Banchs
HLT, Institute for Infocomm Research, Singapore

-------------------------------------------------------------------------------
Contact
-------------------------------------------------------------------------------

E-mail: cli...@dsic.upv.es
Track Web page: http://www.dsic.upv.es/grupos/nle/clinss.html
