Discussion on BSNLP/SIGSLAV Activities, 10 Sep 2015 — Notes
A discussion on BSNLP/SIGSLAV activities was held on September the 10th, 2015, at the 5th Workshop on Balto-Slavic Natural Language Processing (BSNLP 2015), organized as part of the 10th International Conference on Recent Advances in Natural Language Processing (RANLP 2015) in Hissar, Bulgaria. Participants in the discussion were the participants of the BSNLP 2015 workshop, most of whom are also members of SIGSLAV. The discussion was moderated by Roman Yangarber, the vice-chair of SIGSLAV.
Below are notes from the discussion, organized by the main points discussed. Notes were compiled by Roman Yangarber and Jan Šnajder.
DISCLAIMER: The notes may be incomplete. There is a possibility that some items are incorrect or wrongly attributed. Participants of the discussion are kindly invited to amend the minutes and report corrections in the SIGSLAV on-line forum. Thank you for your understanding.
(1) SIGSLAV activities
* Kiril Simov mentioned CLARIN research support
* Tomas Krilavičius mentioned that CLARIN would like its resources to be used more; perhaps we can join efforts
* someone: we should publicize resources that are not 100% complete
* Josef Steinberger: We should consider applying for a networking project.
* We discussed whether SIGSLAV should invest efforts into maintaining resources
* Tanja Samardžić thinks we shouldn't
* Jan Šnajder prefers to invest effort in networking
* Josef Steinberger thinks a discussion forum is enough
(2) Making BSNLP a premier event
* Tanja Samardžić suggested enforcing multilingual evaluation on Slavic languages to make the papers more attractive, e.g., by making it mandatory that everyone evaluate on at least five Slavic languages. This would improve the chances of acceptance at ACL
* Josef Steinberger noted that data is a problem
* Josef Steinberger (on BSNLP frequency): a two-year cycle is optimal
* Hana Skoumalová: The event has to be listed in conference ranking lists to be attractive.
* Josef Steinberger: extended papers should be published in a journal
* someone: an open-access journal is preferred
* Natalia Loukachevitch remarked that a "workshop" (in contrast to a conference) cannot be a premier event
* Points that everybody seems to agree on:
* we need high-quality reviews
* we might consider including an author response period
* papers should be indexed in a database of conferences - authors need points
* we should consider publishing extended papers in a special issue of a journal
* do we prefer open-access journals?
(3) BSNLP shared task
* Jan Šnajder presented the three task types:
* (1) "Cracking the language barrier"
* Tasks that aim to directly bridge the language gap between Slavic languages: MT tasks between Slavic languages, bilingual lexica, comparable corpora, etc.
* (2) Cross-lingual
* Tasks aiming at cross-lingual text analysis to get the most out of the data.
* There seem to be two extremes:
* "Task-level alignment": For each language, processing is done separately and by and large independently. The alignment is done at the very end.
* "Language-level alignment": Alignment is done at earlier stages, at the level of text analysis (e.g., by using SMT, bilingual lexica, etc.).
* (3) Multilingual
* This is what we usually see in shared tasks: there is one task and teams develop systems for various languages. Separate datasets exist and the systems are tested separately. The main idea here is that, if someone comes up with a nice model for solving a task, it makes sense to try out the model on other, related languages. At any rate, the benefit of a shared task is preserved: all systems for one language will be tested on the same dataset and under identical conditions.
* Nikola Ljubešić proposed two concrete ideas:
* (a) cross-lingual parsing using Universal Dependencies (data for this task is available)
* (b) multilingual: morphosyntactic tagging for non-standard language (training data: news)
* Kiril Simov and Petya Osenova proposed word-sense annotation for different languages (could be with WordNet, named entities with Uraliex — for question answering)
* Natalia Loukachevitch remarked that novel senses might be a problem here; Kiril Simov noted that sense discovery can also be one of the tasks
* We also need to have a more complex task for groups that have the necessary resources
* Josef Steinberger proposed multilingual sentiment analysis
* this is an already existing task (currently only Czech is included), we might consider joining in with new languages
* this is, in fact, the fourth option for a shared task: to join efforts with someone else who has already organized a shared task
* Jan Šnajder mentioned cross-lingual SA, but getting the data is a problem
* Josef Steinberger noted that the copyright issue can be resolved by using IDs instead of full text (as with Twitter data)
* Natalia Loukachevitch mentioned sentiment analysis systems for Russian (Russia has 10 participants for sentiment analysis in Russian, on restaurant reviews and Twitter data)
(4) Conclusions
* Discussion to continue live, in real time, on the Google Groups forum
* Notes / ideas from discussion:
* to be collected
* to be posted to Google Groups and the SIGSLAV membership mailing list