SemEval discussions at NAACL 2019

Ted Pedersen

Jun 15, 2019, 11:58:17 AM
Greetings all,

The SemEval workshop took place during the last two days of NAACL 2019
in Minneapolis, and included quite a bit of discussion both days about
the future of SemEval. I enjoyed this conversation (and participated
in it), so wanted to try and share some of what I think was said.

A few general concerns were raised about SemEval - one of them is that
many teams participate without then going on to submit papers
describing their systems. Related to this is that there are also
participants who never even really identify themselves to the task
organizers, and in effect remain anonymous throughout the event. In
both cases the problem is that in the end SemEval aspires to be an
academic event where participants describe what they have done in a
form that can be easily shared with other participants (and papers are
a good way to do that).

My own informal estimate is that maybe half of participating teams
submit a paper, and then half of those go on to attend the workshop
and present a poster. So if you see a task with 20 teams, perhaps 10
of them submit a paper and maybe 5 present a poster. SemEval is
totally ok with teams that submit a paper but do not attend the
workshop to present a poster. That has long been the case, and this
was confirmed again in Minneapolis. The goal then is to get more
participating teams to submit papers. There was considerable
discussion on the related issues of why more teams don't submit
papers, and how we can encourage (or require) the submission of more
papers.
One point made is that SemEval participants are sometimes new to our
community and so don't have a clear idea of what a "system description
paper" should consist of, and so might not submit papers because they
believe it will be too difficult or time consuming, or they just don't
know what to do and fear immediate rejection. There was considerable
support for the idea of providing a paper template that would help new
authors know what is expected.

It was also observed that when teams have disappointing results (not
top ranked) they might feel like a paper isn't really necessary or
might even be a bad idea. This tied into a larger discussion about the
reality that some (many?) participants in SemEval tasks focus on their
overall ranking and less on understanding the problem that they are
working on. There was discussion at various points about how to get
away from the obsession with the leaderboard, and to focus more on
understanding the problem that is being presented by the task. A
carefully done analysis of a system that doesn't perform terrifically
well can shed important light on a problem, while simply describing a
model and hyperparameter settings that might lead to high scores may
not be too useful in understanding that same problem.

One idea was for each task to award a "best analysis paper" and
potentially award the authors of that paper an oral presentation
during the workshop. Typically nearly all presentations at SemEval are
posters, and so the oral slots are somewhat coveted and are often (but
not always) awarded to the team with the highest rank. Shifting the
focus of prizes and presentations away from the leaderboard might tend
to encourage more participants to carry out such analysis and submit
papers.
That said, a carefully done analysis paper can be fairly time
consuming to create and may require more pages than the typical 4 page
limit. It was suggested that we be more flexible with page limits, so
that teams could submit fairly minimal descriptions, or go into more
depth on their systems and analysis. A related idea was to allow
analysis papers to be submitted to the SemEval year X+1 workshop based
on system participation in year X. This might be a good option to
provide since SemEval timelines tend to be pretty tight as it stands.

Papers sometimes tend to focus more on the horse race or bake off (and
so analysis is limited to reporting a rank or score in the task).
However, if scores or rankings were not released until after papers
were submitted then this could certainly change the nature of such
papers. In addition, a submitted paper could be made a requirement for
appearing on the leaderboard.

There is of course a trade off between increasing participation and
increasing the number of papers submitted. If papers are made into
requirements then some teams won't participate. There is perhaps a
larger question for SemEval to consider, and that is how to increase
the number of papers without driving away too many participants.

Another observation was that some teams never identify
themselves and so participate in the task but are never really
involved beyond being on the leaderboard. These could of course be
shadow accounts created by teams who are already participating (to get
past submission limits?), or they could be accounts created by teams
who may only want to identify themselves if they end up ranking
highly. Should anonymous teams be allowed to participate? I don't know
that there was a clear answer to that question. While anonymous
participation could be a means to game the system in some way, it
might also be something done by those who are participating contrary
to the wishes of an advisor or employer. If teams are reluctant to
identify themselves for fear of being associated with a "bad" score,
perhaps it could be possible for teams to remove scores from the
leaderboard.
To summarize, I got the sense that there is some interest in both
increasing the number of papers submitted to SemEval, and also in
making it clear that there is more to the event than the leaderboard.
I think there were some great ideas discussed, and I fear I have done
a somewhat imperfect job of trying to convey those here, but I don't
want to let the perfect be the enemy of the good enough, so I'm going
to go ahead and send this around and hope that others who have ideas
will join in the conversation in some way.

Ted Pedersen

Ted Pedersen

Jun 15, 2019, 2:04:23 PM
Hi again,

BTW, Emily Bender pointed out that the following paper overlaps with
some of the issues mentioned in my summary. I'd strongly encourage all
SemEval organizers and participants to read through it -- it is very
much on target and presents some nice ideas about how to think about
shared tasks.

@inproceedings{W17-1608,
    title = "Ethical Considerations in {NLP} Shared Tasks",
    author = "Parra Escart{\'\i}n, Carla and
      Reijers, Wessel and
      Lynn, Teresa and
      Moorkens, Joss and
      Way, Andy and
      Liu, Chao-Hong",
    booktitle = "Proceedings of the First {ACL} Workshop on Ethics in Natural Language Processing",
    month = apr,
    year = "2017",
    address = "Valencia, Spain",
    publisher = "Association for Computational Linguistics",
    url = "",
    doi = "10.18653/v1/W17-1608",
    pages = "66--73",
    abstract = "Shared tasks are increasingly common in our field, and
      new challenges are suggested at almost every conference and workshop.
      However, as this has become an established way of pushing research
      forward, it is important to discuss how we researchers organise and
      participate in shared tasks, and make that information available to
      the community to allow further research improvements. In this paper,
      we present a number of ethical issues along with other areas of
      concern that are related to the competitive nature of shared tasks. As
      such issues could potentially impact on research ethics in the Natural
      Language Processing community, we also propose the development of a
      framework for the organisation of and participation in shared tasks
      that can help mitigate against these issues arising.",
}

Ted Pedersen
