Re: SemEval discussion at NAACL 2019

Ted Pedersen

Jun 17, 2019, 6:57:43 AM
to,,,,,, Laura Dietz
Here are some very interesting followup thoughts from Laura Dietz.


My student participated in SemEval this year, although SemEval is
traditionally not my community. I have, however, participated in similar
evaluations (CLEF, TAC KBP), and I am organizing a task at TREC.

At TREC and TAC, the leaderboard is only revealed at the workshop. TREC
organizers purposefully decided to not have a live leaderboard.
Participating teams are required to submit a workshop paper (no page
limit) before they know their rank. This has a nice side effect that you
get more system descriptions and a deeper analysis on the performance of
the system --- not in comparison to the leaderboard.

Regarding anonymous teams: It is more likely that these are individual
grad students that were messing with the data, but were too shy to raise
their hand. My own student would not have submitted a paper had I not
strongly encouraged him. Sadly he could not travel by himself, but was
represented by another student in my lab. I try to teach them the
importance of **community** in research community, but it's sometimes
difficult to get students to jump.

At the last TREC workshop, I had one participant who was at the far end
of the leaderboard. It took some convincing from my side and
confirmation that we want to hear about all participating systems, not
just the top performers. Another anecdote concerns a team that was
last the previous year and mid-range last year. They had the right
approach, but ruined their performance with some "stupid" mistakes.
At the workshop we helped the team "debug" their system (wrong tokenizer
& only binary predictions --- for a ranking task). It turns out their
approach can outperform the best team by 200% (!!!)

I explain to my participants that the point of the shared task is to
figure out together what works and what doesn't. We can always learn
from a system, no matter if it's a high or low performer. Sometimes it
takes combining a set of ideas to really make progress in a domain.


Ted Pedersen
On Sun, Jun 16, 2019 at 9:50 AM Ted Pedersen <> wrote:
> Greetings all,
> I posted this to various SemEval lists and Twitter, but was also
> encouraged to send it here (to Corpora). Apologies if you've seen this
> before!
> -----------------
> The SemEval workshop took place during the last two days of NAACL 2019
> in Minneapolis, and included quite a bit of discussion both days about
> the future of SemEval. I enjoyed this conversation (and participated
> in it), so wanted to try and share some of what I think was said.
> A few general concerns were raised about SemEval - one of them is that
> many teams participate without then going on to submit papers
> describing their systems. Related to this is that there are also
> participants who never even really identify themselves to the task
> organizers, and in effect remain anonymous throughout the event. In
> both cases the problem is that in the end SemEval aspires to be an
> academic event where participants describe what they have done in a
> form that can be easily shared with other participants (and papers are
> a good way to do that).
> My own informal estimate is that maybe a half of participating teams
> submit a paper, and then half of those go on to attend the workshop
> and present a poster. So if you see a task with 20 teams, perhaps 10
> of them submit a paper and maybe 5 present a poster. SemEval is
> totally ok with teams that submit a paper but do not attend the
> workshop to present a poster. That has long been the case, and this
> was confirmed again in Minneapolis. The goal then is to get more
> participating teams to submit papers. There was considerable
> discussion on the related issues of why don't more teams submit
> papers, and how can we encourage (or require) the submission of more
> papers?
> One point made is that SemEval participants are sometimes new to our
> community and so don't have a clear idea of what a "system description
> paper" should consist of, and so might not submit papers because they
> believe it will be too difficult or time consuming, or they just don't
> know what to do and fear immediate rejection. There was considerable
> support for the idea of providing a paper template that would help new
> authors know what is expected.
> It was also observed that when teams have disappointing results (not
> top ranked) they might feel like a paper isn't really necessary or
> might even be a bad idea. This tied into a larger discussion about the
> reality that some (many?) participants in SemEval tasks focus on their
> overall ranking and less on understanding the problem that they are
> working on. There was discussion at various points about how to get
> away from the obsession with the leaderboard, and to focus more on
> understanding the problem that is being presented by the task. A
> carefully done analysis of a system that doesn't perform terrifically
> well can shed important light on a problem, while simply describing a
> model and hyperparameter settings that might lead to high scores may
> not be too useful in understanding that same problem.
> One idea was for each task to award a "best analysis paper" and
> potentially award the authors of that paper an oral presentation
> during the workshop. Typically nearly all presentations at SemEval are
> posters, and so the oral slots are somewhat coveted and are often (but
> not always) awarded to the team with the highest rank. Shifting the
> focus of prizes and presentations away from the leaderboard might tend
> to encourage more participants to carry out such analysis and submit
> papers.
> That said, a carefully done analysis paper can be fairly time
> consuming to create and may require more pages than the typical 4 page
> limit. It was suggested that we be more flexible with page limits, so
> that teams could submit fairly minimal descriptions, or go into more
> depth on their systems and analysis. A related idea was to allow
> analysis papers to be submitted to the SemEval year X+1 workshop based
> on system participation in year X. This might be a good option to
> provide since SemEval timelines tend to be pretty tight as it stands.
> Papers sometimes tend to focus more on the horse race or bake off (and
> so analysis is limited to reporting a rank or score in the task).
> However, if scores or rankings were not released until after papers
> were submitted then this could certainly change the nature of such
> papers. In addition, a submitted paper could be made a requirement for
> appearing on the leaderboard.
> There is of course a trade off between increasing participation and
> increasing the number of papers submitted. If papers are made into
> requirements then some teams won't participate. There is perhaps a
> larger question for SemEval to consider, and that is how to increase
> the number of papers without driving away too many participants.
> Another observation that was made was that some teams never identify
> themselves and so participate in the task but are never really
> involved beyond being on the leaderboard. These could of course be
> shadow accounts created by teams who are already participating (to get
> past submission limits?), or they could be accounts created by teams
> who may only want to identify themselves if they end up ranking
> highly. Should anonymous teams be allowed to participate? I don't know
> that there was a clear answer to that question. While anonymous
> participation could be a means to game the system in some way, it
> might also be something done by those who are participating contrary
> to the wishes of an advisor or employer. If teams are reluctant to
> identify themselves for fear of being associated with a "bad" score,
> perhaps it could be possible for teams to remove scores from the
> leaderboard.
> To summarize, I got the sense that there is some interest in both
> increasing the number of papers submitted to SemEval, and also in
> making it clear that there is more to the event than the leaderboard.
> I think there were some great ideas discussed, and I fear I have done
> a somewhat imperfect job of trying to convey those here, but I don't
> want to let the perfect be the enemy of the good enough, so I'm going
> to go ahead and send this around and hope that others who have ideas
> will join in the conversation in some way.
> Cordially,
> Ted
> PS Emily Bender pointed out the following paper overlaps with some of
> the issues mentioned in my summary. I'd strongly encourage all SemEval
> organizers and participants to read through this, very much on target
> and presents some nice ideas about how to think about shared tasks.
> ---
> Ted Pedersen

Jelena Mitrovic

Jun 17, 2019, 7:27:40 AM
to Ted Pedersen,,,,,,
Dear Ted,

Thank you for starting this discussion and reminding us all about the issues that were raised at SemEval.

This was my first-time participation in SemEval, and I did indeed think that only the top scoring systems should be described in papers. Ours was No 10 so I thought, OK, we barely made it :) I was also very surprised to hear that some of the top-scoring teams in some tasks did not write a paper at all, but I understand that some students do not wish to go into academia and see no point in learning how to write an academic paper.

This is certainly an important discussion and I do hope that others join in.

Best wishes,

Dr. Jelena Mitrović
Postdoctoral Research Fellow
Fakultät für Informatik und Mathematik
Universität Passau / ITZ / Raum 114
Innstr. 43
94032 Passau

Ted Pedersen

Jun 17, 2019, 12:28:05 PM
to Jelena Mitrovic,,
Hi Jelena,

Thanks for your feedback, and I think you make an important point. I
think this is also related to the fact that many participants in
SemEval are either new(ish) to our field or are perhaps from different
but related communities, and so the norms or expectations aren't
terribly clear. I think in some cases the papers might come as a bit
of a surprise (really only discussed very much after results have been
submitted, etc.), so making it clear from the early days that we
really would like to see a paper from everyone who participates might
help people plan ahead. With my own efforts I tend to sketch in the
system details of my SemEval papers as I'm working on experiments
(because I know how quickly I forget these details), and for me at
least getting started early tends to help manage the paper deadlines
(which come upon us pretty quickly).


Ted Pedersen

Jun 17, 2019, 12:36:04 PM
to Diana Maynard,,,
Hi Diana,

Yes, I share this feeling. I find the 4 page limit very difficult to
work with, particularly if we are trying to provide enough details to
replicate our systems and also do some analysis (both of which seem
important). I'd much rather see a higher limit or perhaps no limit
(with maybe a 2 page minimum so that there is at least some content).
I don't see a big downside to this particularly since my experience of
reading other SemEval papers is that I'm sometimes a bit frustrated by
the lack of detail or examples or analysis (and yet the author has
already used 4 pages and can't do too much about that).

On Mon, Jun 17, 2019 at 8:00 AM Diana Maynard <> wrote:
> Hi Ted
> In my opinion, 4 pages is really not enough to write a decent system
> paper (my student submitted one, but had to leave out a lot of the
> interesting detail and discussion, which is a shame because that's where
> the real interest lies). Submitting a 4-page paper makes the whole thing
> barely worthwhile to put the effort in, and also makes it hard for
> others to understand the system properly and see what made in the end
> for a winning (or not) system and why. I'd be massively in favour of
> making this more flexible.
> Diana
> On 17/06/2019 11:57, Ted Pedersen wrote:
> >
> >> That said, a carefully done analysis paper can be fairly time
> >> consuming to create and may require more pages than the typical 4 page
> >> limit. It was suggested that we be more flexible with page limits, so
> >> that teams could submit fairly minimal descriptions, or go into more
> >> depth on their systems and analysis. A related idea was to allow
> >> analysis papers to be submitted to the SemEval year X+1 workshop based
> >> on system participation in year X. This might be a good option to
> >> provide since SemEval timelines tend to be pretty tight as it stands.

Ted Pedersen

Jun 17, 2019, 12:45:46 PM
to Diana Maynard,,,
That's a fair point, and would allow for some continuity with
past/current practice.
On Mon, Jun 17, 2019 at 11:38 AM Diana Maynard
<> wrote:
> Right - I'd probably argue for a 4 page minimum though - you can really
> say nothing useful in 2 pages, and if you don't have enough material for
> 4 pages it's not really worth bothering.
> Diana

Bethard, Steven John - (bethard)

Jun 17, 2019, 12:46:05 PM
to, Diana Maynard,,
Agreed that 4 pages is a bit constraining. I would worry a little that people who are inexperienced in writing (which is many of the SemEval participants) might ramble more than needed, but I think that could be addressed by providing a system paper template as suggested in previous comments. Maybe even include suggested (but not required or enforced) page lengths for each of the sections of the template?

David Jurgens

Jun 17, 2019, 1:52:57 PM
to, Diana Maynard,,
Hi All,

  Thanks to everyone for getting this discussion started!  We actually did raise the page limits a bit back in the SemEval-2016 CFP so that systems papers had "a recommended length of 4 pages of content, with a maximum length of 6 pages, should the authors need include addition analyses or descriptions" for the exact reasons everyone is stating here.  (Task papers had no limit that year!)  I don't think we ever checked whether most people took advantage of the extra space, but I do recall seeing multiple systems papers spilling over to the 5th page at least.
  I really like the earlier suggestion that we have a guideline for what a good systems paper looks like.  Like lots of folks, SemEval was actually my first entry point into the ACL community and I really had no idea how to write that paper (something I heard from another first-timer this year as well!), so some top-down guidance could go a long way here.  Perhaps it might be worth finding a few good examples of earlier papers too, though these would have been limited to 4 pages.  I'd be happy to work with others to draft this too.


Ekaterina Shutova

Jun 17, 2019, 3:50:08 PM
to Diana Maynard, Bethard, Steven John - (bethard),,,
Dear all,

Indeed, thank you so much for starting this discussion, Ted!

I agree, both in terms of extending the page limit for system description papers (to e.g. 6 pages) and providing a template (which should not be obligatory to use for experienced researchers though).

I now always provide a paper template to my students when they write up their course projects -- and I noticed that the reports I get are of much higher quality with the template than without it. So it certainly works. Here is the one I used for my most recent course (scroll down to Report Instructions): . This is just an example, but I thought perhaps an adjusted version of it could provide a good starting point for SemEval. One other option would be for us to provide a generic template and then ask the task organisers to adjust it to their tasks and share with the participants.


On Mon, Jun 17, 2019 at 11:06 AM Diana Maynard <> wrote:
Agreed - the more we can help them produce something ultimately useful,
the better, and the easier for them to produce a paper too.

Ted Pedersen

Jun 17, 2019, 4:03:04 PM
to, Diana Maynard,,
Thanks for the followups on page length issues David and Steve,

The page length was a little bit more constrained in the 2019 CFP,
where you could go up to six pages if you participated in multiple
subtasks or had a very similar system in a second task (which made it
a little unclear if you could just go ahead and submit a second paper
for that second system).

I was, to be honest, a little shy about going up to six pages in the
task where I had participated in multiple subtasks because my method
was fairly similar across the subtasks (and so I wasn't sure if I
"qualified" for additional pages). But when I asked (I think) the task
organizers, they encouraged me to take the six, which I happily did.

I went back and re-read the 2016 statement and found that a bit easier
to interpret. Perhaps the page guidelines can be a bit more
straightforward, for example a minimum of 4 pages, a maximum of N,
where some guidance on how much space to give to what kind of details
could be included in a template.

It might also be useful to allow some number of pages in an appendix
for details that don't clearly fit into the narrative or take a lot of
space (for charts, detailed lists of parameter settings, regular
expressions used for pre-processing, code snippets, etc.). This might
in the end lead to a statement like "system description papers can be
from 4 to 10? pages in length (excluding references), with up to an
additional 10? pages of appendices". Not sure about the exact numbers
but a fairly simple statement with minimum and maximum page lengths
might help (with suitable guidance provided by a template).

Ted Pedersen

Ted Pedersen

Jun 17, 2019, 4:17:27 PM
to, Diana Maynard, Bethard, Steven John - (bethard),,, Richard Wicentowski,
Hi Katia,

Indeed, this looks very helpful! I can think of a few other folks who
I know have had students writing SemEval papers as a part of a class
and I'll ping them (via cc'ing them on this email - that's you Rich
and Julie :) to see if they have any guidelines or thoughts to share.
I don't *think* I've ever put together guidelines but will double
check the archives.

Ted Pedersen

Richard Wicentowski

Jun 17, 2019, 4:21:43 PM
to Ted Pedersen,, Diana Maynard, Bethard, Steven John - (bethard),,, Julie Medero
Hi Ted, Katia, all-

Julie and I “co-taught” NLP last semester. In our joint writeup for the final project ( or we also have a template for writing the paper which is similar to Katia’s. On a quick glance, it looks like Katia’s is more detailed, but folks might get some benefit from reading both.
