Here are some very interesting followup thoughts from Laura Dietz.
------------------------
My student participated in SemEval this year -- however, traditionally
SemEval is not my community. I did, however, participate in similar
evaluations (CLEF, TAC KBP), and I am organizing a task at TREC.
At TREC and TAC, the leaderboard is only revealed at the workshop. TREC
organizers purposefully decided to not have a live leaderboard.
Participating teams are required to submit a workshop paper (no page
limit) before they know their rank. This has the nice side effect that
you get more system descriptions and a deeper analysis of the system's
performance --- not just a comparison to the leaderboard.
Regarding anonymous teams: it is more likely that these are individual
grad students who were experimenting with the data but were too shy to
raise their hands. My own student nearly did not submit a paper; he
only did because I strongly encouraged him. Sadly he could not travel
himself, but was represented by another student in my lab. I try to
teach them the importance of **community** in a research community, but
it is sometimes difficult to get students to take the leap.
At the last TREC workshop, I had one participant who was at the far end
of the leaderboard. It took some convincing from my side, and
confirmation that we want to hear about all participating systems, not
just the top performers. Another anecdote concerns a team that was last
the previous year and mid-range last year; they had the right approach,
but ruined their performance with some "stupid" mistakes. At the
workshop we helped the team "debug" their system (wrong tokenizer and
only binary predictions --- for a ranking task). It turns out their
approach can outperform the best team by 200% (!!!)
I explain to my participants that the point of a shared task is to
figure out together what works and what doesn't. We can always learn
from a system, no matter whether it is a high or low performer.
Sometimes it takes combining a set of ideas to really make progress in
a domain.
Cheers,
Laura
---
Ted Pedersen
http://www.d.umn.edu/~tpederse
On Sun, Jun 16, 2019 at 9:50 AM Ted Pedersen <tped...@d.umn.edu> wrote:
>
> Greetings all,
>
> I posted this to various SemEval lists and Twitter, but was also
> encouraged to send it here (to Corpora). Apologies if you've seen this
> before!
>
> -----------------
>
> The SemEval workshop took place during the last two days of NAACL 2019
> in Minneapolis, and included quite a bit of discussion both days about
> the future of SemEval. I enjoyed this conversation (and participated
> in it), so wanted to try and share some of what I think was said.
>
> A few general concerns were raised about SemEval - one of them is that
> many teams participate without then going on to submit papers
> describing their systems. Related to this is that there are also
> participants who never even really identify themselves to the task
> organizers, and in effect remain anonymous throughout the event. In
> both cases the problem is that in the end SemEval aspires to be an
> academic event where participants describe what they have done in a
> form that can be easily shared with other participants (and papers are
> a good way to do that).
>
> My own informal estimate is that maybe half of the participating teams
> submit a paper, and then half of those go on to attend the workshop
> and present a poster. So if you see a task with 20 teams, perhaps 10
> of them submit a paper and maybe 5 present a poster. SemEval is
> totally ok with teams that submit a paper but do not attend the
> workshop to present a poster. That has long been the case, and this
> was confirmed again in Minneapolis. The goal then is to get more
> participating teams to submit papers. There was considerable
> discussion of the related issues of why more teams don't submit
> papers and how we can encourage (or require) the submission of more
> papers.
>
> One point made is that SemEval participants are sometimes new to our
> community and so don't have a clear idea of what a "system description
> paper" should consist of, and so might not submit papers because they
> believe it will be too difficult or time consuming, or they just don't
> know what to do and fear immediate rejection. There was considerable
> support for the idea of providing a paper template that would help new
> authors know what is expected.
>
> It was also observed that when teams have disappointing results (not
> top ranked) they might feel like a paper isn't really necessary or
> might even be a bad idea. This tied into a larger discussion about the
> reality that some (many?) participants in SemEval tasks focus more on
> their overall ranking and less on understanding the problem that they
> are working on. There was discussion at various points about how to get
> away from the obsession with the leaderboard, and to focus more on
> understanding the problem that is being presented by the task. A
> carefully done analysis of a system that doesn't perform terrifically
> well can shed important light on a problem, while simply describing a
> model and hyperparameter settings that might lead to high scores may
> not be too useful in understanding that same problem.
>
> One idea was for each task to award a "best analysis paper" and
> potentially award the authors of that paper an oral presentation
> during the workshop. Typically nearly all presentations at SemEval are
> posters, and so the oral slots are somewhat coveted and are often (but
> not always) awarded to the team with the highest rank. Shifting the
> focus of prizes and presentations away from the leaderboard might tend
> to encourage more participants to carry out such analysis and submit
> papers.
>
> That said, a carefully done analysis paper can be fairly time
> consuming to create and may require more pages than the typical 4 page
> limit. It was suggested that we be more flexible with page limits, so
> that teams could submit fairly minimal descriptions, or go into more
> depth on their systems and analysis. A related idea was to allow
> analysis papers to be submitted to the SemEval year X+1 workshop based
> on system participation in year X. This might be a good option to
> provide since SemEval timelines tend to be pretty tight as it stands.
>
> Papers sometimes tend to focus more on the horse race or bake off (and
> so analysis is limited to reporting a rank or score in the task).
> However, if scores or rankings were not released until after papers
> were submitted then this could certainly change the nature of such
> papers. In addition, a submitted paper could be made a requirement for
> appearing on the leaderboard.
>
> There is of course a trade-off between increasing participation and
> increasing the number of papers submitted. If papers are made a
> requirement, then some teams won't participate. There is perhaps a
> larger question for SemEval to consider, and that is how to increase
> the number of papers without driving away too many participants.
>
> Another observation was that some teams never identify
> themselves and so participate in the task but are never really
> involved beyond being on the leaderboard. These could of course be
> shadow accounts created by teams who are already participating (to get
> past submission limits?), or they could be accounts created by teams
> who may only want to identify themselves if they end up ranking
> highly. Should anonymous teams be allowed to participate? I don't know
> that there was a clear answer to that question. While anonymous
> participation could be a means to game the system in some way, it
> might also be something done by those who are participating contrary
> to the wishes of an advisor or employer. If teams are reluctant to
> identify themselves for fear of being associated with a "bad" score,
> perhaps it could be made possible for teams to remove their scores
> from the leaderboard.
>
> To summarize, I got the sense that there is some interest in both
> increasing the number of papers submitted to SemEval, and also in
> making it clear that there is more to the event than the leaderboard.
> I think there were some great ideas discussed, and I fear I have done
> a somewhat imperfect job of conveying them here. But I don't want to
> let the perfect be the enemy of the good enough, so I'm going to go
> ahead and send this around and hope that others who have ideas will
> join the conversation in some way.
>
> Cordially,
> Ted
>
> PS Emily Bender pointed out that the following paper overlaps with some
> of the issues mentioned in my summary. I'd strongly encourage all
> SemEval organizers and participants to read through it; it is very much
> on target and presents some nice ideas about how to think about shared
> tasks.
>
> https://aclweb.org/anthology/papers/W/W17/W17-1608/
>
> ---
> Ted Pedersen
> http://www.d.umn.edu/~tpederse