I have a question about the "Query" field of the data. Was the same set
of queries used to generate both the training and the testing data?
Put another way, is there a separate set of queries that was used only
to find test examples and that we won't find in the training examples?
I guess this applies independently to each relation.
Thanks for your time,
Alicia
Sorry for the delayed reply. I was on vacation in Egypt with my family
for two weeks.
For each relation, the training data (140 sentences) and the testing
data (about 70 sentences) were created by randomly sampling from the
total data (about 210 sentences) for the given relation. This means
that most queries will appear in both the training data and the
testing data, but some queries might appear only in the training data
and other queries might appear only in the testing data.
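The effect of such a random split on query coverage can be sketched in a
few lines of Python. This is a minimal illustration, not the task's actual
tooling: it assumes each sentence is tagged with the query that found it,
and the `Sentence` record, the function names, and the synthetic data
(210 sentences drawn from 40 queries) are all hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical record: each sentence is tagged with the query that found it.
Sentence = namedtuple("Sentence", ["sentence_id", "query"])


def split_relation(sentences, n_train, seed=0):
    """Shuffle one relation's sentences; the first n_train go to training, the rest to testing."""
    rng = random.Random(seed)
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def query_overlap(train, test):
    """Group queries by whether they occur in both sets or in only one of them."""
    train_q = {s.query for s in train}
    test_q = {s.query for s in test}
    return {
        "both": train_q & test_q,
        "train_only": train_q - test_q,
        "test_only": test_q - train_q,
    }


# Hypothetical data for a single relation: 210 sentences, each assigned
# one of 40 invented queries at random.
data_rng = random.Random(42)
sentences = [Sentence(i, f"query-{data_rng.randrange(40)}") for i in range(210)]

train, test = split_relation(sentences, n_train=140)
groups = query_overlap(train, test)

print(len(train), len(test))
for name, queries in groups.items():
    print(name, len(queries))
```

Because the split is made over sentences rather than over queries, a query
that found only one or two sentences can easily land entirely on one side,
while queries that found many sentences almost always appear on both sides.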
Best wishes,
Peter.
Our group submitted our run about a week ago. Since the deadline for
the system description papers is approaching, we'd like to ask you about
the following.
Are there any requirements for the paper submissions? (We checked the
SemEval website, but there seem to be no author instructions available.)
Also, would it be possible to release the annotated test set soon, so
that we can carry out an additional evaluation before April 17?
Thank you,
Willem & Sophia
Your paper should use the ACL style files:
http://ufal.mff.cuni.cz/acl2007/styles/
For Senseval-3, the page limit was 4 pages. I will ask the SemEval
organizers what the limit is for SemEval-1.
Your paper should include:
1. Introduction
2. System Description (knowledge sources, features used, learning
algorithms, pre-processing, post-processing, etc.)
3. Results (scores, training/testing time, analysis of limitations,
etc.)
4. Conclusion
The schedule is:
April 1 - evaluation ends
April 2 - answer files are given to task organizers
April 10 - task organizers give results to participants
April 17 - papers due
As task organizers, we will also write a paper summarizing Task #4.
We will ask each participating team for a very brief description of
their system (basic architecture, knowledge sources, features; about
three sentences long), to include in our task summary paper.
I think we could release the annotated test set on April 2. I will ask
the SemEval organizers.
We will also let the participating teams know (at least) the best and
median scores for Task #4 by April 10.
Best wishes,
Peter.