wnut17 - score revision

Leon Derczynski

Jul 17, 2017, 7:34:44 PM
to Workshop on Noisy User-generated Text (WNUT)
Hi,

We found a bug in our evaluation script and method for surface forms: it disregarded the type of the form. The script has now been modified to capture and process the type correctly, and the scores have come down accordingly. Apologies for this being discovered post-evaluation rather than while systems were being tuned. The ranking remains roughly the same.

SYSTEM          F1 (ENTITY)   F1 (SURFACE)
arcada          39.01         36.83
drexel_cci      26.30         25.26
flytxt          37.80         35.65
mic-cis         36.36         33.53
sjtu_adapt      39.98         37.17
spinningbytes   40.55         39.11
uh_ritual       41.27         39.38
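
For anyone unsure what the fix changes in practice, here is a minimal sketch of surface-form scoring with and without the entity type. It is not the actual wnuteval.py code; the prf helper and the toy entities are invented purely for illustration. Ignoring the type collapses distinct (surface, type) pairs, which can only inflate the score, so the numbers come down once types are respected:

# Minimal sketch, not the actual wnuteval.py implementation.
# Entities are represented as (surface, type) tuples.

def prf(gold, pred):
    """Precision, recall and F1 over two sets."""
    tp = len(gold & pred)
    p = tp / float(len(pred)) if pred else 0.0
    r = tp / float(len(gold)) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("CNN", "corporation")}
pred = {("CNN", "group")}

# Buggy behaviour: the type is disregarded, so the surface form "CNN"
# counts as correct even though the predicted type is wrong.
print(prf({s for s, _ in gold}, {s for s, _ in pred}))  # (1.0, 1.0, 1.0)

# Fixed behaviour: unique (surface, type) pairs are compared.
print(prf(gold, pred))  # (0.0, 0.0, 0.0)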

It's a very tough task! Paper reviews will be issued shortly, after which you can update your manuscripts for the camera-ready deadline, which we'll set at the end of this week.


All the best,


Leon

patrick.chri...@gmail.com

Jul 19, 2017, 5:28:20 AM
to Workshop on Noisy User-generated Text (WNUT)
Hi,

I ran the new evaluation script on the submitted results, and the results you report here are inconsistent with what I get from the script. With the new script, the entity F1 score is the same as with the old script, whereas the score reported here is lower. The reported surface F1 score is also lower than the one I get when running the evaluation script.

Regards,
Patrick

Eric Nichols

Jul 19, 2017, 11:35:13 AM
to Workshop on Noisy User-generated Text (WNUT), patrick.chri...@gmail.com
Greetings,

Would you mind sending me the scores you get with both versions and the input file so I can take a look?

All the best,

Eric Nichols

Gustavo Aguilar

Jul 19, 2017, 1:21:08 PM
to Workshop on Noisy User-generated Text (WNUT)
Hello,

I ran the new script over my results and found (based on the script's log) that the emerging.test.annotated file has the following inconsistencies:

Line 7801: 12  I-product -> there are 2 spaces instead of a tab separating the token and the tag
Line 16784: , I-group -> there is no B-group before this I-group tag (should the ',' be an O?)
Line 17130: CNN b-corporation -> b-corporation should be B-corporation
Line 18829: Advertise I-creative-work -> there is no B-creative-work before this I-creative-work tag
Line 22435: mixes I-product O -> there is no B-product before the I-product; instead, the previous tag is B-corporation

Now that results are being updated, it would be good to include those fixes in the emerging.test.annotated file.
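
In case it helps others audit their copy, here is a rough sketch of the kind of checks that surface the lines above. It is only an illustration, not part of wnuteval.py, and the script name check_bio.py is made up; it assumes one token and tag per line, tab-separated, with blank lines between sentences:

# Rough illustration only, not part of wnuteval.py: flag missing tab
# separators, unexpected tag prefixes (e.g. lowercase b-), and I- tags
# whose preceding token does not carry a matching B- or I- tag.
import sys

def check(path):
    prev_tag = "O"
    with open(path) as f:
        for n, raw in enumerate(f, 1):
            line = raw.rstrip("\n")
            if not line.strip():
                prev_tag = "O"
                continue
            if "\t" not in line:
                print("line %d: no tab separator: %r" % (n, line))
                prev_tag = "O"
                continue
            tag = line.split("\t")[-1].strip()
            if tag != "O" and not tag.startswith(("B-", "I-")):
                print("line %d: unexpected tag: %r" % (n, tag))
            if tag.startswith("I-") and prev_tag not in ("B-" + tag[2:], "I-" + tag[2:]):
                print("line %d: %s not preceded by B-%s (previous: %s)" % (n, tag, tag[2:], prev_tag))
            prev_tag = tag

if __name__ == "__main__":
    check(sys.argv[1])  # e.g. python check_bio.py emerging.test.annotated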

Regards,
Gustavo

Eric Nichols

Jul 19, 2017, 10:41:28 PM
to Gustavo Aguilar, Workshop on Noisy User-generated Text (WNUT)
Greetings,

We have corrected several tagging inconsistencies in the test data.
The corrected version was used to produce the final system evaluation.
We should be able to release the updated test data soon, so please hold tight.

All the best,

Eric Nichols

Leon Derczynski

Jul 21, 2017, 3:54:30 PM
to Workshop on Noisy User-generated Text (WNUT)
Dear participants,

The test data on the site has now been updated to the version used to produce the final rankings. Apologies for the very late notice. You may submit papers with results revised to match these, though paper titles have now been set in the program, so changes there might not get picked up.

To be specific, the following files are different from the versions immediately after the eval period:

We apologize for this disruptive, late change. We hope that our published results are now consistent with what you can observe.

Please note: if you'd like to re-run evaluations to update the data in your papers, you can submit revised manuscripts with these changes up until the end of July 24.

The reason for this change is to provide those who come later with a simple, understandable picture of the shared task. Our other option was to freeze the data and issue another set of data and eval scripts after release. This would have meant two eval scripts, two datasets and so on, which would make future comparison harder. It is our hope that the inconvenience and rush now will make our papers and datasets more usable to those who come in the future, strengthening all of our contributions.

As an aside, we'll release the data, eval scripts and submissions (along with the original source docs and crowdsourcing responses) via a git repo, which will accept pull requests to update annotations and so on.

All the best,


Leon

patrick.chri...@gmail.com

Jul 21, 2017, 4:21:23 PM
to Workshop on Noisy User-generated Text (WNUT), patrick.chri...@gmail.com
Hi,

I sent you an email about this, including the results on the updated test data set, which are still inconsistent with the results in the rankings.

Regards,
Patrick

Leon Derczynski

Jul 21, 2017, 4:49:31 PM
to Workshop on Noisy User-generated Text (WNUT), patrick.chri...@gmail.com
Thank you Patrick.

We've taken down the eval file for a few hours while we sync everything up. When it's available again, it will have been verified, and there will be a new message on this list.

Leon

Utpal Sikdar

Jul 23, 2017, 7:17:32 AM
to Workshop on Noisy User-generated Text (WNUT)
Hi Leon,

The given link for the annotated test data does not contain any data, and the updated evaluation script generates an error (I also checked with the previous eval script, which runs properly).

Thanks,
Utpal  

Leon Derczynski

Jul 23, 2017, 1:07:32 PM
to Workshop on Noisy User-generated Text (WNUT)
Dear Utpal,

We'd taken down the file while checking for errors - it's back in place now. Sorry to hear you're having problems with the eval script. Can you show us the error and the command used to run it?

All the best,


Leon

Utpal Sikdar

Jul 24, 2017, 1:25:21 AM
to Workshop on Noisy User-generated Text (WNUT)
Hi Leon,

Here is the error:

 main()
  File "wnuteval.py", line 541, in main
    tokens = doc_to_toks(lines)
  File "wnuteval.py", line 299, in doc_to_toks
    for src, nested in doc_to_tokses(lines).items()}
  File "wnuteval.py", line 273, in doc_to_tokses
    for src, toks in sent_to_toks(sent, sent_id).items():
  File "wnuteval.py", line 128, in sent_to_toks
    for src, tok in line_to_toks(line, sent_id, word_id).items():
  File "wnuteval.py", line 113, in line_to_toks
    raise ValueError('Invalid line: %s %d %d' % (line, sent_id, word_id))
ValueError: Invalid line: Tirith I-location I-group 468 12

Thanks,
Utpal

Utpal Sikdar

Jul 24, 2017, 1:37:20 AM
to Workshop on Noisy User-generated Text (WNUT)
The command is:

cat testDataOutputSubmission.txt | awk '{print $NF}'| paste emerging.test.annotated - | python2.7 wnuteval.py
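
For anyone who prefers not to use awk and paste, a rough Python equivalent of that preprocessing step would be the following. It is an illustration only, not official task tooling; the output file name paste_input.txt is invented, and it assumes the system tag sits in the last whitespace-separated column of the submission file:

# Illustration only: take the last column of the submission file as the
# system tag and append it, tab-separated, to each line of the annotated
# test data, mirroring the awk/paste pipeline above.
sub_tags = []
for line in open("testDataOutputSubmission.txt"):
    fields = line.split()
    sub_tags.append(fields[-1] if fields else "")

with open("paste_input.txt", "w") as out:
    for gold_line, tag in zip(open("emerging.test.annotated"), sub_tags):
        out.write(gold_line.rstrip("\n") + "\t" + tag + "\n")

# Then score with: python2.7 wnuteval.py < paste_input.txt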

Eric Nichols

Jul 24, 2017, 1:40:28 AM
to Utpal Sikdar, Workshop on Noisy User-generated Text (WNUT)
Greetings,

The line where the error occurs has a space after the I-location tag, making it invalid.
This implies you are not using the finalized evaluation data. Could you send me the output you are trying to score?
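
If it helps to track such lines down before running the scorer, something like the following will list suspicious lines. It is an illustration only, not part of wnuteval.py, and it assumes the pasted input has token, gold tag, and system tag separated by single tabs, with blank lines between sentences:

# Illustration only: report pasted lines that contain stray spaces or an
# unexpected number of tab-separated fields.
import sys

for n, raw in enumerate(open(sys.argv[1]), 1):
    line = raw.rstrip("\n")
    if not line.strip():
        continue
    if " " in line or len(line.split("\t")) != 3:
        print("line %d: %r" % (n, line))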

All the best,

Eric Nichols

Utpal Sikdar

Jul 24, 2017, 2:12:44 AM
to Workshop on Noisy User-generated Text (WNUT), utpal....@gmail.com
Hi Eric,

I have now used the updated test data (I had tried the old test data because the given link for the test file earlier didn't contain any data), and it's working fine.

Thanks,
Utpal

Eric Nichols

Jul 24, 2017, 2:46:56 AM
to Utpal Sikdar, Workshop on Noisy User-generated Text (WNUT)