Inconsistencies in annotation process


Ugo Scaiella

Feb 17, 2014, 2:26:12 PM
to micropo...@googlegroups.com
Dear chairs,

We are still working with the test set (version 1.6), and we have found several patterns that we would kindly ask you to clarify.

1) The rationale for annotating dates is not clear. We have collected a number of instances that show different behaviors:

92866765834567680 "In 2010, Coca-Cola was voted  the most discriminatory employer ..." no annotation for spot '2010'
99852692247166976 "Auberge Resorts Announces It Is Again Honored On 2011 Travel + Leisure World ..." spot '2011' has been annotated

101107456050073601 "On September 19, 2011 the US citizens are gonna ..." 'September 19' and '2011' were annotated
92313714224668672 "... of the White House on July 16, 2011. http://twitpic.com/5r7jzo" spot 'July 16, 2011' is annotated with http://dbpedia.org/resource/July_16

91732691678007296 "Did you know that World Ocean Day is on the 8th of June? Let's ..." spot '8th of June' is not annotated.

There are several instances of these cases, so I think it would be better if you could please clarify what the guidelines for annotating dates are.

2) 101022789389131776 "Met today w/ reps of 8 million Egyptians w/ disabilities. " 'Egyptians' is not annotated while in 96116890295992320 and 103170022179999745 'Lybian' is annotated with http://dbpedia.org/resource/Libyan_people

What's the correct behavior?

3) Finally, based on past threads, my understanding was that all occurrences of a spot should be annotated. Could you please confirm this?
Actually, we have found that there are several tweets where this doesn't hold:

101389292311556096 "Slaying of 3 Muslims lays bare divisions: With police nowhere to be seen, the Muslims of  ..." only one occurrence of spot 'muslim' has been annotated
92365972840775680 "Pedigree and genetics conference scheduled for Sept. 7-8 - Paulick Report: Pedigree and genetics conference sche... http://bit.ly/oOBsVN" only one occurrence of spot 'pedigree' has been annotated
92836172925120512 "I want to hear Justin's point of view on this non stop drama. Not Scooter's no offense but all this about Justin not him." only one occurrence of spot 'justin' has been annotated

I haven't systematically checked the dataset, but I think there could be more than these three instances in the dataset.
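
A systematic check could look roughly like the sketch below. It assumes each annotation is available as a (start, end) character offset pair into the tweet text, which is a hypothetical format and not necessarily the one used in the dataset files; the function name `missed_occurrences` is also made up for illustration.

```python
import re


def overlaps(a, b):
    """True if two (start, end) character spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def missed_occurrences(text, annotated_spans):
    """Return spans where the surface form of an annotated spot recurs
    in the tweet without being covered by any annotation.

    annotated_spans is a list of (start, end) offsets (assumed format).
    """
    missed = []
    for start, end in annotated_spans:
        surface = text[start:end]
        if not surface:
            continue  # ignore empty spans
        # Case-insensitive match anchored at a word start, so a spot
        # 'Muslim' also finds 'Muslims' and 'Justin' finds "Justin's".
        pattern = re.compile(r"\b" + re.escape(surface), re.IGNORECASE)
        for m in pattern.finditer(text):
            span = (m.start(), m.end())
            if any(overlaps(span, ann) for ann in annotated_spans):
                continue  # this occurrence is already annotated
            if span not in missed:
                missed.append(span)
    return sorted(missed)
```

Running this over every tweet and keeping the ones with a non-empty result would list candidate tweets like the three above. The matching is deliberately loose, so the output would still need a manual review.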


Regards,
-- Ugo Scaiella

Ugo Scaiella

Feb 18, 2014, 9:36:15 AM
to micropo...@googlegroups.com
Dear chairs,

I don't know if we are still in time to highlight additional issues we have seen in the training set.
Namely, since you have stated that you will evaluate spot-entity pairs, it is important to understand the correct behavior for selecting spot boundaries.

For instance, for people, the majority of spots include only the person's name and not their role or title. However, we have seen that this is not the case for the following ones:

tweetID: 91655593139511296
http://dbpedia.org/resource/Bill_Johnson_(Ohio_politician) should have spot "Bill Johnson" and not "Congressman Bill Johnson" (and perhaps "congressman" should be annotated with http://dbpedia.org/resource/United_States_House_of_Representatives )

tweetID: 92313714224668672
http://dbpedia.org/resource/Barack_Obama should have spot "Barack Obama" and not "President Barack Obama"

tweetID: 91957731992420352
http://dbpedia.org/resource/Sasha_Vuja%C4%8Di%C4%87 should have spot "Sasha Vujacic" and not "NBA Guard Sasha Vujacic" (and perhaps "NBA" should be annotated with http://dbpedia.org/resource/National_Basketball_Association )

Finally, there are some other cases, where it looks like the spot has not been properly selected:

tweetID: 92045133754793984
http://dbpedia.org/resource/Android_(operating_system) should have spot "Android Honeycomb" and not "Android Honeycomb Tablet"

tweetID: 92991486244814849
http://dbpedia.org/resource/Denial-of-service_attack should have spot "DDos attack" and not "DDos"

tweetID: 92979691803262976
http://dbpedia.org/resource/Call_of_Duty:_Black_Ops has spot "call of duty ; black ops." the ending "." should be omitted from the spot

tweetID: 93031684374667264
why does the spot of http://dbpedia.org/resource/The_Times-Picayune contain "NOLA.com"? Shouldn't it be just Times-Picayune?

tweetID: 93071762639699968
same as before, http://dbpedia.org/resource/The_Sun_(United_Kingdom) has spot "The Sun homepage www.thesun.co.uk", shouldn't it be just "The Sun"?

tweetID: 93188706533515264

tweetID: 93246093017620480
http://dbpedia.org/resource/Space_Shuttle_Atlantis has spot "US space shuttle Atlantis"; it should probably be simply "space shuttle Atlantis"
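
Some of the boundary patterns above could be flagged automatically. The following is only a rough sketch: the title list, the URL pattern and the function name `boundary_issues` are illustrative assumptions, not part of the challenge guidelines, and cases such as "NBA Guard Sasha Vujacic" or "Android Honeycomb Tablet" would not be caught by these simple heuristics.

```python
import re

# Illustrative list of leading titles; an assumption, not taken from the guidelines.
TITLE_PREFIXES = ("president", "congressman", "senator")

# Rough pattern for a URL or bare domain inside a spot (assumed heuristic).
URL_LIKE = re.compile(
    r"(https?://|www\.|\b[\w-]+\.(com|org|net|co\.uk)\b)", re.IGNORECASE
)


def boundary_issues(spot):
    """Return a list of human-readable reasons why a spot boundary looks suspicious."""
    issues = []
    stripped = spot.strip()
    if not stripped:
        return issues
    if stripped[-1] in ".,;:!?":
        issues.append("trailing punctuation")
    if stripped.split()[0].lower() in TITLE_PREFIXES:
        issues.append("leading title")
    if URL_LIKE.search(stripped):
        issues.append("contains URL or domain")
    return issues


# Spots quoted in this message:
for spot in ["President Barack Obama",
             "call of duty ; black ops.",
             "The Sun homepage www.thesun.co.uk"]:
    print(spot, "->", boundary_issues(spot))
```

A spot with an empty result is not necessarily correct; the check only narrows down which annotations to review by hand.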


Thanks for your effort in managing this challenge.

Regards,
-- Ugo Scaiella