Evaluation questions


Chang Ming-Wei

Jan 28, 2014, 7:02:04 PM
to micropo...@googlegroups.com
Hi,
 
    Thanks for organizing this competition. I have a couple of questions and suggestions.
 
    1) Segmentation
 
             I found that in many cases the segmentation is ambiguous. For example:
 
             96427426464284673 "Swish bank UBS sa...  ==>  Swish bank UBS http://dbpedia.org/resource/Union_Bank_of_Switzerland              
 
            Why is the segmentation "Swish bank UBS" rather than "UBS"? In the same tweet, the annotator also annotated "UBS" with the same entity, so I feel that "UBS" is also a correct answer.
 
            There are many other examples, such as:
 
            92693524981616640    "the Queen" or "Queen"
          
            93734032189304832 "The Gang of Six" or "Gang of Six"
 
            Given that the segmentation is not really clear, maybe we should not evaluate on the mentions? We could still take the ordering of the entities into account, which would already give a very good picture of the results.
 
 
    2) Mistakes/redirects in the DBpedia pages
 
           I found some mistakes and redirects in the annotations, which could create problems in the evaluation.
 
           For example,
 
      
 
 
 
          There are other cases that I did not list.
 
          These annotations will create problems both for the evaluation and for building systems.
 
 
     Thanks a lot!
 
Ming-Wei
 
  

MSM

Feb 4, 2014, 1:41:15 PM
to micropo...@googlegroups.com, Chang Ming-Wei
Dear Ming-Wei,

Thanks for your comments. Please find our answers below.


On 29/01/2014 00:02, Chang Ming-Wei wrote:
Hi,
 
    Thanks for organizing this competition. I have a couple of questions and suggestions.
 
    1) Segmentation
 
             I found that in many cases the segmentation is ambiguous. For example:
 
             96427426464284673 "Swish bank UBS sa...  ==>  Swish bank UBS http://dbpedia.org/resource/Union_Bank_of_Switzerland           
We did not annotate overlapping entities. For example, we did not annotate "Swish bank UBS" and "UBS" as two separate mentions. We also always tried to annotate the longest possible entity mention found in a tweet, in this case "Swish bank UBS".

            Why is the segmentation "Swish bank UBS" rather than "UBS"? In the same tweet, the annotator also annotated "UBS" with the same entity, so I feel that "UBS" is also a correct answer.
 
            There are many other examples, such as:
 
            92693524981616640    "the Queen" or "Queen"
          
            93734032189304832 "The Gang of Six" or "Gang of Six"
We have removed the "the" prefix in the examples you mentioned ("the Queen" became "Queen", and "The Gang of Six" became "Gang of Six"). If you find more such cases, please send them to us.

 
            Given that the segmentation is not really clear, maybe we should not evaluate on the mentions? We could still take the ordering of the entities into account, which would already give a very good picture of the results.
 
 
    2) Mistakes/redirects in the DBpedia pages
 
           I found some mistakes and redirects in the annotations, which could create problems in the evaluation.
 
           For example,
 
      
 
 
 
          There are other cases that I did not list.
If you find more such cases, please send them to us.
          These annotations will create problems both for the evaluation and for building systems.
 
 
     Thanks a lot!
 
Ming-Wei
 
  
Many thanks,
#Microposts2014 Challenge crew

Mena Badieh Habib Morgan

Feb 6, 2014, 9:44:04 AM
to micropo...@googlegroups.com
There are still some named entities that start with "the":

93768212558249984:The US
92209240676110336:the Open Championship
92744177418379265:THE BAHAMAS
93412082602622976:The Louise boat
92700486129553408:the apprentice
93020527530221568:the lion king
96992650850344961:the Internet

Mena

MSM

Feb 18, 2014, 4:10:33 PM
to micropo...@googlegroups.com, Mena Badieh Habib Morgan
Dear Mena,

Thanks for pointing this out; we addressed these cases in v1.6. We kept "the" inside an entity mention if the entity URI also contains it.
For example, the URI for "the apprentice" is http://dbpedia.org/resource/The_Apprentice_(UK_TV_series), and the URI for "the lion king" is http://dbpedia.org/resource/The_Lion_King.
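
As a rough illustration of this rule, the sketch below strips a leading "the" from a mention unless the resource name in the DBpedia URI itself begins with "The_". It reads "contains" as "starts with", which is an assumption, and the function name is hypothetical; it is not the script used to produce v1.6.

```python
import re
from urllib.parse import unquote


def normalize_mention(mention: str, uri: str) -> str:
    """Drop a leading 'the' from the mention unless the URI's resource name starts with 'The_'."""
    # Resource name is the last path segment, e.g. "The_Lion_King".
    resource_name = unquote(uri.rsplit("/", 1)[-1])
    if resource_name.lower().startswith("the_"):
        return mention  # the article is part of the entity name, so keep it
    # Remove the article only when it is a separate leading word.
    return re.sub(r"^the\s+", "", mention, flags=re.IGNORECASE)


# URI taken from the thread: the article is kept.
print(normalize_mention("the lion king", "http://dbpedia.org/resource/The_Lion_King"))
# Illustrative URI (an assumption, not taken from the dataset): the article is dropped.
print(normalize_mention("THE BAHAMAS", "http://dbpedia.org/resource/Bahamas"))
```

A real cleanup pass would also need to shift the mention's start offset whenever the article is removed, which this sketch leaves out.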

Thanks very much.
Best regards,
#Microposts2014 NEEL Challenge crew