Evaluation questions

Chang Ming-Wei

Jan 28, 2014, 7:02:04 PM

to micropo...@googlegroups.com

Hi,

Thanks for organizing this competition. I have a couple of questions and suggestions.

1) Segmentation

I found that in many cases the segmentation is ambiguous. For example:

96427426464284673 "Swish bank UBS sa... ==> Swish bank UBS http://dbpedia.org/resource/Union_Bank_of_Switzerland

Why is the segmentation "Swish bank UBS" rather than "UBS"? In the same tweet, the annotator also linked "UBS" to the same entity. I feel that "UBS" is also a correct answer.

There are many other examples, such as:

92693524981616640 "the Queen" or "Queen"

93734032189304832 "The Gang of Six" or "Gang of Six"

Given that the segmentation is not really clear, maybe we should not evaluate on the mentions? We could still take the ordering of the entities into account, and that would already give a very good picture of the results.
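To illustrate the difference this would make, here is a minimal sketch that scores the same prediction at mention level and at entity level. The `(tweet_id, mention, uri)` triple layout and the function names are assumptions made for this example, not the challenge's official scorer, and the sketch ignores entity ordering for simplicity.

```python
from collections import Counter

def prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over two multisets of items."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    # Multiset intersection: an item counts as correct at most as often as it is in gold.
    true_pos = sum((gold_counts & pred_counts).values())
    precision = true_pos / sum(pred_counts.values()) if pred_counts else 0.0
    recall = true_pos / sum(gold_counts.values()) if gold_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def entity_level(annotations):
    """Drop the mention string and keep only (tweet_id, uri) pairs."""
    return [(tweet_id, uri) for tweet_id, _mention, uri in annotations]

# Hypothetical gold and system output for the tweet discussed above.
UBS = "http://dbpedia.org/resource/Union_Bank_of_Switzerland"
gold = [("96427426464284673", "Swish bank UBS", UBS)]
predicted = [("96427426464284673", "UBS", UBS)]

mention_scores = prf(gold, predicted)                              # boundaries must match
entity_scores = prf(entity_level(gold), entity_level(predicted))   # boundaries ignored
```

With these inputs the mention-level scores are all zero because the boundaries differ, while the entity-level scores are all one because the tweet and URI agree.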

2) Mistakes/redirects in the DBpedia pages

I found some mistakes and redirects in the annotations, which could create problems in the evaluation.

For example,

http://dbpedia.org/resource/17_May should be http://dbpedia.org/resource/May_17

http://dbpedia.org/resource/New_mexico should be http://dbpedia.org/resource/New_Mexico

http://dbpedia.org/resource/JayZ should be http://dbpedia.org/resource/Jay-Z

http://dbpedia.org/resource/Associated_press should be http://dbpedia.org/resource/Associated_Press

There are other cases that I did not list.

These annotations will cause problems both for the evaluation and for building systems.
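One way to make the evaluation robust to such cases is to map every URI to a canonical form before comparing. The sketch below assumes a redirect table is available as a plain dictionary; here it is filled only with the four examples listed above, and the names `REDIRECTS` and `canonical_uri` are hypothetical.

```python
# Redirect table built from the examples in this message only.
# In practice it would have to come from a fuller source of redirects.
REDIRECTS = {
    "http://dbpedia.org/resource/17_May": "http://dbpedia.org/resource/May_17",
    "http://dbpedia.org/resource/New_mexico": "http://dbpedia.org/resource/New_Mexico",
    "http://dbpedia.org/resource/JayZ": "http://dbpedia.org/resource/Jay-Z",
    "http://dbpedia.org/resource/Associated_press": "http://dbpedia.org/resource/Associated_Press",
}

def canonical_uri(uri, redirects=REDIRECTS, max_hops=10):
    """Follow redirects until a URI with no redirect is reached.

    Stops on a cycle or after max_hops steps, returning the URI reached so far.
    """
    seen = set()
    while uri in redirects and uri not in seen and len(seen) < max_hops:
        seen.add(uri)
        uri = redirects[uri]
    return uri

def same_entity(gold_uri, predicted_uri):
    """Compare two URIs after canonicalisation."""
    return canonical_uri(gold_uri) == canonical_uri(predicted_uri)
```

With this, a system that outputs `http://dbpedia.org/resource/Jay-Z` would still match a gold annotation of `http://dbpedia.org/resource/JayZ`.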

Thanks a lot!

Ming-Wei

MSM

Feb 4, 2014, 1:41:15 PM

to micropo...@googlegroups.com, Chang Ming-Wei

Dear Ming-Wei,

Thanks for your comments. Please find our answers below.

On 29/01/2014 00:02, Chang Ming-Wei wrote:

Hi,

    Thanks for organizing this competition. I have a couple of questions and suggestions.

    1) Segmentation

             I found that in many cases the segmentation is ambiguous. For example:

             96427426464284673 "Swish bank UBS sa... ==> Swish bank UBS http://dbpedia.org/resource/Union_Bank_of_Switzerland

We have not annotated overlapping entities. For example, we did not annotate both "Swish bank UBS" and "UBS" separately. We also always tried to annotate the longest possible entity mention found in a tweet, in this case "Swish bank UBS".
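As a rough sketch of this convention (not the procedure the annotators actually followed), candidate mentions can be filtered greedily so that the longest span wins and anything overlapping it is dropped. The character-offset representation and the function name are assumptions made for illustration.

```python
def keep_longest_non_overlapping(spans):
    """Greedily keep the longest spans and drop any span that overlaps a kept one.

    spans: list of (start, end) character offsets, with end exclusive.
    """
    kept = []
    # Consider longer spans first; ties are broken by the earlier start offset.
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

text = "Swish bank UBS sa..."
candidates = [(11, 14), (0, 14)]  # "UBS" and "Swish bank UBS"
selected = keep_longest_non_overlapping(candidates)
mentions = [text[start:end] for start, end in selected]
```

Here `mentions` contains only "Swish bank UBS", since "UBS" lies entirely inside the longer span.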

            Why is the segmentation "Swish bank UBS" rather than "UBS"? In the same tweet, the annotator also linked "UBS" to the same entity. I feel that "UBS" is also a correct answer.

            There are many other examples, such as:

            92693524981616640    "the Queen" or "Queen"



            93734032189304832 "The Gang of Six" or "Gang of Six"

We have removed the "the" prefix in the examples you mentioned ("the Queen" became "Queen", and "The Gang of Six" became "Gang of Six"). If you find more such cases, please send them to us.

            Given that the segmentation is not really clear, maybe we should not evaluate on the mentions? We could still take the ordering of the entities into account, and that would already give a very good picture of the results.

    2) Mistakes/redirects in the DBpedia pages

           I found some mistakes and redirects in the annotations, which could create problems in the evaluation.

           For example,

          http://dbpedia.org/resource/17_May should be http://dbpedia.org/resource/May_17



          http://dbpedia.org/resource/New_mexico should be http://dbpedia.org/resource/New_Mexico

          http://dbpedia.org/resource/JayZ should be http://dbpedia.org/resource/Jay-Z

          http://dbpedia.org/resource/Associated_press should be http://dbpedia.org/resource/Associated_Press

          There are other cases that I did not list.

If you find more such cases, please send them to us.

          These annotations will cause problems both for the evaluation and for building systems.

     Thanks a lot!

Ming-Wei

Many thanks,
#Microposts2014 Challenge crew

Mena Badieh Habib Morgan

Feb 6, 2014, 9:44:04 AM

to micropo...@googlegroups.com

There are still some named entities that start with "the":

93768212558249984:The US
92209240676110336:the Open Championship
92744177418379265:THE BAHAMAS
93412082602622976:The Louise boat
92700486129553408:the apprentice
93020527530221568:the lion king
96992650850344961:the Internet

Mena

MSM

Feb 18, 2014, 4:10:33 PM

to micropo...@googlegroups.com, Mena Badieh Habib Morgan

Dear Mena,

Thanks for pointing this out. We addressed these cases in v1.6. We kept "the" inside an entity mention if the entity URI also contains it.
For example, the URI for "the apprentice" is http://dbpedia.org/resource/The_Apprentice_(UK_TV_series), and for "the lion king" it is http://dbpedia.org/resource/The_Lion_King.
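For anyone who wants to apply the same convention to their own output, the rule can be sketched as below. This is only one reading of the rule: it assumes "contains it" means the URI's local name starts with `The_`. The function name is hypothetical, and the Gang of Six URI is assumed for illustration rather than taken from the dataset.

```python
def normalize_mention(mention, uri):
    """Strip a leading "the" from a mention unless the URI's local name starts with "The_"."""
    local_name = uri.rsplit("/", 1)[-1]
    words = mention.split(" ", 1)
    has_article = len(words) == 2 and words[0].lower() == "the"
    if has_article and not local_name.startswith("The_"):
        return words[1]
    return mention

# The URI keeps the article, so the mention keeps it too.
kept = normalize_mention("the lion king", "http://dbpedia.org/resource/The_Lion_King")

# Assumed URI without the article, so the article is stripped from the mention.
stripped = normalize_mention("The Gang of Six", "http://dbpedia.org/resource/Gang_of_Six")
```

Here `kept` stays "the lion king" and `stripped` becomes "Gang of Six".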

Thanks very much.
Best regards,
#Microposts2014 NEEL Challenge crew