Entities not matching Dbpedia taxonomy


Daniel Dahlmeier

Jan 27, 2014, 5:05:48 AM
to micropo...@googlegroups.com
Hi

Some of the entities annotated in the training data do not seem to match any category given in the documentation.

For example the zodiac sign "Aquarius":

91645803923382272       "#Aquarius  your greatest obstacle is your fear of rejection - you'd rather write down your feelings than express them face to face."     Aquarius        http://dbpedia.org/resource/Aquarius_(astrology)

The taxonomy for "Aquarius" in dbpedia is

Astrological Signs is not included in the taxonomy listed in the annotation guidelines (#Microposts2014 Challenge on Named Entity Extraction &
Linking (NEEL) Annotation Guidelines).

The taxonomy is included below for reference. 

Am I misunderstanding something? Are entities restricted to the categories from the taxonomy given in the annotation guidelines or not?


regards,
Daniel



Taxonomy
-----------------
Amount
Animal
   Bird
   Insect
Event
   MilitaryConflict
   PoliticalEvent
   SportEvent
   WeatherEvent
   MeetingEvent
Function
  Job
Location
   AdministrativeRegion
   Airport
   Bridge
   Canal
   City
   Continent
   Country
   Hospital
   Island 
   Museum
   Lake
   Lighthouse
   Mountain
   Park
   Restaurant
   River
   Road
   ShoppingMall
   Stadium
   Station
   Valley
Organization
  Airline
  Band
  Broadcast
  Company
  EducationalInstitution
  Legislature
  NonProfitOrganisation
  RadioStation
  SoccerClub
  SportsLeague
  SportsTeam
  TVStation
  University
  PoliticalOrganisation
Person
  Ambassador
  Architect
  Artist
  Astronaut
  Athlete
  Celebrity
  ComicsCharacter
  Criminal
  FictionalCharacter
  Mayor 
  MusicalArtist
  Politician
  SoccerPlayer
  TennisPlayer
Product
  Aircraft
  Album
  Automobile
  Book
  Drug
  EmailAddress
  Magazine
  Movie
  Newspaper
  OperatingSystem
  PhoneNumber
  ProgrammingLanguage
  RadioProgram
  SchoolNewspaper
  Software
  Song
  Spacecraft
  URL
  VideoGame
  Weapon
  Website
Time
  Holiday
Cardinal Direction
Language
Nationality
Numeric Expression
    Day of a month
Religion
Season
AstronomicalObject
   Planet 
   Natural Satellite
EthnicGroup
Weather
Sport Name

Fréderic Godin

Jan 27, 2014, 5:16:10 AM
to micropo...@googlegroups.com
Hi,

For me, this was not clear either.
I find many amounts of money but they are never annotated.
I would assume that these are amounts.

Also, I currently assume that we only need to detect the subcategories given, such as 'Airport' or 'Bridge', but not locations in general.
'Location' is not even part of the DBpedia ontology. In DBpedia, 'Place' is used.

Thanks in advance for clarifying!

Best,

Fréderic


2014-01-27 Daniel Dahlmeier <ddahl...@googlemail.com>

--
You received this message because you are subscribed to the Google Groups "microposts2014" group.
To unsubscribe from this group and stop receiving emails from it, send an email to microposts201...@googlegroups.com.
Visit this group at http://groups.google.com/group/microposts2014.
For more options, visit https://groups.google.com/groups/opt_out.

MSM

Jan 28, 2014, 11:00:04 AM
to micropo...@googlegroups.com, Fréderic Godin
Hi Fréderic,

On 27/01/2014 10:16, Fréderic Godin wrote:
> Hi,
>
> For me, this was not clear either.
> I find many amounts of money but they are never annotated.
> I would assume that these are amounts.

We have only considered entities which can be mapped to DBpedia. Numeric expressions such as "4 million pounds" do not have a DBpedia URI.
Could you please provide some example numeric expressions which you think should have been mapped to DBpedia?

> Also, I currently assume that we only need to detect the subcategories given, such as 'Airport' or 'Bridge', but not locations in general.
> 'Location' is not even part of the DBpedia ontology. In DBpedia, 'Place' is used.

Similarly, could you please provide some example entities for these types?

> Thanks in advance for clarifying!
>
> Best,
>
> Fréderic

Thanks very much,
#Microposts2014 Challenge crew

MSM

Jan 28, 2014, 11:02:22 AM
to micropo...@googlegroups.com, Daniel Dahlmeier
Hi Daniel,

Thanks very much for your comment. We have added AstrologicalSign to our taxonomy.

Many thanks,
#Microposts2014 Challenge crew

Stefano Parmesan

Jan 29, 2014, 2:37:04 AM
to micropo...@googlegroups.com
Hi everyone,

Let me try to make the issue clearer:

- First question that arises is: the microposts taxonomy has been built on top of what?
I personally would expect to match against http://mappings.dbpedia.org/server/ontology/classes/ (used for property rdf:type) but looking at the entries something seems out of place; just going through the taxonomy in order:
  Amount -> can't be found in the dbpedia ontology
    PoliticalEvent -> can't be found in the dbpedia ontology
    SportEvent -> can't be found in the dbpedia ontology
    WeatherEvent -> can't be found in the dbpedia ontology
    MeetingEvent -> can't be found in the dbpedia ontology
  Function -> can't be found in the dbpedia ontology
    Job -> can't be found in the dbpedia ontology
  Location -> can't be found in the dbpedia ontology

and so on. It seems to me this is the wrong ontology, hence the question. I also checked the DBpedia categories (used with property dcterms:subject), but even there something is out of place (for example, there is no http://dbpedia.org/resource/Category:Amount even though http://dbpedia.org/resource/Category:Animal exists).

- Second thing is: should we check for exact membership, or should we also consider the parent-categories?
I ask this question because (for example) in tweet 91921712177889280 we find the entity http://dbpedia.org/resource/God, which does not belong directly to any of the categories in the taxonomy (neither via rdf:type nor via dcterms:subject). If we check the parents, however, we eventually find http://dbpedia.org/resource/Category:Religion, which is in the taxonomy. This means that we should check all the subcategories as well, but if this is the case, what's the point of having both "Event" and all its children ("MilitaryConflict", "PoliticalEvent", ...) in the taxonomy?
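To make the second question concrete, here is a minimal sketch of the "check the parents" approach: start from an entity's direct categories and walk upwards until a taxonomy category is hit. The `broader` mapping and every category name in it except Category:Religion and Category:Animal are hand-written placeholders, not real DBpedia data, and `matches_taxonomy` is a hypothetical helper name.

```python
from collections import deque


def matches_taxonomy(start_categories, broader, taxonomy):
    """Return the first taxonomy category reachable from start_categories, or None.

    broader maps a category to its parent categories. A visited set is kept
    because category graphs are not guaranteed to be free of cycles.
    """
    queue = deque(start_categories)
    seen = set(start_categories)
    while queue:
        category = queue.popleft()
        if category in taxonomy:
            return category
        for parent in broader.get(category, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Toy hierarchy, written by hand for illustration (NOT real DBpedia data).
broader = {
    "Category:PlaceholderA": ["Category:PlaceholderB"],
    # The back-edge to PlaceholderA is deliberate, to show the cycle guard.
    "Category:PlaceholderB": ["Category:Religion", "Category:PlaceholderA"],
}
taxonomy = {"Category:Religion", "Category:Animal"}

print(matches_taxonomy(["Category:PlaceholderA"], broader, taxonomy))
print(matches_taxonomy(["Category:Unrelated"], broader, taxonomy))
```

With exact membership only the first loop iteration would matter; the parent walk is what lets an entity two levels below Category:Religion still match.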

Thanks,



2014-01-28 MSM <msm.o...@gmail.com>



--
Dott. Stefano Parmesan
Backend Web Developer and Data Lover ~ SpazioDati s.r.l.
Via del Brennero, 52 – 38122 Trento – Italy

Fréderic Godin

Jan 29, 2014, 2:43:24 AM
to micropo...@googlegroups.com
Very nice explanation Stefano!
This is the problem I've been struggling with all week.

Best,

Fréderic



2014-01-29 Stefano Parmesan <parm...@spaziodati.eu>

Fréderic Godin

Feb 3, 2014, 9:53:53 AM
to micropo...@googlegroups.com
Dear chairs,

Since it has been a week since Daniel asked the first question about the taxonomy, I was wondering whether you have been able to take a look at it.
I think many teams are still struggling. Or has someone already found an explanation that I missed?

Thanks in advance!

Best,

Fréderic

On Monday, 27 January 2014 11:05:48 UTC+1, Daniel Dahlmeier wrote:

Stefano Parmesan

Feb 4, 2014, 3:34:48 AM
to micropo...@googlegroups.com
We didn't, still waiting for an answer...

(it's like hearing the first verse of Comfortably Numb in your head over and over again)

Thanks and regards,


2014-02-03 Fréderic Godin <frederi...@ugent.be>:


#Microposts2014 Chairs

Feb 4, 2014, 4:27:47 AM
to micropo...@googlegroups.com
Dear Stefano, Frederic,

> - First question that arises is: the microposts taxonomy has been built
> on top of what?

The taxonomy is derived from the NERD ontology
http://nerd.eurecom.fr/ontology/nerd-v0.5.n3 . A set of additional
classes from YAGO has been added.

> - Second thing is: should we check for exact membership, or should we
> also consider the parent-categories?

This is up to you and how your system is built.

For the evaluation process we *do not consider* any typing information;
we will score your submission on the exact match of each
(entity, uri) pair.
Hence, you are free to use any ontology (for instance your own) that
better fits the model of the corpus.
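As a rough illustration of exact-match scoring on pairs, the sketch below computes precision, recall and F1 over sets of annotations. It is not the official scorer: keying each annotation by (tweet id, mention, URI) is an assumption, the function name `score` is hypothetical, and the mention text "God" and the wrong predicted URI are made up for the example.

```python
def score(gold, predicted):
    """Exact-match precision, recall and F1 over sets of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Assumed key: (tweet id, mention, URI). Types play no role at all.
gold = [
    ("91645803923382272", "Aquarius",
     "http://dbpedia.org/resource/Aquarius_(astrology)"),
    ("91921712177889280", "God", "http://dbpedia.org/resource/God"),
]
predicted = [
    ("91645803923382272", "Aquarius",
     "http://dbpedia.org/resource/Aquarius_(astrology)"),
    # Same mention, different URI: no credit under exact matching.
    ("91921712177889280", "God", "http://example.org/resource/WrongURI"),
]

print(score(gold, predicted))  # (0.5, 0.5, 0.5)
```

Under this reading, a prediction with the right mention but a different URI costs both precision and recall, whatever type the system assigned to it.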

Cheers,
#Microposts2014 Challenge crew

Mena Badieh Habib Morgan

Feb 7, 2014, 7:39:57 AM
to micropo...@googlegroups.com
Hello,
The provided NERD taxonomy is not consistent with the DBpedia ontology.
For example, there is no equivalent for EmailAddress in the DBpedia ontology.
How can I develop my own methods of disambiguation while you are directing me to a specific ontology of a specific tool?
The task is to link entities to DBpedia, so the DBpedia ontology is the only ontology that should be used. Right?

Thanks

Mena

#Microposts2014 Chairs

Feb 7, 2014, 8:10:56 AM
to micropo...@googlegroups.com
Dear Mena,

Thanks for sharing your concern.

As we have already stated in a previous email in reply to a similar
question, we have used the NERD ontology plus a few additions from YAGO2
to prepare both training and test sets.

> The provided NERD taxonomy is not consistent with the DBpedia ontology.
> For example, there is no equivalent for EmailAddress in the DBpedia ontology.
> How can I develop my own methods of disambiguation while you are
> directing me to a specific ontology of a specific tool?
> The task is to link entities to DBpedia, so the DBpedia ontology is the only
> ontology that should be used. Right?

We have decoupled the typing task from the disambiguation one. Hence,
DBpedia is not the only ontology used to prepare the corpora. Having
said that, you are free to use only DBpedia.

Hope it helps.

Best regards,
#Microposts2014 Challenge crew

Ugo Scaiella

Feb 13, 2014, 10:03:57 AM
to micropo...@googlegroups.com
Dear Chairs,

The main concern regarding the taxonomy is that it is not clear whether participants must filter their results using that taxonomy before submission, or whether you will handle this task yourselves before running the evaluation scripts.
In fact, (almost) all of the annotators that participants are using do not rely on that taxonomy, and most likely they will annotate tweets with DBpedia concepts that are not contained in it. This doesn't mean that the annotator is not working well, only that this challenge focuses on a reduced set of DBpedia concepts.

And this is perfectly fine, but the issue is that you will evaluate both precision and recall, so it is important to remove all DBpedia URIs that are not contained in that taxonomy. Otherwise, even if the annotator has correctly found a relevant DBpedia URI, a URI that is not part of the taxonomy will be counted as a false positive, lowering the annotator's precision and in turn the overall F1 score.
Could you please clarify which URIs should be included in the TSV files to be submitted?

If you do not apply such a filter, i.e. participants have to filter out URIs that are not part of that taxonomy, I think you should clarify how to match a DBpedia URI against that taxonomy, because it's still not clear.
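For what it's worth, the participant-side filter could look something like the sketch below, assuming some way of looking up the types of a URI is already available. The `types_of` mapping, the type labels, the example.org URIs and the `keep_untyped` switch are all hypothetical; how a DBpedia URI should really be matched against the taxonomy is exactly the open question.

```python
def filter_by_taxonomy(predictions, types_of, taxonomy, keep_untyped=False):
    """Keep only predictions whose URI has at least one type in the taxonomy.

    predictions: iterable of (tweet_id, mention, uri) tuples.
    types_of:    dict mapping a URI to a set of type labels (assumed lookup).
    keep_untyped decides what happens to URIs missing from types_of.
    """
    kept = []
    for tweet_id, mention, uri in predictions:
        types = types_of.get(uri)
        if types is None:
            if keep_untyped:
                kept.append((tweet_id, mention, uri))
            continue
        if types & taxonomy:
            kept.append((tweet_id, mention, uri))
    return kept


# Hand-written toy data; the type labels are illustrative, not looked up.
taxonomy = {"AstrologicalSign", "Religion", "City"}
types_of = {
    "http://dbpedia.org/resource/Aquarius_(astrology)": {"AstrologicalSign"},
    "http://example.org/resource/OutOfScope": {"SomethingElse"},
}
predictions = [
    ("91645803923382272", "Aquarius",
     "http://dbpedia.org/resource/Aquarius_(astrology)"),
    ("91645803923382272", "fear", "http://example.org/resource/OutOfScope"),
    ("91645803923382272", "rejection", "http://example.org/resource/Untyped"),
]

filtered = filter_by_taxonomy(predictions, types_of, taxonomy)
print(filtered)
```

The `keep_untyped` flag is there because dropping every untyped URI trades recall for precision, and which side to favour depends on the answer to the question above.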

Regards,
-- Ugo Scaiella