Data Loading


ja...@openguid.net

Sep 25, 2008, 4:46:29 PM
to Open GUID Discussion
The db was initially seeded with WordNet 3.0. I liked its natural-language
focus and wanted to nail basic concepts in everyday vocabulary.

OpenCyc had more named entities, but too many of its abstract concepts
are used only for inferencing and would pollute the db. I am currently
working on importing only the entities that have WordNet references
(though I must have slept through Prolog class).

The next step would be the UMBEL entities that are linked to OpenCyc.
Straightforward if I can get an easily consumable data file.

Then on to YAGO. The YAGO entities with WordNet links will be easy.
Additional named entities (DBpedia) will be a bit more tricky if we're
to avoid duplicates. I suppose dups shouldn't be a huge issue, because
they are an expected problem in the future anyway. They will be handled
by making one entry the primary and doing 301 redirects from the others,
and the equivalence will always be available in the published RDF
identity statements. Still, I would like to keep it as clean as
possible to avoid unnecessary redirects at the outset.
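
To make the duplicate handling concrete, here is a minimal sketch of the "one primary, 301 from the others" idea. Everything in it is an assumption for illustration: the `GuidRegistry` class, the in-memory dict, the example base URL, and the use of `owl:sameAs` for the identity statements are not taken from the actual Open GUID implementation.

```python
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class GuidRegistry:
    """Toy in-memory registry (hypothetical; the real store is not described here)."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        # Maps each duplicate id to the id it was merged into.
        self.primary_of: Dict[str, str] = {}

    def canonical(self, guid: str) -> str:
        """Follow merge links until reaching an id that is its own primary."""
        while guid in self.primary_of:
            guid = self.primary_of[guid]
        return guid

    def merge(self, primary: str, duplicate: str) -> None:
        """Declare `duplicate` to be the same entity as `primary`."""
        primary = self.canonical(primary)
        duplicate = self.canonical(duplicate)
        if primary != duplicate:  # already merged: nothing to do
            self.primary_of[duplicate] = primary

    def resolve(self, guid: str) -> Tuple[int, str]:
        """Return (HTTP status, URL): 200 for a primary, 301 for a duplicate."""
        target = self.canonical(guid)
        status = 200 if target == guid else 301
        return status, f"{self.base_url}/{target}"

    def same_as_triples(self) -> List[str]:
        """Emit one N-Triples identity statement per known duplicate."""
        return [
            f"<{self.base_url}/{dup}> <{OWL_SAME_AS}> "
            f"<{self.base_url}/{self.canonical(dup)}> ."
            for dup in sorted(self.primary_of)
        ]
```

Because `merge` always links one canonical id to another, chains of duplicates collapse to a single primary, so a client never follows more than one redirect.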

Ideas?

kidehen

Sep 25, 2008, 6:19:34 PM
to Open GUID Discussion

Jason,

If you load UMBEL, you will have a good head start re. DBpedia and
OpenCyc.
Then you can add to your base from the OpenCyc to DBpedia links for
those entities not exposed via the UMBEL to DBpedia linkage.

You shouldn't need to go directly to DBpedia as UMBEL, OpenCyc, and
Yago provide nice data access routes.
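
As a rough sketch of that load order, assuming both link sets have already been parsed into dicts keyed by an OpenCyc concept id (the function name, the dict shape, and the sample ids are hypothetical, not an actual UMBEL or OpenCyc file format):

```python
from typing import Dict


def merge_link_sets(
    umbel_links: Dict[str, str],
    opencyc_links: Dict[str, str],
) -> Dict[str, str]:
    """Combine two OpenCyc-id -> DBpedia-URI mappings, UMBEL first.

    OpenCyc links are added only for entities UMBEL does not already
    cover, so an existing UMBEL link is never overwritten.
    """
    merged = dict(umbel_links)  # copy so the inputs stay untouched
    for concept_id, dbpedia_uri in opencyc_links.items():
        merged.setdefault(concept_id, dbpedia_uri)
    return merged
```

Loading in this order means each entity enters the base once, which should also cut down on duplicates needing redirects later.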

Kingsley