Hi all,
I've been thinking about a similar problem for a while now, though
about more than just identification. A GUID can uniquely identify an
instance of a particular object, but I also want to be able to attach
more information to that GUID. The main reason is disambiguation: with
just a GUID I have no way to tell whether two GUIDs refer to the same
thing unless I have more information about each. There are other
benefits, and complications, to doing this.
This is not a fully formed idea, so it is likely to have some holes.
The method I looked at was to use OIDs, GUIDs and Gellish English.
At the following URL you can register an OID using a GUID:
http://www.oid-info.com/cgi-bin/manage
These two links have more on OIDs:
http://www.iana.org/assignments/enterprise-numbers
http://www.alvestrand.no/objectid/
Gellish English:
http://en.wikipedia.org/wiki/Gellish_English
The following is just one method of obtaining data.
The following OID is an identifier for me and all numbers beneath it
are delegated to me.
1.3.6.1.4.1.32424
Using DNS we can do something like:
dig 1.3.6.1.4.1.32424.example.com
to get a list of origin servers for the information about that OID. Or,
if you want, you could pass the information directly using CNAME, but I
think that would be pushing DNS too far. I have had this working using
PowerDNS. Personally I prefer getting the root server for that DNS
lookup and then using REST to query the server, i.e.:
http://otherdomain.com/oid/1.3.6.1.4.1.32424/
or something similar, and getting back the information we have related
to that OID. For the time being I am ignoring who decides who is the
origin for what, etc. Taking this a bit further, you can designate OIDs
to have particular types, i.e.:
let 1.3.6.1.4.1.32424 == PFIX
PFIX.3 == /type/
PFIX.3.4 == /type/people/
PFIX.3.4.6 == /type/people/person/
PFIX.3.4.6.7 == /type/people/person/birth_date
PFIX.3.4.6.7.8 == /type/people/person/gender
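As a rough sketch of the lookup names and the PFIX shorthand above, in
Python. The helper names (expand, type_of, dns_name, rest_url) are made
up for illustration, and the zone and host defaults are just the
placeholder domains used in this post. Nothing here touches the
network; it only builds the strings you would hand to dig or to an HTTP
client.

```python
PFIX = "1.3.6.1.4.1.32424"  # the enterprise OID used in this post

# Type table copied from the list above, keyed by the arcs below PFIX.
TYPES = {
    "3": "/type/",
    "3.4": "/type/people/",
    "3.4.6": "/type/people/person/",
    "3.4.6.7": "/type/people/person/birth_date",
    "3.4.6.7.8": "/type/people/person/gender",
}


def expand(alias: str) -> str:
    """Turn the shorthand 'PFIX.3.4.6' into the full dotted OID."""
    if alias == "PFIX":
        return PFIX
    if alias.startswith("PFIX."):
        return PFIX + alias[len("PFIX"):]
    return alias  # already a full OID (or something we do not recognise)


def type_of(oid: str):
    """Return the type path for a full OID, or None if it is not in the table."""
    if not oid.startswith(PFIX + "."):
        return None
    return TYPES.get(oid[len(PFIX) + 1:])


def dns_name(oid: str, zone: str = "example.com") -> str:
    """Name to pass to dig; the zone is a placeholder."""
    return f"{oid}.{zone}"


def rest_url(oid: str, host: str = "otherdomain.com") -> str:
    """URL to query on the origin server; host and path layout are placeholders."""
    return f"http://{host}/oid/{oid}/"


if __name__ == "__main__":
    person = expand("PFIX.3.4.6")
    print(person, type_of(person))
    print(dns_name(PFIX))
    print(rest_url(PFIX))
```

Keeping the table keyed by the arcs below PFIX means the same table
could be reused under a different delegated prefix.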
The web page obtained at
http://otherdomain.com/oid/1.3.6.1.4.1.32424/
would have the equivalent of
1.3.6.1.4.1.32424.3.4.6.7 == 1/1/1970
1.3.6.1.4.1.32424.3.4.6.7.8 == “male”
or in a more RDF style
<http://www.example.com/rdf/people/Harry> -> gender -> male
<http://www.example.com/rdf/people/Harry> -> birth_date -> 1/1/1970
Using these values we can compare types between entities to see if the
two entities are the same or close enough to flag for further
processing. I do not expect this to be infallible but knowledge never
has been.
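One possible way to do that comparison, sketched in Python. This is
only an illustration: the scoring rule (fraction of shared OID keys
whose values agree) and both thresholds are assumptions picked for the
example, and the OIDs for birth_date and gender follow the type list
earlier in this post.

```python
PFIX = "1.3.6.1.4.1.32424"
BIRTH_DATE = PFIX + ".3.4.6.7"    # /type/people/person/birth_date
GENDER = PFIX + ".3.4.6.7.8"      # /type/people/person/gender


def match_score(a: dict, b: dict):
    """Fraction of OID keys present in both records whose values agree.

    Returns None when the records share no keys, since then there is
    nothing to compare.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    agree = sum(1 for key in shared if a[key] == b[key])
    return agree / len(shared)


def classify(a: dict, b: dict, same: float = 1.0, review: float = 0.5) -> str:
    """Label a pair as 'same', 'review', 'different' or 'unknown'.

    The thresholds are arbitrary example values, not tuned numbers.
    """
    score = match_score(a, b)
    if score is None:
        return "unknown"
    if score >= same:
        return "same"
    if score >= review:
        return "review"  # close enough to flag for further processing
    return "different"


if __name__ == "__main__":
    harry = {BIRTH_DATE: "1/1/1970", GENDER: "male"}
    copy_of_harry = {BIRTH_DATE: "1/1/1970", GENDER: "male"}
    partial = {BIRTH_DATE: "1/1/1970", GENDER: "female"}
    print(classify(harry, copy_of_harry))
    print(classify(harry, partial))
```

Exact string equality is the weakest part of this: real data would need
value normalisation (date formats, case) before comparing.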
The reason for Gellish English is so that we have a reference point for
what the OIDs actually mean. For instance, one problem I have always
had with RDF is that the same predicate can have different
representations, i.e.:
Harry “Is married to” Jenny
Harry “Has Spouse” Jenny
and this is before we even get into different languages. Using Gellish
English as the reference point, we can then use OIDs, or the Gellish
equivalent, as predicates and write something like:
Harry “Is married to” Jenny
Harry “Has Spouse” Jenny
Harry “PFIX.3.4.7.8.7” Jenny
These would all mean the same thing, and the OID method is language
agnostic. Gellish English recognizes synonyms, so “Is married to” ==
“Has Spouse”.
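A minimal Python sketch of the synonym idea. The SYNONYMS table here is
a hand-written stand-in for whatever a Gellish dictionary would
provide, not a real Gellish data structure, and the function names are
invented for the example. It maps each phrase to the one OID from the
example above so that statements can be compared regardless of wording.

```python
# PFIX.3.4.7.8.7 from the example above, written out in full.
MARRIED_TO = "1.3.6.1.4.1.32424.3.4.7.8.7"

# Hand-written stand-in for a synonym dictionary (assumption).
SYNONYMS = {
    "is married to": MARRIED_TO,
    "has spouse": MARRIED_TO,
}


def canonical(predicate: str) -> str:
    """Map a predicate phrase to its OID.

    Anything not in the table, including a predicate that is already
    an OID, is returned unchanged.
    """
    return SYNONYMS.get(predicate.strip().lower(), predicate)


def same_statement(t1: tuple, t2: tuple) -> bool:
    """True if two (subject, predicate, object) triples say the same thing."""
    s1, p1, o1 = t1
    s2, p2, o2 = t2
    return (s1, canonical(p1), o1) == (s2, canonical(p2), o2)


if __name__ == "__main__":
    print(same_statement(("Harry", "Is married to", "Jenny"),
                         ("Harry", "Has Spouse", "Jenny")))
    print(same_statement(("Harry", "Has Spouse", "Jenny"),
                         ("Harry", MARRIED_TO, "Jenny")))
```

Phrases in other languages would simply be more keys pointing at the
same OID.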
There are other ways to make this system work or to extend it, and
quite a few combinations of the above could be used, but these are
just a few of the thoughts I have had on it.
I've got as far as downloading the Freebase database (211 million
quads):
http://download.freebase.com/datadumps/
Example quads where you can see types in use:
http://download.freebase.com/datadumps/quad-sample.txt
I've converted it to an integer representation so it's faster, and was
about to start assigning types to OIDs using Gellish English to see
what sort of problems I ran into. Adding PowerDNS to this would provide
a DNS lookup of the entire Freebase database.
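For anyone wondering what an integer representation might look like,
here is one common approach (dictionary encoding) sketched in Python.
It is not necessarily the conversion described above. It assumes each
quad is one line of four tab-separated fields, and the sample rows are
invented for the example rather than taken from the dump.

```python
class Interner:
    """Assign each distinct string a small integer id, and map ids back."""

    def __init__(self):
        self._ids = {}       # string -> id
        self._strings = []   # id -> string

    def encode(self, s: str) -> int:
        i = self._ids.get(s)
        if i is None:
            i = len(self._strings)
            self._ids[s] = i
            self._strings.append(s)
        return i

    def decode(self, i: int) -> str:
        return self._strings[i]


def encode_quad(interner: Interner, line: str) -> tuple:
    """Turn one tab-separated line into a tuple of integer ids."""
    fields = line.rstrip("\n").split("\t")
    return tuple(interner.encode(f) for f in fields)


if __name__ == "__main__":
    # Invented sample rows; empty trailing fields are interned like any other.
    rows = [
        "/m/harry\t/type/object/type\t/people/person\t",
        "/m/harry\t/people/person/gender\t/m/male\t",
    ]
    interner = Interner()
    for row in rows:
        quad = encode_quad(interner, row)
        print(quad, [interner.decode(i) for i in quad])
```

Comparing and indexing tuples of small integers is much cheaper than
doing the same on long path strings, and every repeated string is
stored only once.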
Thoughts appreciated!
Regards,
Harry