Fwd: URI format wiki page


M. Scott Marshall

Apr 26, 2010, 4:24:57 PM
to shared-names, Jonathan Rees
[Anybody know how to resend in gmail? Please see forwarded msg below.  -Scott]

On Mon, Apr 26, 2010 at 6:34 AM, Jonathan Rees <j...@creativecommons.org> wrote:
> Re http://sharedname.org/page/URI_Format_Considerations
> Good start. I've made some changes and additions, including links out
> to some of the Issue/ pages, and changing "I" to "Scott".
> There is some redundancy that probably ought to be cleaned up. I think
> what's new is (1) separation of form-of-URI issues from other issues
> and (2) advocacy for a particular proposal.

Thanks for the many improvements - I hadn't sent anything to the list yet because I was going to add a few things, such as merging in your own issues page and material from your latest post to SN. (I thought the "I"s were out - I guess I turned a blind "I".)


> The trademark infringement risk issue has me now rather inclined to
> the numeric idspace approach. We'd all have to learn a new set of
> phone numbers, but clearly we're all able - who in this business
> doesn't know what 9606 means?

I've been thinking along similar lines, although I'm still hanging on to mnemonic names because I imagine that familiar names will be perceived as important "packaging" for human consumption. Actually, I think that handling both (at least for a few record types) would help us promote the use of SN URIs, because initial use will be by people who would appreciate working with familiar tokens (not having to constantly do table lookups).

Solution 1: Support both DOI-style and familiar record names (would this 'strain' the infrastructure?)
Solution 2: Tools for automatic translation: i.e. show labels, use id's
A possible tool could be based on something similar to the approach used in AIDA for query building (attaching screenshot), where RDF labels are seen by the 'end user' in the repository browser, but looking under the hood at an example SPARQL query shows that an id has been inserted in place of the label (ahem, id should be a URI here not a "ynode", issue reported, was supposedly fixed, but you get the idea). Okkam also has tools for number-based id's.

BTW, I spoke at length with Paolo Bouquet and Stefano Bocconi (Okkam) at the Linked Data and AI Workshop at AAAI. Okkam is happy to help and to register SNs at Okkam once we've decided on the URIs. If the numberspace approach is interesting, we might be able to get some code or feature support for SN developers.

-Scott

--
M. Scott Marshall
Leiden University Medical Center / University of Amsterdam
http://staff.science.uva.nl/~marshall


Attachment: GABA receptor activity template query.PNG

Jonathan Rees

Apr 26, 2010, 5:05:40 PM
to M. Scott Marshall, shared-names
comments inline below
The main reason we got started with SN was because we had too many
URIs for the same thing ("creeps" as MW says). Anything (other than SN
itself, of course! ha.) that creates aliases is to be avoided. What
you want is a discovery service. This could be RESTful, but then the
URIs look quite different:

http://sharedname.org/find_idspace?name=NCBI+Gene
http://sharedname.org/find_accession?idspace=NCBI+Gene&number=7157

The latter might lead to the same place as the 'official' URI, or it
could lead to a page telling you what the 'official' URI is (from
which you could click through to more interesting places).

By the way that would be not http://sharedname.org/123456789 but
something more DOI-like:

http://sharedname.org/12/7157

assuming 12 = NCBI Gene. We need to keep the idspace/record
distinction even if both are numbers.
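
A minimal sketch of how a client might build the two discovery URLs and the two-component 'official' URI described above. The endpoint names and the 12 = NCBI Gene assignment are taken from this message; the Python function names and the lookup table are hypothetical, and none of this reflects a deployed SN service.

```python
from urllib.parse import urlencode, urlparse

BASE = "http://sharedname.org"

# Hypothetical registry: 12 = NCBI Gene is only the assumption made in this thread.
IDSPACE_NUMBERS = {"NCBI Gene": 12}


def find_idspace_url(name):
    """Discovery URL that looks up an idspace by its human-readable name."""
    return f"{BASE}/find_idspace?{urlencode({'name': name})}"


def find_accession_url(idspace, number):
    """Discovery URL that looks up one record inside a named idspace."""
    query = urlencode({"idspace": idspace, "number": number})
    return f"{BASE}/find_accession?{query}"


def official_uri(idspace, accession):
    """Two-component URI: numeric idspace, then the accession."""
    return f"{BASE}/{IDSPACE_NUMBERS[idspace]}/{accession}"


def split_official_uri(uri):
    """Recover (idspace, accession) as strings from a two-component URI."""
    idspace, accession = urlparse(uri).path.strip("/").split("/", 1)
    return idspace, accession


print(find_idspace_url("NCBI Gene"))
print(find_accession_url("NCBI Gene", 7157))
print(official_uri("NCBI Gene", 7157))
```

Because the idspace and the accession stay syntactically separate, the server can delegate on the first path segment without consulting a table of every record.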

> Solution 2: Tools for automatic translation: i.e. show labels, use id's
> A possible tool could be based on something similar to the approach used in
> AIDA for query building (attaching screenshot), where RDF labels are seen by
> the 'end user' in the repository browser, but looking under the hood at an
> example SPARQL query shows that an id has been inserted in place of the
> label (ahem, id should be a URI here not a "ynode", issue reported, was
> supposedly fixed, but you get the idea).

Yes. The RDF served by a SN server should provide rdfs:label
properties that describe the accession (record) in human-readable
form. The labels can be revised from time to time as needed as mergers
and acquisitions and rebrandings happen, be given in multiple
languages, etc. There can also be other properties (idSpaceName ?)
that provide information of a more predictable nature more suitable
for use in queries of the sort you give - in case the rdfs:label
contains extra noise that you wouldn't want to have to deal with in a
query e.g. rdfs:label "NCBI Gene idspace" as opposed to sn:idSpaceName
"NCBI Gene".

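To illustrate the difference between the two properties, here is a small sketch that uses an in-memory list of triples instead of a real RDF store. The subject URI http://sharedname.org/12 and the helper name are assumptions made for illustration; only the property names and literal values come from the example above.

```python
# Hypothetical triples that an SN server might serve for one idspace.
TRIPLES = [
    ("http://sharedname.org/12", "rdfs:label", "NCBI Gene idspace"),
    ("http://sharedname.org/12", "sn:idSpaceName", "NCBI Gene"),
]


def subjects_with(predicate, value):
    """Return every subject whose predicate has exactly this literal value."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == value]


# An exact match on the predictable property finds the idspace.
print(subjects_with("sn:idSpaceName", "NCBI Gene"))
# The same literal does not match the noisier, revisable label.
print(subjects_with("rdfs:label", "NCBI Gene"))
```
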
Jonathan

Peter Ansell

Apr 26, 2010, 8:14:21 PM
to shared...@googlegroups.com, M. Scott Marshall
On 27 April 2010 07:05, Jonathan Rees <j...@creativecommons.org> wrote:
> comments inline below
>
> On Mon, Apr 26, 2010 at 4:24 PM, M. Scott Marshall
> <mars...@science.uva.nl> wrote:

>> Solution 2: Tools for automatic translation: i.e. show labels, use id's
>> A possible tool could be based on something similar to the approach used in
>> AIDA for query building (attaching screenshot), where RDF labels are seen by
>> the 'end user' in the repository browser, but looking under the hood at an
>> example SPARQL query shows that an id has been inserted in place of the
>> label (ahem, id should be a URI here not a "ynode", issue reported, was
>> supposedly fixed, but you get the idea).
>
> Yes. The RDF served by a SN server should provide rdfs:label
> properties that describe the accession (record) in human-readable
> form. The labels can be revised from time to time as needed as mergers
> and acquisitions and rebrandings happen, be given in multiple
> languages, etc. There can also be other properties (idSpaceName ?)
> that provide information of a more predictable nature more suitable
> for use in queries of the sort you give - in case the rdfs:label
> contains extra noise that you wouldn't want to have to deal with in a
> query e.g. rdfs:label "NCBI Gene idspace" as opposed to sn:idSpaceName
> "NCBI Gene".

The shared names project might also want to standardise a sub-property
of dc:identifier that people can also use without regard to the URI in
order to get direct access to an item using sn:idSpaceName "NCBI_Gene"
and sn:idSpaceToken "7157" or even just sn:idSpaceAndIdentifier
"NCBI_Gene:7157" or sn:idSpaceAndIdentifier "12/7157" .

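As a rough sketch, a combined sn:idSpaceAndIdentifier value of either form could be split back into its two parts as follows. The function name is hypothetical, and splitting at the first ":" or "/" is an assumption; the thread does not fix a syntax.

```python
import re


def split_idspace_and_identifier(value):
    """Split 'NCBI_Gene:7157' or '12/7157' into (idspace, identifier).

    Assumption: the idspace part contains neither ':' nor '/', so the
    first such character is the separator and the identifier keeps
    any later ones.
    """
    match = re.fullmatch(r"([^:/]+)[:/](.+)", value)
    if match is None:
        raise ValueError(f"no idspace separator in {value!r}")
    return match.group(1), match.group(2)


print(split_idspace_and_identifier("NCBI_Gene:7157"))
print(split_idspace_and_identifier("12/7157"))
```
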
The project so far has been going with the assumption that every
scientific organisation currently wants a central authority like the
DOI foundation to hold all of their metadata and be an intermediary in
every single Linked Data HTTP URI resolution. However, they may
actually just want the set of idSpaceName's standardised, and retain
the ability to let the Linked Data resolution happen in whatever
context an experiment requires (ie, URI freedom). Even though DOI's
have metadata stored by the DOI foundation after being paid for by the
author of the information, their most popular use is for simple
resolution, and simple resolution is cheap compared to the Shared
Names and DOI goals of registering and keeping information about each
entity in addition to providing an HTTP redirection service.

If the project is motivated mostly by scientists who want consistent
access to information then it will have issues with funding, as
scientists constantly move from grant to grant with no ability to
continually financially maintain a long term project. It may just be
that scientists want to be able to identify things in the future using
properties such as the examples above rather than single URI's which
come with immutable metadata and a single resolution point, however
useful a standard URI may be to federated query engines.

If the project is motivated by computer scientists who find it
difficult to find information they may be equipped to use both URIs
and properties. They may also see the computational difficulties with
a single global context for every record, which doesn't enable them to
contextualise the representation of a record to suit their own
purposes without having to negotiate with the community over a long
period of time just to make their short term project work.

If Shared Names were only a redirection service, à la PURL, it would
only have to standardise the idSpaceName's, and distribute a small
redirection file between a large number of low-performance mirrors to
succeed. The insistence on storing metadata, and standardising a
single URI form for each record are the hard bits from both a funding
and organisational point of view. If funding were adequate or assured
by the scientific database providers, it would be simple for Shared
Names to register itself with DOI as an authority or registration
agency and start registering and naming things using
doi:10.9999/idSpace:identifier or a wider range of DOI authority names
if it were a registration agency.

Has anyone looked into the possibility of a future Shared Names
foundation becoming a DOI registration agency [2] and setting up a
completely separate infrastructure that may have a lower cost base
than the CrossRef fees [1] for example?

Cheers,

Peter

[1] http://www.crossref.org/02publishers/20pub_fees.html
[2] http://www.doi.org/registration_agencies.html

Stefano Bocconi

Apr 27, 2010, 5:56:50 AM
to shared...@googlegroups.com
Hi Jonathan,

Just a little comment about the DOI format you suggest, for example

http://sharedname.org/12/7157 assuming 12 = NCBI Gene

I remember that Geoffrey Bilder from CrossRef once told me that if he
could go back he would remove that distinction, since in take-overs new
organizations try to change the prefix to their own numerical one since
they know the prefix has semantics. Geoffrey said that he would use a
completely opaque numeric id. This might not apply to the domain of
Biology though as much as to the publishing one.

Regards,

Stefano

Jonathan Rees

Apr 27, 2010, 7:45:31 AM
to shared...@googlegroups.com
On Tue, Apr 27, 2010 at 5:56 AM, Stefano Bocconi
<stefano...@gmail.com> wrote:
> Hi Jonathan,
>
> Just a little comment about the DOI format you suggest in for example
>
> http://sharedname.org/12/7157 assuming 12 = NCBI Gene
>
> I remember that Geoffrey Bilder from CrossRef once told me that if he could
> go back he would remove that distinction, since in take-overs new
> organizations try to change the prefix to their own numerical one since they
> know the prefix has semantics. Geoffrey said that he would use a completely
> opaque numeric id. This might not apply to the domain of Biology though as
> much as to the publishing one.
>
> Regards,
>
>   Stefano

Interesting - I didn't know this. I guess we should think this
through. How does the server pick the single number apart so that it
can do delegation? Is the single number some invertible polynomial of
the (idspace, accession) pair, or some other kind of bijection such as
bit interleaving? Or is there a big table somewhere?
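
For concreteness, here is a minimal sketch of the bit-interleaving option: the (idspace, accession) pair is packed into one number, and the server can invert it without a lookup table. The function names are hypothetical and both inputs are assumed to be non-negative integers; this is one possible bijection, not a proposal from CrossRef or SN.

```python
from typing import Tuple


def interleave(idspace: int, accession: int) -> int:
    """Pack a pair into one integer: accession bits go to even
    positions, idspace bits go to odd positions."""
    if idspace < 0 or accession < 0:
        raise ValueError("both components must be non-negative")
    result = 0
    position = 0
    while idspace or accession:
        result |= (accession & 1) << (2 * position)
        result |= (idspace & 1) << (2 * position + 1)
        idspace >>= 1
        accession >>= 1
        position += 1
    return result


def deinterleave(number: int) -> Tuple[int, int]:
    """Invert interleave(): recover (idspace, accession)."""
    if number < 0:
        raise ValueError("number must be non-negative")
    idspace = accession = 0
    position = 0
    while number:
        accession |= (number & 1) << position
        idspace |= ((number >> 1) & 1) << position
        number >>= 2
        position += 1
    return idspace, accession


opaque = interleave(12, 7157)
print(opaque, deinterleave(opaque))
```

The resulting number looks opaque to a casual reader, but anyone who knows the scheme can still recover the idspace, so it hides the prefix semantics only superficially.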

I think syntactic separation is much more human- and compute-friendly
than opaque identifiers, and if expectations are *very* clear that the
URI is "owned" by the naming system and its users, and not by the
publishers, then my guess is the two-component form can be made to
work. SN sets that expectation. Academic citations always include the
name the publisher had at the time of publication, not its current
name, and SN is attempting to achieve that level of stability.

Geoff's problem is that his salary comes from publishers, not users,
so maintaining a coherent naming system in the face of corrupting
pressures will always be an uphill struggle. E.g. what happens when a
publisher goes belly-up, or decides to pull out of the Crossref
system?

Best
Jonathan