identifiers for DataONE API

Ryan Scherle

Feb 18, 2013, 4:56:46 PM
to dryad-dev@googlegroups.com Developers
DataONE has long had a requirement that all objects be immutable. This includes both data and metadata. This works well with Dryad's model for versioning bitstreams, but it conflicts with Dryad's model for curating metadata. Metadata in Dryad is frequently edited after the associated objects are published. The current solution to this discrepancy is that Dryad will add timestamps to the identifiers for metadata. This allows the metadata objects to change, while retaining the same base DOI.

Current identifiers in Dryad:
Data Package (metadata) = http://dx.doi.org/10.5061/dryad.12
ORE Resource Map = http://dx.doi.org/10.5061/dryad.12/d1rem
Data File (metadata) = http://dx.doi.org/10.5061/dryad.12/1
Data File (bitstream) = http://dx.doi.org/10.5061/dryad.12/1/bitstream

Proposed identifiers with timestamps:
Data Package (metadata) = http://dx.doi.org/10.5061/dryad.12/mdVer2007-12-06T20:55:28Z
ORE Resource Map = http://dx.doi.org/10.5061/dryad.12/d1rem/mdVer2007-12-06T20:55:28Z
Data File (metadata) = http://dx.doi.org/10.5061/dryad.12/1/mdVer2007-12-06T20:55:28Z
Data File (bitstream) = http://dx.doi.org/10.5061/dryad.12/1/bitstream

A few notes:
1) The suffix for a resource map is "/d1rem". This indicates that the resource map is specific to DataONE. The current DataONE resource maps include URLs that resolve through the DataONE coordinating nodes.
2) We aren't (currently) registering DOIs for the resource map and bitstream. They are simply represented by identifiers that look like DOIs. In the future, we could choose to register these as actual DOIs, but that's a separate discussion.
3) Each timestamp comes from that item's last modification date. Timestamps can differ between data packages and data files, because each can be edited without the other.
4) Bitstreams never have timestamps. If a bitstream changes, Dryad will publish a completely new version of the item, with a new base DOI.
5) When DataONE implements support for "mutable" objects, we will be able to add additional references, which will allow the base DOIs to be used again. The base DOIs will reference the most recent versions of objects, consistent with current behavior in Dryad.
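
To make the scheme concrete, here is a rough Python sketch of how the timestamped identifiers could be built. It is only an illustration, not Dryad's actual code: the function names are hypothetical, and I am assuming the timestamp is the item's last modification date rendered in UTC with second precision, as in the examples above.

```python
from datetime import datetime, timezone

# Suffix marker copied from the proposed identifiers above.
MD_VERSION_PREFIX = "mdVer"


def timestamped_id(base_id: str, last_modified: datetime) -> str:
    """Append the metadata-version suffix to a metadata identifier.

    Hypothetical helper: last_modified is assumed to be the item's own
    last modification date (note 3), so a package and its files can
    carry different timestamps.
    """
    if last_modified.tzinfo is None:
        raise ValueError("last_modified must be timezone-aware")
    stamp = last_modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base_id}/{MD_VERSION_PREFIX}{stamp}"


def bitstream_id(file_id: str) -> str:
    """Bitstreams never get a timestamp (note 4)."""
    return f"{file_id}/bitstream"


package = "http://dx.doi.org/10.5061/dryad.12"
modified = datetime(2007, 12, 6, 20, 55, 28, tzinfo=timezone.utc)

print(timestamped_id(package, modified))             # data package metadata
print(timestamped_id(package + "/d1rem", modified))  # ORE resource map
print(timestamped_id(package + "/1", modified))      # data file metadata
print(bitstream_id(package + "/1"))                  # data file bitstream
```

The four printed values should match the four proposed identifiers listed above.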

A problem with the above scheme is that non-DataONE users of Dryad will not expect identifiers to have timestamps. In particular, it will be nearly impossible to construct an access URL without knowing the associated timestamp. Therefore, I am proposing some additional functionality for the Dryad implementation of the DataONE API:
1) If someone requests a metadata object without a timestamp, they get the object with no timestamps. That is, the item's identifier field will not contain a timestamp, and any hasPart/partOf relationships will be expressed without timestamps.
2) The DataONE coordinating nodes will not notice the timestamp-free objects, because they will not appear in the normal object listing (/mn/object).
3) We will provide a special URL to obtain a listing of the timestamp-free objects, for use by people who want to use the DataONE API without being constrained by the immutability requirement. This URL will be /mn/object/simple.
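
As a rough sketch of point 1, the request handling could look something like the Python below. This is an assumption-laden illustration rather than the real implementation: `parse_id` and `render_relations` are hypothetical names, and the regular expression simply mirrors the "mdVer" suffix format shown in the proposed identifiers.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Matches an identifier that ends in the proposed metadata-version suffix.
_MD_VER = re.compile(
    r"^(?P<base>.+)/mdVer(?P<stamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)$"
)


class ParsedId(NamedTuple):
    base: str
    timestamp: Optional[str]  # None when the request carried no timestamp


def parse_id(identifier: str) -> ParsedId:
    """Split a requested identifier into its base and optional timestamp."""
    match = _MD_VER.match(identifier)
    if match is None:
        return ParsedId(identifier, None)
    return ParsedId(match.group("base"), match.group("stamp"))


def render_relations(requested_id: str, parts: List[Tuple[str, str]]) -> List[str]:
    """Express hasPart/partOf references in the same style as the request.

    parts holds (base_id, timestamp) pairs for the related metadata objects.
    A timestamp-free request gets timestamp-free references; a timestamped
    request gets timestamped ones.
    """
    if parse_id(requested_id).timestamp is None:
        return [base for base, _ in parts]
    return [f"{base}/mdVer{stamp}" for base, stamp in parts]


parts = [("http://dx.doi.org/10.5061/dryad.12/1", "2007-12-06T20:55:28Z")]
print(render_relations("http://dx.doi.org/10.5061/dryad.12", parts))
print(render_relations(
    "http://dx.doi.org/10.5061/dryad.12/mdVer2007-12-06T20:55:28Z", parts))
```

The point of keying everything off the requested identifier is that a client never sees a mix of the two styles in one response.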

Thoughts?

-- Ryan

Ryan Scherle

Feb 18, 2013, 11:37:39 PM
to ben leinfelder, cor...@dataone.org, dryad-dev@googlegroups.com Developers
Hi Ben,

I'm copying this back to the dryad-dev group with my responses...

On Feb 18, 2013, at 8:10 PM, ben leinfelder <leinf...@nceas.ucsb.edu> wrote:

> Hi Ryan,
> This is a good summary of what you talked about at the recent CCIT meeting. I have a few comments...
>
>> 1) The suffix for a resource map is "/d1rem". This indicates that the resource map is specific to DataONE. The current DataONE resource maps include URLs that resolve through the DataONE coordinating nodes.
> Will the resource map reference data/metadata identifiers using the versions with or without timestamps? For them to be useful for DataONE, I think they will need to be with timestamps so they match the object identifiers harvested by the CN.

If a resource map were requested with a timestamped ID, it would be returned with timestamped references to the component parts. This would be the normal behavior for DataONE purposes.

>> 2) We aren't (currently) registering DOIs for the resource map and bitstream. They are simply represented by identifiers that look like DOIs. In the future, we could choose to register these as actual DOIs, but that's a separate discussion.
> I'm sure the DataCite folks would like you to register these DOI-looking identifiers. In fact I think every DOI-ish looking identifier should be registered if we really want to play by the rules. Otherwise we have a bunch of DOI-looking identifiers in DataONE, namely, everything with a DOI+timestamp and none of them actually resolve using the DOI service, right?

Hilmar and I discussed this, and we agreed that anything which looks like a DOI should be registered as a DOI. Since I don't want to make claims about the persistence of the timestamped objects, they won't be registered as DOIs. I think the lesser evil is to add the timestamp as a parameter.

>> 3) Each timestamp comes from that item's last modification date. Timestamps can differ between data packages and data files, because each can be edited without the other.
> It seems like if the contents (data files) of a data package change, then the identifier of the package should also change. This should also result in an updated ORE map that does that packaging. Otherwise the newer data object is orphaned in the eyes of DataONE.

This proposal only affects the metadata objects. If the file (bitstream) contents change, a completely new object will be created. The new object would have new DOIs and a new resource map.

--- Ryan