Handle system

Jonathan Rees

Mar 9, 2009, 2:14:50 PM
to Shared names
Does anyone on this list have experience with the handle system? Alan
and I just spoke to Bob Kahn of CNRI and although we had put it aside
a while ago, we're now inspired to take a more serious look at it, as
the infrastructure seems to have good support for replication. (This
is not to say that the *syntax* needs to be adopted or that clients
would use anything other than HTTP to get information.)

Thanks
Jonathan
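
For readers who have not seen a handle reached over plain HTTP, here is a minimal sketch of how a client might turn a handle into a URL it can fetch. It assumes the public proxy at hdl.handle.net; the function name `handle_to_http_url` and the sample handle are hypothetical and used only for illustration. This is one way to do it, not a description of how any particular resolver works internally.

```python
from urllib.parse import quote

# Assumption: the public Handle proxy; swap in any other resolver base URL.
DEFAULT_PROXY = "https://hdl.handle.net/"


def handle_to_http_url(handle: str, proxy: str = DEFAULT_PROXY) -> str:
    """Build the HTTP URL a client would fetch to resolve `handle`.

    A handle has the form "prefix/suffix"; only the first "/" separates
    the two, and the suffix may itself contain further "/" characters.
    """
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {handle!r}")
    # Percent-encode characters such as spaces or "#" that would otherwise
    # change the meaning of the URL; keep "/" in the suffix readable.
    encoded = quote(prefix, safe="") + "/" + quote(suffix, safe="/")
    return proxy.rstrip("/") + "/" + encoded


if __name__ == "__main__":
    # Hypothetical handle, used only to show the shape of the URL.
    print(handle_to_http_url("20.500.12345/sample item#1"))
```

The client only ever sees an ordinary HTTP URL, so the identifier syntax and the replication infrastructure behind the proxy stay separate concerns.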

Benjamin Dai

Apr 21, 2009, 2:35:42 PM
to Shared names
Greetings from Benjamin Dai! I was recently invited to
participate in the upcoming URI workshop.

To help facilitate discussions, I've appended below some discussion
items (I corresponded with Alan and Jonathan a few weeks ago)
regarding what I've discovered so far about the Handle system and
DOIs. I apologize in advance if the items I've described are too
obvious. Please let me know if you have any pointers that will help
our considerations of the Handle system.

Based on my initial research, DOIs and the associated Handle system are
a compelling solution that has been invaluable in the publications
space. One challenge is that the DOI model is coupled to the Handle
system: one cannot get all the features of DOIs without using the
Handle system. In my opinion, this coupling is not a good thing.
Furthermore, I've heard that the DOI model is already struggling with
the move from identifying only publication objects to the much broader
scope of any data object. Identifying objects outside the publication
domain stretches the original intent of DOIs, and it's causing
problems...

Another thing I discovered is that social and economic issues are among
the major challenges in various applications. For example, the
federal government's CENDI project asserts that the biggest challenge in
rolling out their persistent identifier initiative was not technical.
It was the ongoing social work of allocating money and resources to
ensure that the persistent identifiers stay fresh (i.e., up to date
when physical locations change) and that the systems (e.g., the Federal
Persistent Identification Resolver) are maintained with external
endorsement (i.e., people only trust a persistent store that has
institutional endorsement and support). In their words: "It is not
sufficient to create identifiers and leave them without maintenance;
active management is needed in order to gain the benefits of such a
system." I argue this is probably one of the most critical requirements.

Regarding the social and economic aspects, I think our
considerations should go beyond simply building a simpler technology
solution. Our first simple release should have some public
endorsement from an authoritative institution. I know when I gave a
talk on our NCBO PURL Server in last year's NCBC All-Hands meeting,
one of the first major concerns from a key stakeholder was regarding
committed economic and social support for such a system. Why should
anyone commit to any system if it will disappear overnight when
funding runs out or a grant is not renewed? It's a very valid
criticism. What respected institution will be endorsing and using our
services out of the gate? Who is committed to providing resources to
advance and develop the service? Fortunately, the NCBO should
have some good news to announce on this front at our Workshop.
Furthermore, the Shared Names steering committee members have all
agreed to use shared names in their projects. This will be an
interesting discussion item in our URI Workshop next week.

MacKenzie

Apr 21, 2009, 3:12:21 PM
to Shared names
This reminds me to mention a major project in Germany that is using
DOIs to identify datasets in a range of scientific fields.
http://www.tib.uni-hannover.de/en/about_us/projects/codata/
It was started by Germany's national science library (TIB) but lots of
other institutions wanted in so they've recently started a non-profit
and are expanding membership. Its scope for now is entire datasets
(not the data in them) but they have definitely tackled those nasty
legal and economic issues that Benjamin mentioned.

Below are some slightly random notes from a talk they did at CNI
recently:

“Library role in data curation: citable research data (stored
elsewhere, in a data center).

Brussels Declaration of STM Publishers: access to research data that
supports publications should be available free of charge.

Earth Science domain for initial project. An international group of
data centers already exists to evaluate and maintain the datasets
(climate measurements, etc.). Earth Systems Observation journal will
be peer-reviewed and contain only datasets.

DOI system used for persistent identifiers (easy to link to published
article). TIB is a DOI Registration Agency (not using CrossRef).

TIB registers data worldwide from STM researchers. 600,000 datasets
registered so far. Institutions & data centers, not TIB, store,
maintain, and evaluate the data. TIB keeps just the metadata and
identifiers, and the datacenter supplies the bibliographic metadata in
the ISO 690-2 standard for citing electronic media: 24 elements,
including the usual DC-type fields like author, title, date (*nothing
specific to research like protocols or equipment*).

TIB funding is from base funding, charges to datacenters (publishing
agents) for 500 DOIs (surcharge for more), and funding agencies via
research grants. 0.01 to 1 Euro per dataset to register the dataset.
Data center costs 50-500 Euro per dataset (i.e. 1% of data production
costs). Data production costs 5000 – 5000000 Euro per dataset.
Funding for data curation is changing so that it can be included in
direct project costs. Still lacking policies for data contribution.

TIBORDER Katalog provides a basic search UI that provides item-level
results of datasets. Results resolve to the portal page at the
datacenter hosting the dataset (from which you can download, view,
etc.).

Authors can cite their data in articles if it’s already been deposited
and cataloged. They use the citation format defined by the datacenter
(e.g. PANGAEA)

Pilot with Elsevier in ScienceDirect so that articles can include a
supplementary data link to the dataset (harvesting DOIs from the
Katalog after publication).

New initiatives:
-- Thieme (chemistry) journal. Author’s workflow submits both the
article and the data to Thieme, and Thieme submits the data to FIZ
Karlsruhe to store and get a DOI from TIB.
-- European technical libraries for data registration will transition
the TIB’s registration agency to a new non-profit organization, funded
by libraries and institutions worldwide, to register scientific
research data. Will also host the shared metadata catalog. MOU doesn’t
require that everyone use DOIs.

Datasets that are removed or data centers that disappear would lead to
the DOIs resolving to tombstone pages.

Metadata included in Katalog is under a Creative Commons-type license
that allows others to download and reuse the metadata only if they
cite it.

Usage statistics on downloads (and by whom) are not allowed since they
could be misused as business intelligence.

DOIs are more trusted by publishers than Handles since they come with a
certain guarantee of persistence and vetting. MIT/DSpace and Max Planck
can get away with using non-DOI Handles since they have a trusted
brand already.”

Larry Lannom

Apr 21, 2009, 9:17:54 PM
to Shared names
From the TIB press release of about one month ago:

"The goal of this cooperation is to establish a not-for-profit agency
that enables organisations to register research datasets and assign
persistent identifiers to them, so that research datasets can be
handled as independent, citable, unique scientific objects."

The founding members include TIB, the British Library, the Library of
the ETH Zurich, the French Institute for Scientific and Technical
Information (INIST), the Technical Information Center of Denmark and
the Dutch TU Delft Library.

CNRI is currently working with Max-Planck-Gesellschaft on persistent
ids pointing into data sets. If there is interest I can go into that
at the meeting next week.

The final report from the PILIN project in Australia may be of interest to
this group. They looked at shared identifier management infrastructure
at the national level. In the end they recommended the handle system,
but the study takes a fairly broad view. The closure report is
referenced on the front page at https://www.pilin.net.au/

Another recent study (UNEP/CBD/WG-ABS/7/INF/2) that looks at
identifiers and scientific data has as one of its goals:

"Identification of different possible ways of tracking and monitoring
genetic resources through the use of persistent global unique
identifiers (GUIDs), including the practicality, feasibility, costs
and benefits of the different options."

It favors DOIs. Available as

http://www.cbd.int/doc/meetings/abs/abswg-07/information/abswg-07-inf-02-en.pdf

Larry

On Apr 21, 3:12 pm, MacKenzie <ken...@MIT.EDU> wrote:
> This reminds me to mention a major project in Germany that is using
> DOIs to identify datasets in a range of scientific fields.
> http://www.tib.uni-hannover.de/en/about_us/projects/codata/