This reminds me to mention a major project in Germany that is using
DOIs to identify datasets in a range of scientific fields.
http://www.tib.uni-hannover.de/en/about_us/projects/codata/
It was started by Germany's national science library (TIB), but lots of
other institutions wanted in, so they've recently started a non-profit
and are expanding membership. Its scope for now is entire datasets
(not the data in them), but they have definitely tackled those nasty
legal and economic issues that Benjamin mentioned.
Below are some slightly random notes from a talk they did at CNI
recently:
“Library role in data curation: citable research data (stored
elsewhere, in a data center).
Brussels Declaration of STM Publishers: access to research data that
supports publications should be available free of charge.
Earth Science domain for initial project. An international group of
data centers already exists to evaluate and maintain the datasets
(climate measurements, etc.). Earth Systems Observation journal will
be peer-reviewed and contain only datasets.
DOI system used for persistent identifiers (easy to link to published
article). TIB is a DOI Registration Agency (not using CrossRef).
TIB registers data worldwide from STM researchers. 600,000 datasets
registered so far. Institutions & data centers store, maintain, and
evaluate the data, not TIB. TIB keeps just the metadata and identifiers;
the datacenter supplies the bibliographic metadata following the ISO
690-2 standard for citing electronic media: 24 elements, including the
usual DC-type fields like author, title, date, but *nothing specific to
research like protocols or equipment*.
TIB funding is from base funding, charges to datacenters (publishing
agents) for 500 DOIs (surcharge for more), and funding agencies via
research grants. 0.01 to 1 Euro per dataset to register the dataset.
Data center costs 50-500 Euro per dataset (i.e. 1% of data production
costs). Data production costs 5,000 to 5,000,000 Euro per dataset.
Funding for data curation is changing so that it can be included in
direct project costs. Still lacking policies for data contribution.
TIBORDER Katalog offers a basic search UI that returns item-level
results for datasets. Results resolve to the portal page at the
datacenter hosting the dataset (from which you can download, view,
etc.).
Authors can cite their data in articles if it’s already been deposited
and cataloged. They use the citation format defined by the datacenter
(e.g. PANGAEA).
Pilot with Elsevier in ScienceDirect so that articles can include a
supplementary data link to the dataset (harvesting DOIs from the
Katalog after publication).
New initiatives:
-- Thieme (chemistry) journal. Author’s workflow submits both the
article and the data to Thieme, and Thieme submits the data to FIZ
Karlsruhe to store and get a DOI from TIB.
-- European technical libraries for data registration will transition
the TIB’s registration agency to a new non-profit organization, funded
by libraries and institutions worldwide, to register scientific
research data. Will also host the shared metadata catalog. MOU doesn’t
require that everyone use DOIs.
Datasets that are removed or data centers that disappear would lead to
the DOIs resolving to tombstone pages.
Metadata included in Katalog is under a Creative Commons-type license
that allows others to download and reuse the metadata only if they
cite it.
Usage statistics of downloads (and by whom) are not allowed since they
could be misused as business intelligence.
DOIs are more trusted by publishers than Handles since they come with a
certain guarantee of persistence and vetting. MIT/DSpace and Max Planck
can get away with using non-DOI Handles since they have a trusted
brand already.”
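
To make the DOI and citation parts of those notes a bit more concrete,
here is a rough Python sketch of how a dataset record with a few of the
DC-type fields could be turned into a resolvable link and a citation
string. Everything in it is an assumption for illustration: the
DatasetRecord class, its field names, the example DOI (a made-up
prefix), and the citation layout are mine, not TIB's schema or the
format PANGAEA actually prescribes. The only real piece is the public
DOI resolver at https://doi.org/.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Public DOI resolver; appending a DOI name yields a resolvable URL.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class DatasetRecord:
    """Hypothetical minimal record; the real catalog uses 24 elements."""
    author: str
    title: str
    year: int
    publisher: str  # e.g. the data center hosting the dataset
    doi: str


def doi_url(doi: str) -> str:
    """Build a resolver URL, percent-encoding unsafe characters."""
    return DOI_RESOLVER + quote(doi, safe="/")


def cite(record: DatasetRecord) -> str:
    """Format an illustrative citation (not an official style)."""
    return (
        f"{record.author} ({record.year}): {record.title}. "
        f"{record.publisher}. {doi_url(record.doi)}"
    )


# Made-up example values, for illustration only.
example = DatasetRecord(
    author="Doe, J.",
    title="Example climate measurements",
    year=2009,
    publisher="Example Data Center",
    doi="10.1234/example.dataset.1",
)
print(cite(example))
```

The record never contains the data itself, only the identifier and the
descriptive metadata, which matches the division of labor described in
the notes: the data center hosts the dataset and the DOI resolves to
its portal page.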