Should ResourceType be required? What do you think?


Joan Starr

Nov 17, 2014, 4:53:41 PM
to datacite...@googlegroups.com

We are considering changing Resource Type (metadata property ResourceTypeGeneral) from optional to mandatory. Would this provide any advantages for you? Would this create any disadvantages for you? Please share any thoughts you have on this.

timothy...@ec.europa.eu

Nov 18, 2014, 3:20:00 AM
to datacite...@googlegroups.com
I would be in favour of the resource type being mandatory.  With properties such as Description and Title allowing discovery based on a free text search, the DataCite framework already provides a very effective means to aggregate objects from different data sources.  The potential for aggregating data objects would be considerably improved with the resource type being mandatory.

John Howard

Nov 18, 2014, 11:12:22 AM
to datacite...@googlegroups.com
Locally we already treat resource type as a required field and I'd feel it appropriate to treat it as mandatory within DataCite as well; it provides a useful mechanism for filtering resources in discovery systems. The related question to be considered is what vocabulary might be deployed for this purpose to gain optimal value from this as a mandatory field. - John Howard

Lizz Jennings

Nov 18, 2014, 12:26:02 PM
to datacite...@googlegroups.com
I don't understand the value in using this field - in our repository, it's set the same for everything: Dataset/Dataset.

Why would this be mandatory?

Lizz Jennings
Technical Data Officer
University of Bath


Shu Liu

Nov 18, 2014, 1:27:57 PM
to datacite...@googlegroups.com
I would think this ResourceType is specific to data. Is there a controlled vocabulary for this element? If so, I think making it mandatory will offer advantages in terms of more refined search and discovery.

sanders...@gmail.com

Nov 18, 2014, 1:58:11 PM
to datacite...@googlegroups.com
I agree, resource type helps define the resource and as such is a good tool in refining a search.

Ho Jung Yoo

Nov 18, 2014, 2:34:05 PM
to datacite...@googlegroups.com
Having a mandatory Resource Type is a good idea in principle. But I'm wondering how it would work for complex digital objects or collections that contain a mixture of resource types (e.g., 2 images + 3 text + 1 spreadsheet + 1 software program). If a user then wants to compile a list of objects that contain images, would they have to filter for "image", "collection", "mixed formats", etc. to make sure nothing was missed?

Joan Starr

Nov 18, 2014, 2:38:00 PM
to datacite...@googlegroups.com
Because a number of comments have pertained to the question of a controlled vocabulary, I thought I would provide the list we currently have. Here is the controlled vocabulary that applies to this property. For full documentation regarding the list and examples, please visit http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf

Audiovisual
Collection
Dataset
Event
Image
InteractiveResource
Model
PhysicalObject
Service
Software
Sound
Text
Workflow
Other
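
As a minimal sketch, a client could check a value against the list above before submitting metadata. The function name is_valid_resource_type_general is hypothetical and not part of any DataCite tooling; the terms are copied from the list in this message, and the comparison is assumed to be case-sensitive, as XML Schema enumerations are.

```python
# Controlled vocabulary for resourceTypeGeneral, copied from the list above
# (DataCite Metadata Kernel v3.1).
RESOURCE_TYPE_GENERAL = frozenset({
    "Audiovisual", "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "Model", "PhysicalObject", "Service",
    "Software", "Sound", "Text", "Workflow", "Other",
})


def is_valid_resource_type_general(value):
    """Return True if value is one of the controlled terms (case-sensitive).

    Hypothetical helper for illustration only.
    """
    return value in RESOURCE_TYPE_GENERAL


print(is_valid_resource_type_general("Dataset"))  # True
print(is_valid_resource_type_general("dataset"))  # False
```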

Shu Liu

Nov 18, 2014, 3:02:01 PM
to datacite...@googlegroups.com
Seeing the list and definitions is helpful :) I think this answers your question, Ho Jung.

Deann Miller

Nov 18, 2014, 4:57:40 PM
to datacite...@googlegroups.com
I think this is a great idea for search purposes, and I don't see how this could hinder anyone by making it mandatory.

Deann Miller
Metadata Coordinator
NSIDC

Chris Hunter

Nov 18, 2014, 10:03:43 PM
to datacite...@googlegroups.com
I think in principle having resource type as mandatory is a good idea, but as others have said this will be of limited use to those creating DOIs that cover multiple data types. Currently all our DOIs are set to "Dataset" as the majority contain multiple types of data. Would the field be able to handle multiple types?
I fear that, even if it did hold multiple types, the easy option of just using "Dataset" or another generic type would be taken in most cases.
If it came to a vote, I would say yes, make it mandatory: until it is mandatory we will not know its true usefulness, and it's easier to collect metadata and discard it later than to try to collect it at a later date!

Aaike De Wever

Nov 19, 2014, 4:40:56 AM
to datacite...@googlegroups.com
I second Chris Hunter's view on this. For us as well "Dataset" would be the main/only type we have for now, but on the other hand I don't see any problem with making it mandatory.

Christian Pietsch

Nov 19, 2014, 5:39:01 AM
to datacite...@googlegroups.com
Dear Joan,

thank you for consulting the community on this question. Since it has
become acceptable to assign DataCite DOIs to traditional publications
and other resources, my colleagues and I are strongly in favour of
making ResourceType mandatory.

Moreover, we would ask DataCite to consider harmonising the controlled
vocabulary for this property with other pertinent initiatives, e.g.,
- CASRAI: http://dictionary.casrai.org/research-personnel-profile/contributions/outputs
- COAR: https://www.coar-repositories.org/activities/repository-interoperability/ig-controlled-vocabularies-for-repository-assets/

For example, the ResourceType “Text” is too unspecific in our opinion,
making it hard to characterise grey literature etc.

Cheers,
Christian


On Tue, Nov 18, 2014 at 11:38:00AM -0800, Joan Starr wrote:
> Because a number of comments have pertained to the question of a controlled
> vocabulary, I thought I would provide the list we currently have. Here is
> the controlled vocabulary that applies to this property. For full
> documentation regarding the list and examples, please visit
> http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf
>
>
> Audiovisual
> Collection
> Dataset
> Event
> Image
> InteractiveResource
> Model
> PhysicalObject
> Service
> Software
> Sound
> Text
> Workflow
> Other


--
Christian Pietsch · http://www.ub.uni-bielefeld.de/~cpietsch/
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, UHG L3-126, Tel. +49 521 106 2644
Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany

is...@bristol.ac.uk

Nov 19, 2014, 10:12:46 AM
to datacite...@googlegroups.com
Hi Joan,

Are there any specific use cases which have prompted this consideration? If so, would it be possible to have a look at them? Our team is pretty split down the middle on this topic at the moment.

Dom Fripp


Assistant Research Data Librarian
Research Data Service

Library Services
University of Bristol
Tyndall Avenue
Bristol BS8 1TJ

dom....@bristol.ac.uk
data...@bristol.ac.uk

http://data.bris.ac.uk/
Follow us on Twitter @databris



Greg Janée

Nov 19, 2014, 11:17:56 AM
to datacite...@googlegroups.com
There would be substantial cost to making resource type required.  The requirement would ripple from DataCite to allocators, to the clients of those allocators, and in many cases those clients are themselves aggregators of content and the requirement would flow to their customers in turn.  A new requirement like this has the potential of breaking all existing workflows, because every party in the workflow chain (from original producers to aggregators to allocators) would be required to upgrade their systems, with mandated cutoff dates imposed.  That's quite a burden and disruption, and it would be helpful to understand in more detail what benefits would be gained.  The question is not, Is resource type useful?  Certainly so.  But:

- Who is harvesting/searching DataCite, and what (limited) types of resources are they interested in, and why are they interested in only those types?

- What other means are available to them now for filtering results to just those they are interested in?  Is this a case where it's impossible for them to narrow down results, or is it just burdensome to do so?

There's also the legacy issue.  I don't know the statistics for DataCite as a whole, but in EZID at least, fewer than 10% of our DOIs currently have a resource type.  If a requirement were to be put in place, we would be forced to assign "Other" to those that don't, meaning that more than 90% of our DOIs would not have a useful resource type.  That percentage would presumably improve over time, but even by the time we double our number of DOIs, that still leaves half the DOIs without a useful resource type.
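
The legacy arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: every existing DOI without a type is set to "Other", every newly registered DOI gets a meaningful type, and the figures (100,000 existing DOIs, exactly 10% typed) are made up for the example. Since the real share of typed DOIs is reported as below 10%, the share left with "Other" after doubling would be somewhat higher than shown, closer to half.

```python
def share_without_useful_type(existing, typed_fraction, new_dois):
    """Fraction of all DOIs left with the fallback type "Other".

    Assumes every existing DOI lacking a type is set to "Other" and every
    newly registered DOI receives a meaningful type.
    """
    untyped = existing * (1 - typed_fraction)
    return untyped / (existing + new_dois)


# Illustrative numbers only: 100,000 existing DOIs, 10% already typed.
before = share_without_useful_type(100_000, 0.10, 0)
after_doubling = share_without_useful_type(100_000, 0.10, 100_000)
print(f"{before:.0%} -> {after_doubling:.0%}")  # 90% -> 45%
```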

Joan Starr

Nov 19, 2014, 11:49:42 AM
to datacite...@googlegroups.com
Hi Dom,

The specific driver for the current inquiry was a request from one of DataCite's partners, ORCID. We've also heard from Thomson Reuters. Both of these organizations are interested in taking DataCite's metadata and matching it to their own.

--Joan

Kyle Banerjee

Nov 19, 2014, 12:11:38 PM
to datacite...@googlegroups.com
I do not favor mandatory fields except when there are compelling reasons. 

I am unclear how shoehorning all resources into a small list of types really helps except for statistical purposes and creating clean summary displays based on rough collocation of aggregated data. As far as I can tell, information in the DataCite record serves an identification rather than a discovery need, so recording information separately that can be immediately observed or automatically detected strikes me as unnecessary. 

Having said that, I think that requiring Resource Type is less problematic than requiring values for creator, publisher, and publication year. The currently required fields presume a "bookish" information model that is sometimes inappropriate and sometimes leads to recording of misleading data because these values are often ambiguous with certain types of resources.  Title is not problematic because a descriptive one can be constructed for purposes of identification.

kyle

Sharon Farnel

Nov 19, 2014, 1:37:33 PM
to datacite...@googlegroups.com
I think the pros and cons of this change have been very clearly stated and I would agree with all of them. This is a decision certainly not to be undertaken lightly.

I do think, though, that as a) DOIs are assigned to more kinds of objects, b) DataCite (and DataCite-related or -inspired) metadata plays more of a discovery role, and c) metadata of all kinds is increasingly found and used in contexts other than the one it was initially created for, the ability to quickly and easily identify, sort, and gather by resource type becomes increasingly valuable.

This may not mean making it mandatory right away, but as a start it could at least be strongly recommended.

And making the various lists of resource types more compatible with each other is also something worthwhile to think on.

Cheers,
Sharon

Ho Jung Yoo

Nov 19, 2014, 2:13:35 PM
to datacite...@googlegroups.com
I'm generally in favor of having richer metadata available and collecting metadata early. My concern is whether accurate classification of Resource Types can be done in a truly useful manner that outweighs the cost of mandating its use across the board. For example, as others have implied, "dataset" is a fairly vague concept that has a surprisingly diverse set of meanings to different people. And it doesn't necessarily exclude other Resource Types, such as images and audiovisual resources. In our repository, we have collections that frequently contain complex digital objects made up of a variety of resource types. It wouldn't be clear whether to classify the Resource Type for such an object as "collection", since it's actually an object within a collection, or as "dataset", as a fallback choice for mixed types.

Resource Type doesn't seem particularly useful in the reference list of a publication, since an object will usually be surrounded by context there. But stripped of context in a registry or repository, it becomes potentially much more useful. Would it be possible for users to indicate multiple Resource Types, which may not need to be displayed in a data citation (this could get messy), but rather used simply for indexing within the registry or repository?

Doug Cooper

Nov 20, 2014, 12:44:50 AM
to datacite...@googlegroups.com
No problem with mandatory, but a + for discussion of extending the
controlled vocabulary in any case.

I've been working with some dataset types I think are distinct additions
(mainly having to do with linguistic data, hence these examples). Am
curious whether anybody else has had to assign DOI names to this sort
of "lightweight data object," for lack of a better term.

1. the dataset is defined by metadata content.
Suggested resource type: KEY
Target: generic landing page for datasets of this type.
Example: a list of 3-letter ISO 639-3 language codes, used to index /
allow aggregation of arbitrary sets of data resources.
Example: a URL with a specific set of REST arguments, used to generate
and return a particular data set.
Discussion: this provides a simple way to document and allow re-use of
sampling frames; e.g. to reproduce or resample existing result sets.
The list of keys is reasonably included as
<description descriptionType='TableOfContents'>list of keys ...


2. the dataset is documented by the metadata, but has not necessarily
been instantiated in e-form (so we can't use it as a target).
Suggested resource type: ABSTRACT or LOGICAL
Target: the text that contains the dataset.
Example: a publication contains one or more datasets; we want to
document these as distinct entities.
Example: a publication contains a multi-column table, but we only want
to document a single column.
Discussion: many datasets found (and independently documented) in the
literature are actually subsets of existing work. We want to document
the abstract dataset that underlies these concrete subset instances, so
that they can all include references to their common parent dataset --
not the print work that happens to contain the parent dataset (and
perhaps others as well).

Michael Charno

Nov 20, 2014, 11:59:28 AM
to datacite...@googlegroups.com
I also don't see the point in making it mandatory.  The beauty of the DataCite schema is its simplicity and minimal requirements, and like others I'm not convinced that Resource Type is important enough to add to the mandatory group.  We try to always put Resource Type in our metadata because we can and it usually makes sense, but that doesn't necessarily mean that everyone else should as well.

I could also see problems within our own implementation when we start minting DOIs at a more granular level, where some of our object types don't conveniently fit into any of those types, so generic and unhelpful values would be included instead.  Making that field mandatory would also require us to update our application source code, as we have a bespoke implementation of the DataCite API and currently don't expect that field to be mandatory.

I see minimal benefit to us or users of the metadata, so it doesn't really make sense in my opinion.  Keep it simple.

timothy...@ec.europa.eu

Nov 21, 2014, 7:19:58 AM
to datacite...@googlegroups.com
Just to reiterate, for the automated aggregation of data sets from distributed sources, there is a great deal of added value in ResourceTypeGeneral being made mandatory.  In the domain where I work, namely the engineering sciences, the possibility to aggregate distributed data sources has proved elusive and I expect this is the case in many scientific domains.  Besides the very real advantages that DataCite offers in promoting the sharing and reuse of data, it seems also to have the potential to enable data aggregation and I would be very much in favour of any change to the metadata schema that helps in this respect.

Matthew McKinley

Nov 21, 2014, 12:58:03 PM
to datacite...@googlegroups.com
Great discussion of the pros and cons of such an action--There's a lot of food for thought here!

While I certainly understand how the ambiguity of what 'dataset' means as a Resource Type could render it ineffective, I think we could also view this flexibility as a strength. If Resource Type became mandatory, a system could assign 'dataset' as the default value for the field and the system's developers could then choose whether other Resource Type values could be assigned, and if so, which ones. For the case of developers/systems that do not find value in deciding on and assigning a Resource Type, there will be no additional work beyond implementing 'ResourceType=Dataset' within generated DataCite records (which, yes, may be substantial work, but at least not recurring). For those who DO find value in assigning Resource Type, have a drop down with 'dataset' selected but other Resource Type values allowed, so the researcher/curator/whoever can decide the level of description.

Yes, this does mean that many datasets will be assigned the possibly unhelpful Resource Type of 'dataset', but would that be any worse than having no Resource Type? Typing a resource that is defined entirely by its role as research data means a type of 'dataset' will never be incorrect.

I also agree with Ho Jung that multiple Resource Types within a data object are problematic. Assigning multiple 'Resource Type' values is one approach, but this may break systems or APIs attempting to sort data resources by type, and at the least would cause confusion with users expecting to find, for example, only 'image' data and finding a resource with other data types. Recommending curators (whether in the DataCite documentation, within a data system, or both) to assign the default 'dataset' to a data object with multiple types returns to the problem of 'dataset' being too generic, but would at least help eliminate ambiguity about this use case.

Matthew McKinley
Digital Project Specialist
UC Irvine



Landcare DataStore

Nov 23, 2014, 5:50:44 PM
to datacite...@googlegroups.com
Re: "Yes, this does mean that many datasets will be assigned the possibly unhelpful Resource Type of 'dataset', but would that be any worse than having no Resource Type? "

I'm not commenting for or against the mandatory-or-not question, but in response to Matthew's suggestion, my initial reaction is yes, it would be worse, because defaulting would negate the value of the type 'Dataset' if it were mandatory to enter a Type and people wanted to filter on datasets.  If people who did not wish to 'Type' their objects simply default to 'Dataset', then anyone who filters by dataset will get returns of all sorts of stuff.  In contrast, if they left it blank, then any object having a Type of Dataset would (should) really be a Dataset (noting of course that the definition of Dataset may be broad).

If the field became mandatory AND you wanted to allow a 'default', you'd be better, I think, to have a Type option of 'Not specified'. Otherwise require everyone to put the 'real' Type.  People still may not accurately enter the Type, but at least you'd not be encouraging them to enter misinformation by defaulting to Dataset.
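
A small sketch of the filtering concern, using made-up records. The identifiers, the filter_by_type helper, and the 'Not specified' value are all hypothetical; 'Not specified' is the option suggested above and is not in the current controlled vocabulary.

```python
# Hypothetical records: (identifier, actual kind, type the depositor chose).
# None means the depositor did not choose a type.
records = [
    ("doi-1", "Dataset", "Dataset"),
    ("doi-2", "Dataset", "Dataset"),
    ("doi-3", "Image", None),
    ("doi-4", "Software", None),
    ("doi-5", "Text", "Text"),
]


def filter_by_type(records, wanted, default):
    """Return identifiers whose recorded type equals wanted.

    Records without a chosen type are stored with the given default.
    """
    return [ident for ident, _actual, chosen in records
            if (chosen if chosen is not None else default) == wanted]


# Defaulting to "Dataset" pulls the untyped image and software into the result.
print(filter_by_type(records, "Dataset", default="Dataset"))
# A neutral default keeps the "Dataset" filter limited to real datasets.
print(filter_by_type(records, "Dataset", default="Not specified"))
```

With the first default the filter returns four identifiers, two of which are not datasets; with the neutral default it returns only the two genuine datasets.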

Cheers
Aaron McGlinchy
Landcare Research New Zealand

linds...@zbmed.de

Nov 25, 2014, 4:26:02 AM
to datacite...@googlegroups.com
ZB MED would welcome resource type being mandatory. We would like to improve the discovery of DataCite datasets from different data sources, e.g. in our own discovery service.

Birte Lindstädt
Research Data Management
ZB MED Leibniz Information Centre for Life Science

Damian Steer

Nov 25, 2014, 9:44:21 AM
to datacite...@googlegroups.com
On 19/11/14 16:49, Joan Starr wrote:
> Hi Dom,
>
> The specific driver for the current inquiry was a request from one of
> DataCite's partners, ORCID. We've also heard from Thomson Reuters. Both
> of these organizations are interested in taking DataCite's metadata and
> matching it to their own.
>
> --Joan
>
> On Wednesday, November 19, 2014 7:12:46 AM UTC-8, is...@bristol.ac.uk wrote:
>
> Hi Joan,
>
> Are there any specific user cases which have prompted this
> consideration? If so, would it be possible to have a look at them?
> Our team is pretty split down the middle on this topic at the moment.

(Speaking as a member of the same team as Dom in Bristol)

A request for a particular field isn't a use case.

We provide an abstract, subjects, and the formats of the files that make
up the deposit (using Apache Tika and soon PRONOM).

To take a specific example, we have a ~3000 file deposit containing a
mixture of spreadsheets and various instrument outputs concerning the
composition of certain chemicals. It's hard to see what the
'ResourceType' would be, and how it would help improve matching. The
existing abstract, subjects and file formats strike me as being much more
useful for that task, and place only a limited burden on the depositor.

But I'm not sure what values are being suggested for ResourceType.

Damian Steer

--
Damian Steer
Senior Technical Researcher
Research IT
+44 (0) 117 928 7057

Tove Nielsen

Nov 26, 2014, 7:06:00 AM
to datacite...@googlegroups.com

On Monday, November 17, 2014 10:53:41 PM UTC+1, Joan Starr wrote:

We are considering changing Resource Type (metadata property ResourceTypeGeneral) from optional to mandatory. Would this provide any advantages for you? Would this create any disadvantages for you? Please share any thoughts you have on this.

 
We think this is a great idea for search purposes, and we don't see how this could hinder anyone by making it mandatory.

Regarding multiple digital objects or collections that contain a mixture of resource types (e.g., text + image and video), it could be useful if the ResourceType is repeatable.


Tove and Birthe
DTU Library
Technical University of Denmark
Technical Information Center of Denmark
Anker Engelunds Vej 1 PO Box 777
Building 101D
DK - 2800 Kgs. Lyngby
Denmark
Direct +45 45257219
tg...@dtu.dk
www.dtic.dtu.dk/

Christian Pietsch

Nov 26, 2014, 7:49:33 AM
to datacite...@googlegroups.com
Hi Damian,

On Tue, Nov 25, 2014 at 02:44:18PM +0000, Damian Steer wrote:
> To take a specific example, we have a ~3000 file deposit containing a
> mixture of spreadsheets and various instrument outputs concerning the
> composition of certain chemicals. It's hard to see what the
> 'ResourceType' would be, and how it would help improve matching. The
> existing abstract, subjects and file formats strike me as being much more
> useful for that task, and place only a limited burden on the depositor.
>
> But I'm not sure what values are being suggested for ResourceType.

The list of allowed values was included in Joan's message of 18 Nov
2014, and of course you will find it in the DataCite Metadata Kernel
PDF and in the XML Schema.

For your example, you would use either of the following lines:
<resourceType resourceTypeGeneral="Collection">whatever</resourceType>
or
<resourceType resourceTypeGeneral="Collection"></resourceType>

Perhaps it is worth pointing out that while the attribute
resourceTypeGeneral needs to have a value from the list mentioned
above, the element resourceType may be empty. So for instance, if all
you know about some publication is that it is a data publication, the
following resourceType would be helpful enough to differentiate it
from a traditional publication:

<resourceType resourceTypeGeneral="Dataset"></resourceType>

That is why my colleagues and I think it is useful to make the
resourceType element mandatory, and this requirement is not a great
burden.
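
For anyone who wants to check such elements programmatically, here is a rough sketch using Python's standard xml.etree.ElementTree module. The function name check_resource_type is hypothetical, the allowed values are copied from Joan's list, and the fragment is parsed without an XML namespace for brevity; a full DataCite record declares a namespace, so the tag comparison would need adjusting there.

```python
import xml.etree.ElementTree as ET

# Allowed resourceTypeGeneral values, copied from the list posted on 18 Nov 2014.
ALLOWED = {
    "Audiovisual", "Collection", "Dataset", "Event", "Image",
    "InteractiveResource", "Model", "PhysicalObject", "Service",
    "Software", "Sound", "Text", "Workflow", "Other",
}


def check_resource_type(xml_fragment):
    """Return (resourceTypeGeneral, free text) or raise ValueError.

    Hypothetical helper; assumes a bare, un-namespaced resourceType element.
    """
    element = ET.fromstring(xml_fragment)
    if element.tag != "resourceType":
        raise ValueError("expected a resourceType element")
    general = element.get("resourceTypeGeneral")
    if general not in ALLOWED:
        raise ValueError(f"invalid resourceTypeGeneral: {general!r}")
    # The element content is free text and may be empty.
    return general, (element.text or "")


print(check_resource_type(
    '<resourceType resourceTypeGeneral="Dataset"></resourceType>'))
```

The empty element passes the check, which matches the point above: only the attribute has to carry a value from the controlled list.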

Cheers
Christian

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany

Ian Bruno

Dec 1, 2014, 7:54:05 AM
to datacite...@googlegroups.com
Personally I can see value in being able to distinguish a "Dataset" from other material so can understand reasons for wanting to make resourceTypeGeneral mandatory.

For us this would involve a few days development work plus the elapsed time to update existing records and monitor that. Obviously this would need to be prioritised against other work. Our data sets are very homogeneous so we wouldn't face some of the issues raised by others dealing with more complex digital objects. 

Updating existing records could have implications for third parties taking feeds from the DataCite metadata store (something we are establishing with Thomson Reuters for example), requiring them to reprocess all our records (over 500,000) because of a change to modification date.

Ian Bruno, CCDC

Alain Broc

Dec 9, 2014, 8:12:43 AM
to datacite...@googlegroups.com
I think it would be interesting to have the Resource Type mandatory, for search purposes. As in some comments, what about documents like articles, in which we find text, images and data? Would they be of type "Text"? Thank you.