Metadata change suggestion: Add 'creatorType' to the Creator element


Kristian Garza

Jan 26, 2022, 5:24:26 AM
to DataCite Metadata, meta...@datacite.org
Hi Metadata Working Group,

The following is a suggestion by one of our members that was discussed in the last Open Hours session. There was a significant number of comments and discussion about this suggestion. Thus here we are forwarding all the information for your consideration.

Best regards





Comments from Open Hours support

```

Yes, it should be added. I forgot to mention one thing. When you normalize names with authority data, you can no longer distinguish the author from the editor. The information added to the name (ed., hrsg., ...) is lost. Linked data users benefit enormously from the change.
Hans Schürmann from the ZHB Luzern in Switzerland has suggested to apply a "creatorType" to the Creator in the schema, just as currently happens for Contributors. His primary use case for this is a case where an editor is the sole intellectual creator of a work (e.g. a new work was created through the compilation and assembly of other works), but they have not authored the material of which the work is composed. Our current definition of Creator seems to imply it is only valid for the latter option. His proposal is attached.

To distinguish between authors and editors in the list of creators. To enable the exchange of metadata between our repository (Zenodo Community) and any reference management system (e.g. Mendeley).
Yes, this is important. We had the same discussion in our institution. An editor (a case with only an editor) wanted to be named as such and not as creator, which he felt was not his role.
Many IRs use the Dublin Core schema, and Creator must be qualified with a role for the value to make sense.
Need to be able to indicate "Editor" explicitly as a type of "creator", both for cases where the creator list includes both authors and editors, and especially for cases where there is only an editor that should be cited.
What is the problem that your suggestion would solve?

I have many resources to register in which it is inappropriate to credit the creator as an "author". These are compilations of data from many different sources, where the creator of each part is credited at a lower level (like chapters in a book, for example, but the data set equivalent).
Why is change important to you?

Proper citation and credit is a core value for attaching DOIs to our data sets in the first place.
What type of workarounds have you created to help with the lack of this feature?

We "credit" the editor twice. Since we are compelled to list at least one creator in the DataCite metadata, we list the editor(s) as creators, and then also list them as Contributors with a role of "editor".
Who is impacted by the lack of your suggestion in our schema?

The true authors, who see someone else being credited for the work they did, and the editors, who appear to an outside party to be taking credit for work they did not do.
When was the first time you or your organisation felt the need for your suggestion to be implemented in the schema?

The very first data set I ever tried to assign a DOI to had only an editor. That would have been about three years ago.
How often do you or your organisation need to use ?

I haven't run the numbers, but I would estimate that 10-15% of our legacy data should be citing an editor only. In more modern data sets, probably only about 5% should cite an editor only (this number would be lower but for our parent organization's requirement to assign DOIs at particular levels of aggregation regardless of content).


It's probably difficult to imagine my context in the abstract. If you'd like to arrange a brief telecon, I'd be happy to do that. Or perhaps it would help to explore our archive a bit (https://pdssbn.astro.umd.edu/data_sb/by_mission.shtml, for example) to see some of the complexity we're dealing with. We do not assign DOIs to individual observational products, in general, but to "data sets", "collections", and "bundles" (odd terminology, I know, but we're stuck with it). A DOI in our context corresponds to a refereed journal citation. We do not create them lightly, and getting the authorship/editorship wrong for a refereed publication is potentially career-impacting. Promotion and tenure committees take refereed publications very, very seriously.
Hope that helps. Thanks!


```



Kristian Garza | Product Designer | DataCite
Support Desk | Support Site | PID Forum
A: DataCite e.V. -- Welfengarten 1B, 30167 Hannover, Germany

Kristian Garza

Jan 26, 2022, 5:28:02 AM
to DataCite Metadata
additional link for the person who suggested the change:  https://docs.google.com/presentation/d/1p_6vkiuet6AR5v0K69hm--AuJQodBc_p/edit#slide=id.p5

Ted Habermann

Jan 26, 2022, 11:59:37 AM
to Kristian Garza, DataCite Metadata
Kristian et al.,

Thanks very much for this information regarding this change. I agree completely that identifying editors correctly is important. However, adding a type to the creator object will inevitably lead to confusion. Is that type only author or editor, or are all values from the contributorType list valid?

More importantly, I believe that we currently have at least two ways to do this: 
  1. Use :none or :unap for creator name and provide all information about editor as a contributor
  2. Deprecate the creator object and add ‘creator’ or (better) ‘author’ to the contributorType list.
I prefer the second approach as it simplifies the schema rather than complicates it. I would also recommend adding Publisher to the contributorType list which solves the problem we have with publisher identifiers and further simplifies the schema by treating all contributors in the same way.

Of course this requires some migration of current content and an evolution of some code, but that evolution involves migrating content into an already existent structure, easier than adding new structures for publisher...
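As a rough illustration of the migration in option 2, the following is a minimal sketch, assuming a simplified, namespace-free record and a hypothetical contributorType value "Creator" that is not on the current contributorType list. The function name is made up for this example, and only the name is carried over (identifiers and affiliations are ignored).

```python
import xml.etree.ElementTree as ET


def merge_creators_into_contributors(resource: ET.Element) -> ET.Element:
    """Move every <creator> into <contributors> as contributorType="Creator".

    "Creator" is a hypothetical contributorType value used for illustration.
    """
    creators = resource.find("creators")
    if creators is None:
        return resource  # nothing to migrate

    contributors = resource.find("contributors")
    if contributors is None:
        contributors = ET.SubElement(resource, "contributors")

    for creator in list(creators):
        contributor = ET.SubElement(
            contributors, "contributor", contributorType="Creator"
        )
        name = ET.SubElement(contributor, "contributorName")
        name.text = creator.findtext("creatorName")

    # The creators container is dropped once its content has been moved.
    resource.remove(creators)
    return resource
```

Migrated creators are appended after any existing contributors, so the original author order is preserved within the new entries.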

Ted





Lars Holm Nielsen

Jan 27, 2022, 3:02:44 AM
to DataCite Metadata
Hi,

Using a combined creator and contributor field will in my opinion not work. A key requirement is to be able to generate a citation string from the DataCite metadata. If you combine creator and contributor into a single "creatributor" field you are no longer able to differentiate who should get credit (aka listed in the citation string) vs who should not (aka just listed in the metadata).

In an ideal future, contributors should also get credit, but we're still far from this IMHO. The requirement we are met with when developing repository systems (InvenioRDM in my case) is that we must be able both to clearly distinguish who gets credit and to record additional contributors.

Best regards,
Lars

Ted Habermann

Jan 27, 2022, 10:53:41 AM
to Lars Holm Nielsen, DataCite Metadata
Lars,

The contributor element already has the type that you are suggesting we add to creator. Authors are identified as contributors with a type = ‘creator’ in the same way that creators that are authors would be differentiated from creators that are editors using creatorType.

The proposed change simplifies and harmonizes treatment of all contributors and allows all kinds of credit to be identified.

Ted
  

Kristian Garza

Jan 27, 2022, 12:07:44 PM
to DataCite Metadata
Hi,

I am also not sure that deprecating Creator is a viable solution. In almost all schemas, Creator holds a separate place from other contributors. Moreover, it would create breaking changes for many other applications (not only DataCite's), as many other schemas are already mapped to Datacite:creator (see RDA schema mappings). Plus, deprecating creator is a non-backward-compatible change.

To be honest, I am not sure if we even need a schema change. Maybe it's just a matter of providing guidance.


Kristian.

Lars Holm Nielsen

Jan 27, 2022, 4:14:31 PM
to DataCite Metadata
The assumption here is that creatorType and contributorType can share CV items. E.g. an editor can be present in both a creatorType and a contributorType. In some cases the editor needs to be credited, in others not. If you make a combined creatributor list, you won't know whether or not the editor should be included in the citation string.

As I said, we should give all contributors credit, but reality is not like this today.

/Lars

Ted Habermann

Jan 27, 2022, 4:45:05 PM
to Lars Holm Nielsen, DataCite Metadata
Lars,

I spend quite a bit of time working on ISO standards for geographic metadata and one of the most common complaints I hear from users, even today (years later), is that the standard is too complex and specifically that there are multiple ways to do the same thing. We also hear this about the DataCite metadata schema, for example with relatedIdentifier and relatedItem in V4.4.

This experience suggests that having creator and contributor use the same type list will inevitably lead to confusion and headaches. It seems that you are suggesting that if an editor is included as a creator, then she gets credit, which means being listed in the citation. If she is included as a contributor, she gets no credit and no citation mention. Is the same true for Producer? Or other contributorTypes? Are you proposing that creator and contributor use the same list of types but creator can only use a subset of the list, i.e. author or editor? Actually, neither author nor creator is on the contributorType list, so your suggestion also requires adding creator to the list.

As a member of the Metadata Working Group I am aware of the larger picture. There are two other changes being considered which are relevant here. First, DataCite would like to align their contributor types with the CRediT list which is being accepted as a NISO standard. There are a lot of challenges in that task, but it reflects a desire to give more kinds of credit. You say that this is not yet reality. I believe it is coming sooner than you think, so making a change that is consistent with that direction is a good thing that helps us prepare for the future.

Also, there is another current issue about adding identifiers for Publisher. This is a very significant breaking change as publisher is now a text element and, in order to add an identifier, it will become an object, like creator and contributor. 

We can solve both of these problems using an existing structure by simply adding creator and publisher to the contributorType list. I understand that this is a change, but it is a change that eliminates complexity and simplifies the treatment of all kinds of contributors now and in the future. 

I am a big believer in simplification when possible and this is a great opportunity to kill multiple birds with one schema change.

Stay safe,
Ted

Lars Holm Nielsen

Jan 27, 2022, 7:06:00 PM
to DataCite Metadata
Ted, 

To clarify, the only thing I'm pointing out is that a single harmonised structure for contributors won't work. I understand the reasons for the suggestion of adding creatorType, but I have no strong opinion on whether or not it should be added.

DataCite was created to enable data citation. One of the key reasons for splitting e.g. creator name into family/given name back in 4.0 was that generating a citation string from the DataCite metadata is a key requirement, and you simply couldn't do it properly unless you had the name split into family name/given name (despite the troubles it can create for names around the world). It's also the reason publisher is a required field and not just a recommended field: it's a key part of the citation string. As an example, DataCite's and Crossref's DOI citation formatters work because the underlying metadata has a structure that allows them to generate the citation strings.

I've had countless discussions on this topic from the computer science, librarian, publisher and researcher perspectives. From the computer science perspective I get the same viewpoint as you are proposing: harmonise and simplify into a single structure. From all the others (publishers, librarians, researchers) I get the opposite requirement: any mistake in an automatically generated citation string, however small, is immediately pointed out. Publishers complain if the publisher information in a citation string is not correct. Researchers complain if the list of authors in the citation string is not correct. The bottom line is that end users of the metadata care a lot about correct credit and citation according to today's norms. And even if the world changes tomorrow, you would still have a huge backlog of items that would require correct representation (besides the fact that we started trying to convince people to do data citation 20 years ago, and it's still not happening). Thus, if you're not able to generate a correct citation string from the DataCite metadata, I think it's a very big change with a huge impact on the people we're trying to support.

Now, perhaps an example clarifies the issue better. An assumption here is that you would add a new contributorType named "Creator" if you harmonise everything into contributors. Also, it doesn't matter whether the issue exists for other contributorTypes, as you just need a single type (like editors) that can be present both as a creator and as a contributor.

Given the following metadata:

<contributors>
<contributor contributorType="Editor">
<contributorName>Ted</contributorName>
</contributor>
<contributor contributorType="Creator">
<contributorName>Lars</contributorName>
</contributor>
</contributors>

You're not able to determine which of the following is the correct citation string:

"Ted, Lars (2022)...."
"Ted (2022)...."
"Lars (2022)...."

Given a structure like:

<creators>
<creator creatorType="Editor">
<creatorName>Ted</creatorName>
</creator>
</creators>
<contributors>
<contributor contributorType="Creator">
<contributorName>Lars</contributorName>
</contributor>
</contributors>

You can determine that the correct citation string is:

"Ted (2022)"

My last point is that when a data structure becomes too generic and the purpose and responsibility of a given property is not well-defined, it loses its value because then the structure can and will be used for everything, and then we're not much further than unstructured text. Obviously there's a fine balance between generalisation and specificity.
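The citation example above can be sketched in code. This is a minimal illustration, assuming simplified, namespace-free XML, the proposed (not existing) creatorType attribute, and a hypothetical "Creator" contributorType; the function name and the rule "cite exactly the creators" are assumptions for the sketch, not how any actual citation formatter works.

```python
from itertools import combinations
import xml.etree.ElementTree as ET

# Separate creators and contributors; creatorType is the proposed attribute.
SEPARATE = """<resource>
  <creators>
    <creator creatorType="Editor"><creatorName>Ted</creatorName></creator>
  </creators>
  <contributors>
    <contributor contributorType="Creator"><contributorName>Lars</contributorName></contributor>
  </contributors>
</resource>"""

# Everything merged into contributors; "Creator" is a hypothetical type.
MERGED = """<resource>
  <contributors>
    <contributor contributorType="Editor"><contributorName>Ted</contributorName></contributor>
    <contributor contributorType="Creator"><contributorName>Lars</contributorName></contributor>
  </contributors>
</resource>"""


def citation_candidates(xml_text, year):
    """Return every citation string the record is compatible with."""
    root = ET.fromstring(xml_text)
    creators = [c.findtext("creatorName") for c in root.iter("creator")]
    if creators:
        # A creators list states who is cited, so there is one answer.
        return [f"{', '.join(creators)} ({year})"]
    # Without creators, any non-empty subset of contributors could be meant.
    names = [c.findtext("contributorName") for c in root.iter("contributor")]
    return [
        f"{', '.join(subset)} ({year})"
        for size in range(1, len(names) + 1)
        for subset in combinations(names, size)
    ]
```

With these inputs, `citation_candidates(SEPARATE, 2022)` yields a single string, while `citation_candidates(MERGED, 2022)` yields three candidates with nothing in the record to choose between them.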

Best regards,
Lars

uit.p...@gmail.com

Jan 28, 2022, 4:15:26 AM
to DataCite Metadata

Please allow me to contribute (contributorType: Repository Manager, Data Curator, Researcher) with some comments to this discussion from the perspective of research data publishing:

1. In my understanding, credit and citation are not the same. You might want to credit people and/or organizations for having contributed to a published research output without them being included in the citation that should be used when referring to the research output.

2. Which types of contributors should be cited, and how they should be cited, depends on a) domain-specific standards and b) the citation style used.

3. The rules for data citation are still under development. We still lack fully-fledged resources and guidelines such as citation styles for different resource types. Allow me to point to some initial guidelines for citation of linguistic/language data which I have contributed to (contributorType: Author): https://doi.org/10.7551/mitpress/12200.003.0015. Within some fields of linguistics, contributor types other than creator/author and editor, e.g. language user or language consultant, might also need to be included in the citation.

4. Based on my comments above, the main reason why metadata schemas like DataCite in some cases currently do not provide enough information to generate a fully-fledged data citation is that a) we lack proper standards for how different data products should be cited (and also for how contributors should be credited in addition to and/or instead of citation); b) as a consequence, the DataCite Metadata Schema also lacks more granular resourceTypes (or possibly also resourceSubTypes) to be able to provide correct data citation. On this topic, see this discussion thread in the PID Forum: https://pidforum.org/t/granularity-of-datasets/1084.

5. Coming back to Lars' example:


<contributors>
<contributor contributorType="Editor">
<contributorName>Ted</contributorName>
</contributor>
<contributor contributorType="Creator">
<contributorName>Lars</contributorName>
</contributor>
</contributors>

given the metadata record also provides information about the resource type (e.g. data file, dataset, data collection), and given we have well-established rules or at least guidelines for how to cite different resource types, we should be able to determine which contributors should be part of the citation.

I'm not sure what kind of resource Lars' simplified metadata example is meant to illustrate. If we for a moment assume it's about traditional (printed) research result publishing and that Creator = Author, I cannot see why Editor and Creator should be cited for one and the same resource. Let's say it's an anthology edited by Ted. Why should Lars, who is the author of one of the chapters/parts of the anthology, be cited when referring to the whole anthology? If you go the other way round, that is, you want to cite the chapter within the anthology, the editor of the anthology should be cited by adding Related Identifiers and specifying the RelationType as IsPartOf (see the discussion thread in the PID Forum mentioned above).
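To make point 5 concrete, here is a minimal sketch of resolving the citation from the resource type plus a rule table. The rule table, the type labels "Anthology" and "Chapter", and the function name are all hypothetical illustrations; they are not DataCite controlled-vocabulary values or established citation rules.

```python
# Hypothetical citation rules: which contributor types are cited for
# which kind of resource. Real rules would come from domain standards
# and citation styles, which are still under development.
CITED_ROLES = {
    "Anthology": ["Editor"],   # the whole anthology is cited by its editor
    "Chapter": ["Creator"],    # a single chapter is cited by its author
}


def cited_names(resource_type, contributors):
    """Pick the contributors to cite, given (role, name) pairs."""
    if resource_type not in CITED_ROLES:
        raise ValueError(f"no citation rule for resource type {resource_type!r}")
    roles = CITED_ROLES[resource_type]
    return [name for role, name in contributors if role in roles]
```

For the record `[("Editor", "Ted"), ("Creator", "Lars")]`, the same contributor list gives Ted when the resource is the anthology and Lars when it is a chapter, which is the disambiguation a merged list alone cannot provide.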

6. To summarize: Once fully-fledged citation rules are in place for research data, and DataCite provides the necessary ResourceTypes for research data, I don't see any need to duplicate the (Contributor)Type from the Contributor field to the Creator field. But we need more granular values for ContributorType, especially - as suggested by Ted - Author should be added. If you prefer a more coherent approach (which though may disturb the integration with other systems), I support Ted's suggestion: "Deprecate the creator object and add ‘creator’ or (better) ‘author’ to the contributorType list."

Best regards,
Philipp

ORCID: https://orcid.org/0000-0002-6754-7911
DataverseNO | https://dataverse.no/
The Tromsø Repository of Language and Linguistics (TROLLing) | https://trolling.uit.no/
UiT The Arctic University of Norway | https://en.uit.no/