Dataset DDI metadata export should include license for the metadata

Feb 18, 2018, 6:46:23 PM
to Dataverse Users Community
The DDI standard that Dataverse uses to export its metadata has a section, docDscr, for describing the metadata document itself, including a copyright element for letting others know how the metadata can be used. I think Dataverse should use this element to add a license to the DDI export, which would help others looking to reuse that metadata know how they can reuse it. ADA and ICPSR export their DDI metadata this way now.

I'd like to get some feedback about this.

Some questions I have:
- Does the Harvard Dataverse Terms of Use say anything that would help determine what license would be appropriate for the publicly available DDI metadata? Perhaps its "Licenses and Permissions to Harvard Dataverse" section?

- Would it be appropriate to always apply the same license to the variable metadata (in the fileDscr section)?

Here's the DDI 2.5 structure, showing where the metadata document license would go:

                <copyright></copyright> -- copyright of metadata document
    <fileDscr ID="">
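
As a rough illustration of the proposal above, here is a minimal Python sketch that writes a license statement into the copyright element of docDscr. It assumes the element sits at docDscr/citation/prodStmt/copyright and that the export uses the `ddi:codebook:2_5` namespace; the helper names (`get_or_create`, `add_metadata_license`), the sample document, and the choice of CC0 as the license text are hypothetical and are not taken from Dataverse's exporter.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of a DDI Codebook 2.5 export.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)


def q(tag):
    """Return the namespace-qualified form of a DDI tag name."""
    return f"{{{DDI_NS}}}{tag}"


def get_or_create(parent, tag):
    """Return the first child with this tag, appending a new one if missing.

    Note: new children are appended at the end, so this sketch does not
    enforce the element order that the DDI schema requires.
    """
    child = parent.find(q(tag))
    if child is None:
        child = ET.SubElement(parent, q(tag))
    return child


def add_metadata_license(codebook, license_text):
    """Set docDscr/citation/prodStmt/copyright to the given license text."""
    doc_dscr = get_or_create(codebook, "docDscr")
    citation = get_or_create(doc_dscr, "citation")
    prod_stmt = get_or_create(citation, "prodStmt")
    get_or_create(prod_stmt, "copyright").text = license_text
    return codebook


# Hypothetical, heavily trimmed export used only for illustration.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5" version="2.5">
  <docDscr>
    <citation>
      <titlStmt><titl>Example dataset</titl></titlStmt>
    </citation>
  </docDscr>
  <stdyDscr/>
</codeBook>"""

root = ET.fromstring(SAMPLE)
add_metadata_license(root, "CC0 1.0 Universal")
output = ET.tostring(root, encoding="unicode")
print(output)
```

A real exporter would also need to place prodStmt in its schema-defined position within citation, which this sketch leaves out.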

Julian Gautier

Aug 9, 2019, 3:03:25 PM
to Dataverse Users Community
One of the FAIR principles is that "(meta)data are released with a clear and accessible data usage license" (R1.1). To answer my own year-old question in this thread, Harvard Dataverse's Terms of Use state what rights the user gives the repository regarding use of metadata, including the right to "incorporate Metadata or documentation in the Content into public access catalogues." That includes all metadata that Harvard Dataverse exports. We assume this right in the design of Dataverse's APIs and OAI-PMH, which expose user metadata. In effect, though only implicitly, this seems to place the metadata that any Dataverse repository creates in the public domain. So I wonder if it would be clearer for Harvard Dataverse's Terms of Use to state that its metadata has a CC0 waiver. It should also be stated explicitly somewhere that, by default, any metadata that Dataverse repositories export has a CC0 waiver.

A repository's Terms of Use is human readable (hopefully), but those following the FAIR principles should try to express usage rights in machine readable ways, too. Most (if not all) repository Terms of Use documents are not machine readable, including Harvard Dataverse's, so it's not easy for machines to figure out how the metadata can be used.

One way to express the usage rights of metadata in a machine readable way is in the structured metadata document of each dataset. I described how to do this in the version of DDI that Dataverse exports for published datasets. I'm sure it's easy to do in the "Dataverse_json" export. Some of the other metadata standards might not include machine readable ways to distinguish between the usage rights of the metadata and data that it describes.
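
To make the machine readable idea concrete, here is a minimal Python sketch of how a harvester might read usage rights out of a DDI export and keep the metadata's rights separate from the data's. It assumes the `ddi:codebook:2_5` namespace and looks at the copyright element under docDscr (the metadata document) and the parallel one under stdyDscr (the study). The function name `read_rights` and the sample document are hypothetical, and real Dataverse exports may record data terms elsewhere.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of a DDI Codebook 2.5 export.
NS = {"ddi": "ddi:codebook:2_5"}
COPYRIGHT_PATH = "ddi:{section}/ddi:citation/ddi:prodStmt/ddi:copyright"


def read_rights(xml_text):
    """Return copyright statements for the metadata document and the study.

    A value of None means the element is absent or empty, i.e. the export
    says nothing machine readable about that part.
    """
    root = ET.fromstring(xml_text)
    rights = {}
    for label, section in (("metadata", "docDscr"), ("data", "stdyDscr")):
        el = root.find(COPYRIGHT_PATH.format(section=section), NS)
        rights[label] = el.text.strip() if el is not None and el.text else None
    return rights


# Hypothetical, heavily trimmed export used only for illustration.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5" version="2.5">
  <docDscr>
    <citation>
      <titlStmt><titl>Example dataset</titl></titlStmt>
      <prodStmt><copyright>CC0 1.0 Universal</copyright></prodStmt>
    </citation>
  </docDscr>
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example dataset</titl></titlStmt>
    </citation>
  </stdyDscr>
</codeBook>"""

rights = read_rights(SAMPLE)
print(rights)
```

Returning None rather than a default license keeps the harvester from assuming rights that the export does not state.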

Or maybe there's a way to include usage rights of the metadata exports in a machine readable document that describes the repository as a whole?

Last, I should say that I don't know of any Dataverse repository that needs to restrict the use of its metadata, and the principles of openness are so baked into its development that I can't see how the issue of metadata usage rights can ever be as contentious as it's been in other efforts to aggregate metadata to improve the discoverability of information.

Crosas, Mercè

Aug 9, 2019, 4:06:07 PM

The intention of DataTags is to partially help address this issue and attach an agreement/terms or license with each DataTag. Ideally, the agreement or terms would be standardized with a few attributes or fields that would describe it. 

The dataset metadata that needs to be protected is the summary statistics generated when tabular files are ingested. 

Mercè Crosas, Ph.D.
Harvard University's Research Data Officer,  HUIT
Chief Data Science and Technology Officer, IQSS

Julian Gautier

Aug 27, 2020, 12:56:56 PM
to Dataverse Users Community
A group working on updates to DDI Codebook has proposed, in its issue tracking system, a solution for more clearly indicating, in the metadata document itself, the license of the entire DDI Codebook metadata document. Someone brought up the point that the way ICPSR does this currently (and what I proposed in this thread's original post) doesn't always make sense. The group concluded that there's no good way of doing this in the current version of DDI Codebook, so for the upcoming version they're discussing introducing one or more new elements.

I think other folks in the Dataverse community who are interested should review the proposed changes. You might be able to leave comments on that ticket, join the mailing list, or email me.

Regarding protecting summary statistics, I think Codebook's current method of indicating the licenses and terms of the summary-statistics and variable-level metadata will still work with the changes that the DDI Codebook update group is proposing, but I imagine that'll need to be revisited with the DataTags work.