metadata copyright/use precedents & FAQ, 2011 is the year


Jon Voss

Dec 23, 2010, 2:01:01 PM
to lod...@googlegroups.com
MacKenzie Smith and I have been going back and forth a bit on copyright/use issues around metadata and I wanted to open up the conversation to get others' input on this as well.

To set the stage, Civil War Data 150 is beginning to collect metadata from about half a dozen state and federal institutions in the US.  We're also piloting a metadata contribution process on LookBackMaps and will be giving institutions a choice for how they license their metadata.  The key here is that we want to encourage OPEN metadata contributions, and it seems best for us, at least in the US, to use Creative Commons licenses.  For our purposes, I would suggest that anything CC-BY or less (i.e. PD, CC0 or CC-BY) can be considered Open Data.  Anything more restrictive is not open, as it cannot be used in commercial applications, or in Freebase or Wikipedia.
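
As a rough sketch of that rule, here is how a contribution form might check a chosen license. The names `OPEN_LICENSES` and `is_open` are hypothetical, made up for illustration, and not part of any existing tool:

```python
# Hypothetical helper illustrating the "CC-BY or less" rule; not an existing API.
OPEN_LICENSES = {"PD", "CC0", "CC-BY"}


def is_open(license_id: str) -> bool:
    """Return True if the license counts as Open Data under the rule above."""
    return license_id.strip().upper() in OPEN_LICENSES


if __name__ == "__main__":
    for lic in ["PD", "CC0", "CC-BY", "CC-BY-NC-SA"]:
        print(lic, "->", "open" if is_open(lic) else "not open")
```

Anything carrying an extra clause (NC, SA, ND) falls outside the set and is treated as not open.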

Note that we are differentiating between metadata and actual assets or digital surrogates like photos, etc.

The key is that in the coming year, the precedents are going to be set in libraries, archives, and museums, so how we take these first steps is going to be critical.

So we'd love to put together an FAQ on publishing metadata and licensing and use policies.  My hope is that this can eventually inform data management requirements for publicly funded projects.  For instance, I just heard from a great NEH funded project that is considering how to share their data/metadata.  They're thinking it would be great to use CC-BY-NC-SA.  To me, this illustrates the importance of this issue this year, and the educating and evangelizing we have in front of us.

I'd love any thoughts on this... it may make sense to set up some wiki space to work on an FAQ in a collaborative way?  Has anyone else put anything like this together?

Thanks, Jon


Jon Voss
Twitter: LookBackMaps

Cornelius Puschmann

Dec 23, 2010, 4:09:57 PM
to lod...@googlegroups.com
Hey Jon,

thanks for the interesting updates. Let me chime in on the licensing aspect:


> For instance, I just heard from a great NEH funded project that is considering how to share
> their data/metadata.  They're thinking it would be great to use CC-BY-NC-SA.


CC-BY-NC-SA is too restrictive for (meta)data in virtually any scenario that is to include re-use. See http://blogs.talis.com/nodalities/2010/02/sharing-data-on-the-web.php. This white paper from (neuro/science/creative)commons, while on primary data, might also be useful: http://neurocommons.org/report/data-publication.pdf. Consensus from people at CC is that PD/CC0 is superior to more restrictive solutions, at least for research data.

People at the Open Knowledge Foundation, specifically in the Open Bibliographic Data Working Group (http://wiki.okfn.org/wg/bibliography) are also looking into these kinds of issues. In their principles on open bibliographic data (http://openbiblio.net/2010/10/15/principles-for-open-bibliographic-data/), the working group comes to the conclusion that

Many widely recognized licenses are not intended for, and are not appropriate for, metadata or collections of metadata. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CC0), GFDL, GPL, BSD, etc. are NOT appropriate for data and their use is STRONGLY discouraged. Use a recognized waiver or license that is appropriate for metadata.

Anyhow, I'm sure it would be possible to get someone from OKFN or CC (or both) on board for the summit.

Cheers and Happy Holidays,

Cornelius
--
Dr. Cornelius Puschmann, M.A.

Department for English Language and Linguistics
Heinrich-Heine-Universität Düsseldorf
Building 23.11, Level 1, Room 21
Universitätsstrasse 1
40225 Düsseldorf
Germany

Nachwuchsforschergruppe "Wissenschaft und Internet" /
Junior Researchers Group "Science and the Internet"

Jodi Schneider

Dec 24, 2010, 8:16:45 AM
to lod...@googlegroups.com
I concur: Data should be released as Public Domain (aka CC0). -Jodi

Perian Sully

Dec 24, 2010, 1:11:44 PM
to lod...@googlegroups.com
Someone please correct me if I'm wrong, but I thought that data wasn't copyrightable?

Jon Voss

Dec 24, 2010, 5:20:05 PM
to lod...@googlegroups.com

Cornelius, these are great links and I’ll do some reading up.


Perian, some definitely argue that, but there’s a lot of grey area to say the least.  Also, if our intention is ultimately to create a context in which libraries, archives, and museums can publish Linked and/or Open Data, we want to be able to convince them to do so in a way that makes sense to them from a policy and strategy perspective, while also providing appropriate tools.  We may use SEO arguments, or highlight other advantages to using CC0, and show that using CC-BY-NC-SA is not only inappropriate for metadata (though theoretically it may aid in sharing for some traditional research purposes), but would also eliminate other advantages of sharing metadata with a more open license.

 

Jon

MacKenzie Smith

Dec 24, 2010, 8:37:43 PM
to lod...@googlegroups.com

Just a couple of points to clarify why this is more complicated than we’d like:

 

You’re right that facts are not copyrightable in the U.S., but not all (meta)data is strictly factual. Lots of LAM metadata is copyrighted – for example, archival finding aids and analytics in cataloging records. If intellectual effort went into it, rights accrue.

There are explicit (meta)data rights and database rights in many countries, and ideally we want LAM metadata to be interoperable irrespective of the legal regime it came from.

 

So I completely agree that CC0 is the best solution for LAM metadata to ensure global interoperability and accessibility. But there are institutions that have invested a lot in their (meta)data and want credit for it, so CC-BY may be the most realistic suggestion there. And some non-profit organizations also feel very strongly about the NC clause, for better or worse.  The SA clause is the one we really should steer clear of, if we want interoperability (and ODbL has some of the same problems).

 

Cheers,

 

MacKenzie Smith

Associate Director, MIT Libraries

Research Fellow, Creative Commons

Jon Voss

Dec 27, 2010, 2:51:36 PM
to lod...@googlegroups.com
Another hat tip to Jerry Persons on this one.  Great paper on reuse in Cultural Institutions from last month...  from my quick read through it, I don't think this work differentiates between metadata and digital assets though.  I think that is a key distinction we can make with Linked Open Data.  Publishing the metadata with an open license still enables you to put restrictions on use of a digital asset.  I.e. we want you to know about this photograph, but we don't allow you to use the high resolution version in certain ways.  So your metadata could be published as CC0, but the holdings themselves may be CC-BY-NC-SA, at least in theory.

from the abstract:
This paper argues that CI should develop a multiplicity of access and use regulations that 
acknowledge the varying sensitivity of collections and the varying level of risk associated 
with different types of reuses. It concludes by offering a set of examples of collections 
employing varying levels of reuse control (from none to complete) to serve as heuristics.

http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3060/2640
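
To make the metadata/asset distinction above concrete, here is a minimal sketch of a record that carries two separate license fields. The `Holding` class, its field names, and the example title are all hypothetical placeholders, not drawn from any real schema:

```python
from dataclasses import dataclass


@dataclass
class Holding:
    """Illustrative record; field names are assumptions, not a real schema."""
    title: str
    metadata_license: str  # covers the descriptive record
    asset_license: str     # covers the photo or digital surrogate itself


def publishable_record(h: Holding) -> dict:
    """Build the openly shared description; the asset terms are stated, not waived."""
    return {
        "title": h.title,
        "license": h.metadata_license,
        "asset_license": h.asset_license,
    }


# Placeholder example: open metadata describing a more restricted asset.
photo = Holding(
    title="Example photograph",
    metadata_license="CC0",
    asset_license="CC-BY-NC-SA",
)

if __name__ == "__main__":
    print(publishable_record(photo))
```

The description can circulate freely under CC0 while still telling a reuser which terms apply to the photograph itself.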