Hi all
I summarized some key decisions related to open license selection in the numerical sciences in the following flow chart (release 01):
This version of the diagram has a filtered selection based on my preferences (there is also an unfiltered version).
You'll note I favor CC‑BY‑4.0 for datasets because tracking provenance is vital and I don't subscribe to the doctrine of convenience over all else.
Discussion and offlist feedback welcome. I have an annotated version too, with citations for the various claims made. And I'll probably blog that in due course if this diagram is sufficiently well received.
cheers, Robbie
-- Robbie Morrison Address: Schillerstrasse 85, 10627 Berlin, Germany Phone: +49.30.612-87617
--
You received this message because you are subscribed to the Google Groups "openmod initiative" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openmod-initiat...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openmod-initiative/7fe29a94-176f-6d5c-4a3d-946f3093aff8%40posteo.de.
Hi François
The following analysis relates to energy sector data and related domains. What I write could be completely misplaced for other disciplines.
First I should reinforce that the license selection diagram is for licensing new works, and that compatibility with other licenses is important. Hence, as shown in the chart below, CC0‑1.0 is inbound compatible with CC‑BY‑4.0, but CC‑BY‑4.0 is quite possibly not inbound compatible with ODbL‑1.0.
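The two inbound relations just mentioned can be written down as a small directed graph. This is only a sketch under stated assumptions: the names `Link`, `INBOUND`, and `can_combine` are made up for illustration, only the two relations stated in this thread are recorded, and treating an uncertain link as unusable by default is a cautious assumption rather than a legal finding.

```python
from enum import Enum


class Link(Enum):
    COMPATIBLE = "compatible"
    UNCERTAIN = "uncertain"


# Directed edges: material under the first license flowing into a work
# released under the second. Only relations stated in this thread appear.
INBOUND = {
    ("CC0-1.0", "CC-BY-4.0"): Link.COMPATIBLE,
    ("CC-BY-4.0", "ODbL-1.0"): Link.UNCERTAIN,
}


def can_combine(source: str, target: str, allow_uncertain: bool = False) -> bool:
    """Return True if material under `source` may flow into a work under `target`.

    Uncertain links count only when `allow_uncertain` is set (an assumption
    made here for caution). No transitive reasoning is attempted.
    """
    if source == target:
        return True
    link = INBOUND.get((source, target))
    if link is Link.COMPATIBLE:
        return True
    if link is Link.UNCERTAIN:
        return allow_uncertain
    return False
```

Anything not listed is reported as not combinable, which simply reflects that the table is incomplete, not that the licenses are known to conflict.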
My remark on “convenience” was in relation to the Creative Commons CC0‑1.0 — the public domain dedication that does not require attribution tracking. On reflection, possibly not the best term to have chosen. But many analysts and scientists favor public domain because it removes the need to manage legal metadata (National Research Council 2004). Indeed my straw polls at a couple of openmod workshops suggest that CC0‑1.0 is preferred over any other data‑capable license within this community.
Many argue that CC‑BY‑4.0 creates an “attribution stacking” problem — to the point that managing the legal metadata involved becomes intractable. Those promoting this view include Mozilla (ongoing) and Wikimedia Deutschland (in meetings). My view is that the attribution stack should ideally never exceed five deep — and that it is much better practice to push corrections back upstream to community data portals for curation and reuse than to continue to combine and fork datasets ad infinitum. Pollock (2009) also thinks that the stacking problem is overstated.
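To make the notion of an attribution stack concrete, here is a minimal sketch that assumes each dataset records its direct upstream sources. The `Dataset` class, the helper names, and the example datasets are all hypothetical, and the limit of five is just the personal rule of thumb given above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_DEPTH = 5  # rule of thumb from the discussion, not a legal limit


@dataclass
class Dataset:
    name: str
    attribution: str
    sources: list[Dataset] = field(default_factory=list)


def stack_depth(ds: Dataset) -> int:
    """Depth of the attribution stack: 1 for a dataset with no upstream sources."""
    return 1 + max((stack_depth(s) for s in ds.sources), default=0)


def attributions(ds: Dataset) -> list[str]:
    """Attribution notices to carry forward: own notice first, no duplicates."""
    seen: list[str] = []

    def walk(d: Dataset) -> None:
        if d.attribution not in seen:
            seen.append(d.attribution)
        for s in d.sources:
            walk(s)

    walk(ds)
    return seen


# Hypothetical example: two upstream datasets merged after one cleaning step.
raw = Dataset("plant-list", "Agency A")
cleaned = Dataset("plant-list-cleaned", "Lab B", [raw])
grid = Dataset("grid-topology", "Agency D")
merged = Dataset("merged-model-input", "Project C", [cleaned, grid])

within_limit = stack_depth(merged) <= MAX_DEPTH
```

Pushing corrections back upstream, rather than forking, keeps `stack_depth` from growing with each round of reuse.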
The only share‑alike data‑capable license with any level of deployment is the ODbL‑1.0. This license is currently used on OpenStreetMap (as you correctly allude to). The idea of share‑alike licensing is that the material so licensed stays forever within the information commons.
But there are major problems with the share‑alike licensing of data, principally: data siloing and legal interpretation.
Lämmerhirt (2017) recommends that the ODbL‑1.0 be abandoned because its use creates data silos. And Lämmerhirt works for Open Knowledge Foundation (OKFN) who originally drafted the ODbL‑1.0.
Open source lawyer Heather Meeker (2017:259) writes unfavorably about the ODbL‑1.0 in legal terms: “[L]icensees find it difficult to distinguish a derivative database from a new and separate database or a collective database. This kind of problem is endemic to any copyleft license, but it is particularly difficult for licenses without broad adoption. Unlike the GPL [software license], for which industry practice as to its scope was ironed out over many years, the ODbL is not widely used.”
OpenStreetMap (OSM ongoing) now offers new registrants the choice of potentially dual licensing their contributions under CC0‑1.0. So I would guess that is in response to the issues being discussed here.
Philosophically I am (like you) drawn to licenses which place and keep works in the information commons — but in practice for data, this strategy creates more problems than it resolves.
Hirth (2020) reviews data licensing for electricity sector analysis and concludes that the choice is solely between CC0‑1.0 and CC‑BY‑4.0.
Australian National Data Service (2017) recommends CC‑BY‑4.0 for licensing data, even if the copyrightability of the dataset is questionable.
European Commission (2014) favors CC0‑1.0 and CC‑BY‑4.0 and fails to mention ODbL‑1.0 in its report.
I once tried to depict data‑capable open license interoperability on a compatibility graph. The link from CC‑BY‑4.0 to ODbL‑1.0 remains uncertain. Poole (2017) reports that OpenStreetMap have adopted the licensing policy, based on prudence, that this linkage is not sufficiently certain to rely upon. Here's the diagram:
So for all the above reasons, I did not include the ODbL‑1.0 as a choice on the open license selection flowchart.
Here is the European Commission view again:
I appreciate the feedback. Further thoughts anyone? Robbie
References and reading
Australian National Data Service (4 January 2017). Copyright, data and licensing. Melbourne, Australia: Australian National Data Service (ANDS).
Ball, Alex (17 July 2014). How to license research data. Edinburgh, United Kingdom: Digital Curation Centre (DCC).
Creative Commons (ongoing). What is the difference between the Open Data Commons licenses and the CC 4.0 licenses? — Creative Commons FAQ entry. Creative Commons. Mountain View, California, USA.
European Commission (24 July 2014). “Commission notice: guidelines on recommended standard licences, datasets and charging for the reuse of documents”. Official Journal of the European Union. C 240: 1–10.
Hirth, Lion (1 January 2020). “Open data for electricity modeling: legal aspects”. Energy Strategy Reviews. 27: 100433. ISSN 2211-467X. doi:10.1016/j.esr.2019.100433.
Lämmerhirt, Danny (December 2017). Avoiding data use silos: how governments can simplify the open licensing landscape. Open Knowledge International. Cambridge, United Kingdom.
Meeker, Heather (4 April 2017). Open (source) for business: a practical guide to open source software licensing (2nd edition). North Charleston, South Carolina, USA: CreateSpace Independent Publishing Platform. ISBN 978-154473764-5.
Mozilla (ongoing). License stacking. Mozilla Science Lab’s open data primers.
National Research Council (2004). Open access and the public domain in digital data and information for science — Proceedings of an International Symposium. Washington DC, USA: The National Academies Press. ISBN 978-0-309-09145-9. doi:10.17226/11030.
OSM (ongoing). Licence and Legal FAQ / Why would I want my contributions to be public domain. OpenStreetMap Foundation. Cambridge, United Kingdom. Access date 31 October 2019.
Poblet, Marta, Amir Aryani, Paolo Manghi, Kathryn Unsworth, Jingbo Wang, Brigitte Hausstein, Sunje Dallmeier-Tiessen, Claus-Peter Klas, Pompeu Casanovas, and Victor Rodriguez-Doncel (September 2016). Assigning creative commons licenses to research metadata: issues and cases — Preprint.
Pollock, Rufus (9 February 2009). Comments on the Science Commons protocol for implementing open access data. Open Knowledge International Blog.
Poole, Simon (17 March 2017). Use of CC BY 4.0 licensed data in OpenStreetMap. OpenStreetMap Blog.
Hi François
Thanks too for your very considered responses.
I didn’t snip material because I felt it more useful to keep the debate together.
Let me first underscore that I would prefer a data commons protected under share‑alike licensing. That practice has, to a significant degree, been the case with software and the GPL family of licenses. Indeed it was the GPL‑2.0 that drove free and open source software, and without that exact license the Linux kernel would doubtless have been another interesting but long abandoned project (Casad 2017). Torvalds has described relicensing Linux under the GPL‑2.0 as the “best thing I ever did” (quoted on Wikipedia). And my license selection flowchart (the current version) describes software copyleft licensing thus: “prevent proprietary capture, maximize inbound reuse opportunities, and/or encourage contributors”.
But data and software differ markedly. Software projects can, more or less, exist stand‑alone and code in isolation can be reliably assessed for quality. Data is the exact opposite. Mixable data is key and provenance is by far the best determinant of quality.
Probably if the ODbL‑1.0 had become dominant, we wouldn’t be having this discussion. But it didn’t and we need to look closely at second‑best solutions. Reichman and Okediji (2012) describe the scientific anti‑commons that both you and I would wish to avoid.
Hi Robbie, and thank you for the work you did to classify licenses. We're not used to reading about this so often, so it's great, especially with the well-sourced references given in each of your mails.
I have some additional points to make here, see below:
On Sun, 8 March 2020 at 23:38, Robbie Morrison <robbie....@posteo.de> wrote:
But many analysts and scientists favor public domain because it removes the need to manage legal metadata (National Research Council 2004). Indeed my straw polls at a couple of openmod workshops suggest that CC0‑1.0 is preferred over any other data‑capable license within this community
As a public data contributor who now spends months outdoors collecting information that even companies are not able to manage in their official repositories, I'm not happy to learn that analysts and scientists won't credit such contributions in their work. I may have misunderstood your statement (I hope so). How can analysts and scientists not bear the work of attribution as the legitimate reward for free, and more and more often high-quality, data that you won't find in paid repositories?
There is an interesting relationship between legal attribution and scientific acknowledgment: the former being required by legal instruments and the latter by social convention. Fisk (2006) covers the issues and argues for a new “right to attribution” in common law countries (like the US), somewhat analogous to the moral right in civil law jurisdictions (like France).
My view is that, seeing as acknowledgment is required by science, use of the CC‑BY‑4.0 attribution license should not add to the overhead required by good scientific practice. From another angle, technical metadata should always be present and so tracking legal attribution should not involve additional tooling nor much effort.
So I personally support CC‑BY‑4.0 over CC0‑1.0. But that is a minority view in science and in this community. Moreover it is up to data providers to determine under which legal conditions they release data. Which is why both options are shown on the flowchart.
Many argue that CC‑BY‑4.0 creates an “attribution stacking” problem — to the point that managing the legal metadata involved becomes intractable. Those promoting this view include Mozilla (ongoing) and Wikimedia Deutschland (in meetings). My view is that the attribution stack should ideally never exceed five deep — and that it is much better practice to push corrections back upstream to community data portals for curation and reuse than to continue to combine and fork datasets ad infinitum. Pollock (2009) also thinks that the stacking problem is overstated.
The only share‑alike data‑capable license with any level of deployment is the ODbL‑1.0. This license is currently used on OpenStreetMap (as you correctly allude to). The idea of share‑alike licensing is that the material so licensed stays forever within the information commons.
But there are major problems with the share‑alike licensing of data, principally: data siloing and legal interpretation.
Lämmerhirt (2017) recommends that the ODbL‑1.0 be abandoned because its use creates data silos. And Lämmerhirt works for Open Knowledge Foundation (OKFN) who originally drafted the ODbL‑1.0.
Share-alike can be relevant when you expect the private sector to contribute, as it has no difficulty consuming public data. This is a real issue currently, especially with public-private cooperation contracts.
That requires that this material in fact attracts copyright. I don't believe that is very often the case. (My next project is to collect legal references as to why copyright authorship applies solely and exclusively to humans.) One can add a license, but if there is no underlying copyright or database protection, that license is legally meaningless. Recall too that database protection only applies to substantial reuse (technically "extractions"). The same applies to CC‑BY‑4.0 of course, but that license is not designed to prevent proprietary capture.
So you need to be more specific about what kind of data is under discussion.
Silos sometimes exist for good reasons: they may hold data that does not have to be shared under the same conditions (say, personal records). It's a question of hygiene more than of simple attribution.
We have situations in France where companies agree to share their own data under ODbL terms and then take advantage of OSM data in their internal systems without any silos (SNCF, the French rail operator, for instance).
Let's take personal data off the table. Individual privacy in Europe is regulated under the GDPR and one cannot contract away rights en masse, as an open license intentionally does.
I’ve not encountered the “data hygiene” argument before. Normally open data advocates argue for maximum remixing. So what you are saying is that silos have merit. That idea is occasionally discussed for software, sometimes under the rubric of “balkanization” and mostly as a criticism of the GPL‑3.0 license when it was published in 2007.
I guess my fundamental question is what is it that you are trying to protect? If it is volunteer effort, then I would say that that should not outweigh reusability. If it is data provenance, then CC‑BY‑4.0 should be sufficient. If it is community curation, then the infrastructure, community activity, and reputation should be necessary and sufficient. Or put conversely, the ODbL‑1.0 offers no further guarantees than CC‑BY‑4.0 in regard to the quality of community curation. The fidelity of information on OpenStreetMap and Wikipedia, for instance, is a product of their respective communities and not their share‑alike licensing, I would argue.
I could contradict myself by observing that the share‑alike GPL‑2.0 kept the Linux kernel project focused and coherent. But equally, when so much current data is either public domain, CC0‑1.0, or CC‑BY‑4.0 and remixing is vital for utility, do data silos and data hygiene really make much sense?
Open source lawyer Heather Meeker (2017:259) writes unfavorably about the ODbL‑1.0 in legal terms: “[L]icensees find it difficult to distinguish a derivative database from a new and separate database or a collective database. This kind of problem is endemic to any copyleft license, but it is particularly difficult for licenses without broad adoption. Unlike the GPL [software license], for which industry practice as to its scope was ironed out over many years, the ODbL is not widely used.”
OpenStreetMap (OSM ongoing) now offers new registrants the choice of potentially dual licensing their contributions under CC0‑1.0. So I would guess that is in response to the issues being discussed here.
I've never been told of such a possibility, and that's not what is promoted here.
That option was present when I registered for OpenStreetMap on 6 February 2019. Also as per this screenshot taken today (the full screen PNG attached as well), note the final tickbox which reads "In addition to the above, I consider my contributions to be in the Public Domain":
In passing, some of what OSM writes on copyright and database protection law is incorrect. I’ve requested edits but the problematic text never seems to get modified.
I found the reference in the OSM Foundation FAQ stating that individual contributors may want their contributions to be PD, but this won't affect the licensing of data extracts from OSM. Currently the only license supported by the OSMf is ODbL-1.0, no exceptions.
Unless someone writes a fine‑grain filter for license attributes and exports PD material on that basis. And I don't see any technical reason why not. Or am I missing something?
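A minimal sketch of what such a filter might look like follows, purely on the technical side. It assumes a hypothetical data model in which every element lists all users who edited any version of it, plus a known set of users who ticked the public domain box. Neither `Element` nor `public_domain_subset` corresponds to any real OpenStreetMap API, and whether an extract built this way could legally be distributed outside ODbL‑1.0 is exactly the open question in this exchange.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: int
    contributors: tuple[str, ...]  # hypothetical: every user who edited any version


def public_domain_subset(elements, pd_users):
    """Keep only elements whose every contributor has declared their edits PD.

    Elements with no recorded contributors are dropped, since nothing can be
    concluded about them.
    """
    pd = set(pd_users)
    return [e for e in elements if e.contributors and set(e.contributors) <= pd]


# Hypothetical example data.
elements = [
    Element(1, ("alice",)),
    Element(2, ("alice", "bob")),
    Element(3, ()),
]
pd_only = public_domain_subset(elements, {"alice"})
```

The filter is deliberately conservative: a single contributor without the public domain declaration excludes the whole element.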
Philosophically I am (like you) drawn to licenses which place and keep works in the information commons — but in practice for data, this strategy creates more problems than it resolves.
Philosophically, I think actors who bring problems before solutions need to finish their change transitions first :) Share data, and the propagation and silo problems will vanish by themselves.
I agree that citizen‑generated data is a special case, relative to data from public bodies and from scientific projects. Lämmerhirt et al (2018) don’t cover licensing, except to remark that Statistics Canada (StatsCan) once reissued some of their data under ODbL‑1.0 compatible terms. Conversely Anon (2018) does cover licensing, but defers to Lämmerhirt (2017). And, as noted previously, Lämmerhirt explicitly advises against ODbL‑1.0.
In passing, the license selection flowchart arose from informal inputs by me to a project Mark Howells is coordinating to develop good practice for “national energy system analytics standards” with an emphasis on applications in the Global South.
Regarding citizen‑generated data more generally, I think we are about to see civil society undertaking its own public policy analysis using, in large part, data collected, collated, and/or curated by civil society. I am fully supportive of that agenda. One such project is Drawdown (Hawken 2017).
I appreciate this conversation, all the best
I’ll repost the license selection flowchart in due course. Ludwig Hülk is currently studying it too and I expect some robust feedback from him shortly. :)
Thanks for engaging. Robbie
References and reading
Anonymous (December 2018). Choosing and engaging with citizen generated data. Global Partnership for Sustainable Development Data, Open Knowledge International, Public Data Lab.
Casad, Joe (2017). “The story of the GPL”. Linux Magazine. (200). ISSN 1471-5678.
Fisk, Catherine L (2006). “Credit where it’s due: the law and norms of attribution”. Georgetown Law Journal. 95: 49–117. ISSN 0016-8092.
Hawken, Paul (editor) (18 April 2017). Drawdown: the most comprehensive plan ever proposed to reverse global warming. New York, USA: Penguin Books. ISBN 978-014313044-4.
Lämmerhirt, Danny, Jonathan Gray, Tommaso Venturini, and Axel Meunier (December 2018). Advancing sustainability together? citizen-generated data and the Sustainable Development Goals. Global Partnership for Sustainable Development Data, Open Knowledge International, Public Data Lab.
Reichman, Jerome H and Ruth Okediji (2012). “When copyright law and science collide: empowering digitally integrated research methods on a global scale”. Minnesota Law Review. 96: 1362–1480. ISSN 0026-5535.
François Lacombe, OSM France
Hello all
I had further interactions offlist. As is often the case, there are two mutually exclusive options regarding open data licensing:
Under current legal understanding, both strategies create their own rock‑solid information silos. This requires that datasets under each license remain fully separate on distribution. That requirement does not prevent one from tarring datasets together and distributing the tarball. But any tighter integration could potentially violate copyright and, in Europe only, database protection. Tighter integration would include adding disparately‑licensed information to the same relational database.
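As an illustration of the loose kind of bundling described here, the following sketch packs each dataset directory unmodified under its own top-level folder in one tarball. The `bundle` helper and the convention that each directory carries a `LICENSE` file are assumptions made for this example, not an established practice or legal advice.

```python
import tarfile
from pathlib import Path


def bundle(dataset_dirs, out_path):
    """Pack each dataset directory, unmodified, under its own top-level folder.

    Each directory must carry its own LICENSE file (a convention assumed
    here), so the datasets stay separate and separately licensed.
    """
    with tarfile.open(out_path, "w:gz") as tar:
        for d in map(Path, dataset_dirs):
            if not (d / "LICENSE").is_file():
                raise FileNotFoundError(f"{d} has no LICENSE file")
            # arcname keeps one folder per dataset inside the archive
            tar.add(d, arcname=d.name)
    return out_path
```

Nothing is merged or joined across the folders, which is the difference from loading everything into a single relational database.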
What you do with legally incompatible data locally is generally considered your business. The open license obligations trip on distribution. How this all works in the cloud is uncharted territory, although provisions are now written into both law and newer open licenses, including CC‑BY‑4.0, to support some forms of internet data mining.
Secondary considerations apply. ODbL‑1.0 is adopted by OpenStreetMap and that platform is widely used for all manner of citizen‑generated geographic‑related information, including power system assets. Conversely, commercial organizations have repeatedly stated that they will not license their data under ODbL‑1.0. Much of what energy analysts rely on classes as “privately held data [of] public interest” to quote the European Commission.
Philosophical views apply. Some believe the information commons should not be subject to enclosure: a reference to the legal process whereby smallholdings in Britain were co‑opted by wealthy families with the help of favorable legislation. Some believe the digital commons should be entirely non‑commercial, such as the authors of the recent Green New Deal for Europe (DEM25 2019:46), thereby rendering their particular view of the commons non‑open. Some believe that data should be as legally unencumbered as possible. Some claim solely to be pragmatists.
I have decided to adopt the Wikipedia line on substantiation. Thus far, I have not located a reliable secondary source that recommends ODbL‑1.0 for energy sector data. Indeed most authors take the opposite view, and some explicitly so. Therefore, if there is a reliable secondary source I can cite on this matter, I will add the ODbL‑1.0 as an option. Unless and until then, I will leave ODbL‑1.0 off as a potential choice.
with best wishes, Robbie
References
DEM25 (editor) (December 2019). The green new deal for Europe: blueprint for Europe’s just transition (2nd ed). Europe: Democracy in Europe Movement 2025 (DEM25).