Bug#885698: Update and document criteria for inclusion in /usr/share/common-licenses

Sean Whitton

Dec 29, 2017, 5:00:02 AM
Package: debian-policy
Severity: important
X-debbugs-cc: a...@debian.org
Control: block 795402 by -1
Control: block 883966 by -1
Control: block 884223 by -1
Control: block 884226 by -1
Control: block 884227 by -1
Control: block 884228 by -1
User: debian...@packages.debian.org
Usertags: normative discussion

Hello debian...@l.d.o,

Our current criteria for including licenses are, as Markus Koschany
smartly puts it in #884228, "[a]pparently something between gut feeling
and the popularity of our least preferred license in common-licenses."
We can and should do better than this.

In the air is also the idea that we include licenses in common-licenses
to save disk space on low disk space systems: the license should be
popular enough such that the reduced size of d/copyright files will
outweigh the increased size of base-files.
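
The break-even condition in that argument can be written down as simple
arithmetic. The following is a minimal sketch only: the function name, the
80-byte size of a reference to common-licenses, and the idea of counting
installed packages are all assumptions made for illustration, not figures
from this bug.

```python
def net_bytes_saved(license_size: int, referencing_packages: int,
                    reference_size: int = 80) -> int:
    """Estimate the disk space saved by shipping one license centrally.

    license_size: size in bytes of the full license text.
    referencing_packages: installed packages that would otherwise each
        carry the full text in their copyright file.
    reference_size: bytes each package spends pointing at
        /usr/share/common-licenses instead (hypothetical default).
    """
    # Each referencing package shrinks by the text minus the pointer.
    saved = referencing_packages * (license_size - reference_size)
    # base-files grows by one full copy of the license.
    cost = license_size
    return saved - cost
```

A positive result means the license "pays for itself" on that system; a
negative result means including it only makes base-files larger.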

We should write down our criteria in Policy, in section 12.5 (or
possibly in the Policy Changes Process appendix). We should probably
say too that the application of the criteria is at the discretion of the
Policy Editors. Before we can do that, however, we need to consider
whether the criteria need to be updated.

The only point of clear consensus -- at least among the Policy Editors
-- is that short licenses which have more than one popular variant
should never be included because of the risk that packages licensed
under one variant incorrectly refer to a different variant in
common-licenses. This problem actually exists in the archive because a
BSD variant was included in common-licenses at some point. We should
include this point in the Policy Manual.

Otherwise, here are some of the arguments on the table:

(1) In a related d-devel thread, someone working with embedded systems
suggested that these days, either a system has enough disk space that
common-licenses isn't relevant, or it has so little disk space that all
of /usr/share/doc must be deleted. If this is right, disk space
concerns should not decide what goes into common-licenses. Is it right?

(2) Some people want more licenses in common-licenses because they find
it more convenient. Convenient processes save our volunteers' time. We
frequently get requests to expand common-licenses and I suspect that
many of them are motivated by the belief that it would make the
requestor's work more convenient. If disk space issues aren't relevant
anymore, an increase in convenience might become a dominating criterion
for inclusion. However, this point has been disputed: better tools
could provide license text formatted suitably for d/copyright, which
would be just as convenient (e.g., in Emacs: `C-u M-!
get-formatted-license GPL-3` would be about as convenient as it gets).
And there surely exist those who find common-licenses makes editing
d/copyright less convenient...

I'm not sure how to proceed. It would be nice to verify (1) with other
people working with embedded systems. Possibly we should ask on one of
our more specialised mailing lists. And there are surely other
arguments besides (1) and (2). We should gather those in this bug.

#884228 has further points of discussion, but I'd ask that we restrict
ourselves in this bug to discussing what the criteria for inclusion
should be. In particular, let's not discuss the proposal to add all
known DFSG-free licenses to common-licenses. Whether that proposal is
valid depends on our criteria for inclusion, so let's stick to hashing
out those criteria in this bug.

--
Sean Whitton

Paul Hardy

Oct 17, 2018, 11:10:02 PM
Control: block 910548 by -1

Blocking my own bug report with this one, which I just noticed.

I submitted bug #910548 previously against the base-files package:
"base-files - please consider adding
/usr/share/common-licenses/Unicode-Data".

I had formatted the copyright and license information for Unicode data
files from the http://unicode.org website to put in the
debian/copyright file in a package that I created this summer. The
copyright information is more involved than most copyright statements,
so I kept it in what I submitted with the bug report.

I thought if that license was something Debian found useful, there
would be no need for anyone else to duplicate the effort of formatting
that I had gone through once, and so I offered it. Just the license
in isolation could be formatted like other licenses fairly quickly if
the copyright section is not wanted. Or the whole thing can be left
out and that bug closed, as you wish.

Thanks,


Paul Hardy

Bill Allombert

Sep 10, 2023, 12:40:05 PM
On Sun, Sep 10, 2023 at 09:00:22AM -0700, Russ Allbery wrote:
> Jonas Smedegaard <jo...@jones.dk> writes:
> > Quoting Hideki Yamane (2023-09-10 11:00:07)
>
> >> Hmm, how about providing license-common package and that depends on
> >> "license-common-list", and ISO image provides both, then? It would be
> >> no regressions.
>
> I do wonder why we've never done this. Does anyone know? common-licenses
> is in an essential package so it doesn't require a dependency and is
> always present, and we've leaned on that in the past in justifying not
> including those licenses in the binary packages themselves, but I'm not
> sure why a package dependency wouldn't be legally equivalent. We allow
> symlinking the /usr/share/doc directory in some cases where there is a
> dependency, so we don't strictly require every binary package have a
> copyright file.

Or we could generate DEBIAN/copyright from debian/copyright using data in
license-common-list at build time. So maintainers would not need to manage the copying
themselves.
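
As an illustration of that idea, here is a minimal sketch of such a
build-time expansion. It assumes a machine-readable debian/copyright in
which a stand-alone License stanza may consist of the short name alone
(something the current format does not allow), and it takes the license
texts as a plain dictionary; the function names and that dictionary are
hypothetical, since no license-common-list package exists.

```python
def format_license_body(text: str) -> str:
    """Indent a license text as a copyright-format field continuation."""
    lines = []
    for line in text.strip("\n").splitlines():
        # Continuation lines start with a space; blank lines become " ."
        lines.append(" " + line if line.strip() else " .")
    return "\n".join(lines)


def expand_copyright(source: str, license_texts: dict[str, str]) -> str:
    """Fill in the text of bare stand-alone License stanzas.

    Only stanzas consisting of a single "License: NAME" line are
    expanded; Files stanzas and stanzas that already carry text are
    passed through unchanged, as are names missing from license_texts.
    """
    out = []
    for stanza in source.strip("\n").split("\n\n"):
        lines = stanza.splitlines()
        if len(lines) == 1 and lines[0].startswith("License:"):
            name = lines[0].split(":", 1)[1].strip()
            if name in license_texts:
                body = format_license_body(license_texts[name])
                stanza = lines[0] + "\n" + body
        out.append(stanza)
    return "\n\n".join(out) + "\n"
```

The source package would keep the short form, and the expanded result
would be what ends up in the binary package.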

Cheers,
Bill

Johannes Schauer Marin Rodrigues

Sep 10, 2023, 4:00:03 PM
Hi,

Quoting Bill Allombert (2023-09-10 18:29:36)
I very much like this idea. The main reason maintainers want more licenses in
/usr/share/common-licenses/ is so that they do not anymore have humongous
d/copyright files with all license texts copypasted over and over again. If
long texts could be reduced to a reference that get expanded by a machine it
would make debian/copyright look much nicer and would make it easier to
maintain while at the same time shipping the full license text in the binary
package.

Does anybody know why such an approach would be a bad idea?

I have zero legal training so the only potential problem with this approach
that I was able to come up with is, that then the source package itself would
not anymore contain the license text and thus we would be shipping code covered
by a license that states that the code may only be distributed with the license
text alongside it without that text. So while auto-generating this would
probably create compliant binary packages, it would leave the source package
without the license text. Is that a problem?

Thanks!

cheers, josch

G. Branden Robinson

Sep 10, 2023, 4:30:04 PM
At 2023-09-10T21:47:36+0200, Johannes Schauer Marin Rodrigues wrote:
> Quoting Bill Allombert (2023-09-10 18:29:36)
> > On Sun, Sep 10, 2023 at 09:00:22AM -0700, Russ Allbery wrote:
> > > Jonas Smedegaard <jo...@jones.dk> writes:
> > > >> Hmm, how about providing license-common package and that
> > > >> depends on "license-common-list", and ISO image provides both,
> > > >> then? It would be no regressions.
> > >
> > > I do wonder why we've never done this. Does anyone know?
> > > common-licenses is in an essential package so it doesn't require a
> > > dependency and is always present, and we've leaned on that in the
> > > past in justifying not including those licenses in the binary
> > > packages themselves, but I'm not sure why a package dependency
> > > wouldn't be legally equivalent. We allow symlinking the
> > > /usr/share/doc directory in some cases where there is a
> > > dependency, so we don't strictly require every binary package have
> > > a copyright file.
> >
> > Or we could generate DEBIAN/copyright from debian/copyright using data in
> > license-common-list at build time. So maintainers would not need to manage
> > the copying themselves.
[...]
> I have zero legal training so the only potential problem with this approach
> that I was able to come up with is, that then the source package itself would
> not anymore contain the license text

...why wouldn't it? Remember how a source package is defined:

A DSC file, an upstream source archive (maybe more than one in exciting
new source formats I haven't learned), and a compressed diff of Debian
changes.

Debian _source_ packages generally don't chop copyright notices and
license texts out of the upstream distributions, and should not do so
unless those notices/texts are invalid or the material they cover has
been removed. (Both of these do sometimes happen.)

Even if one worries about theoretical liability due to the existence of
separate files for .dsc, .tar.gz, and .diff.gz, then let us recall that
(1) the DSC is minimal, containing metadata that may not rise to the
threshold of originality required by copyright [in the U.S., anyway];
(2) the upstream archive has the notices and texts that the _original
distributor_ put in it, and as a rule, if permission to distribute the
work exists, it is not incumbent on redistributors to add notices/texts
where the rightsholder themselves neglected to do so; and (3) the
.diff.gz will not be in the business of removing notices/texts except as
contemplated in the previous paragraph (correcting erroneous
notices).[1]

> and thus we would be shipping code covered by a license that states
> that the code may only be distributed with the license text alongside
> it without that text.

I don't think that is a risk as long as people continue to follow
packaging practices that Debian has applied with little objection from
our upstreams for 25+ years.[2]

> So while auto-generating this would probably create compliant binary
> packages, it would leave the source package without the license text.

I am unable to imagine the mechanism by which that would happen, given
what Russ and Bill proposed.

Regards,
Branden

[1] When repackaging, e.g., to remove non-free material, affected
content is removed altogether even from the source. Nothing in
copyright law can compel you to distribute copyright notices and
texts that don't apply to work you're not distributing.[3]

[2] I don't know of Debian _ever_ having had a problem, as in receiving
a cease-and-desist letter or other threat of legal action with what
one might term an "institutional" copyright holder. We've certainly
had our share of nasty emails from cantankerous individual copyright
holders, often who had their own perverse misreadings of licenses
drafted by others (hello to the memory of Jörg Schilling). There
also was once an upstream who stuck a Trojan horse into the source
code to try to get Debian's users to stop using versions we
distributed, but to go directly upstream instead. Nowadays, that
seems quaint; you can today Trojan your machine much more
conveniently with npm(1).

[3] At the same time a few non-free FSF manuals under the GNU FDL
declaim the GNU _GPL_ text to be an Invariant Section. Like most of
the defects of the FDL, I think this is a pointless encumbrance; if
you distribute GPL'ed software, a copy of its text must come along
anyway. The only rationale I can imagine is to mandate, for printed
copies of the manuals, the inclusion of the GPL's preachy preamble.
But I digress.

Russ Allbery

Sep 10, 2023, 4:40:03 PM
Johannes Schauer Marin Rodrigues <jo...@debian.org> writes:

> I very much like this idea. The main reason maintainers want more
> licenses in /usr/share/common-licenses/ is so that they do not anymore
> have humongous d/copyright files with all license texts copypasted over
> and over again. If long texts could be reduced to a reference that get
> expanded by a machine it would make debian/copyright look much nicer and
> would make it easier to maintain while at the same time shipping the
> full license text in the binary package.

> Does anybody know why such an approach would be a bad idea?

I can think of a few possible problems:

* I'm not sure if we generate binary package copyright files at build time
right now, and if all of our tooling deals with this. I had thought
that we prohibited this, but it looks like it's only a Policy should and
there isn't a mention of it in the reject FAQ, so I think I was
remembering the rule for debian/control instead. Of course, even if
tools don't support this now, they could always be changed.

* If ftp-master has to review the copyright files of each binary package
separate from the copyright file of the source package (I think this
would be an implication of generating the copyright files during build
time), and the binary copyright files have fully-expanded licenses, that
sounds like kind of a pain for the ftp-master reviewers. Maybe we can
deal with this with better tooling, but someone would need to write
that.

* If we took this to its logical end point and did this with the GPL as
well, we would add 20,000 copies of the GPL to the archive and install a
*lot* of copies on the system. Admittedly text files are small and
disks are large, but this still seems a little excessive. So maybe we
still need to do something with common-licenses?

--
Russ Allbery (r...@debian.org) <https://www.eyrie.org/~eagle/>

Jonas Smedegaard

Sep 11, 2023, 1:00:03 AM
Quoting Russ Allbery (2023-09-10 23:24:24)
> Jonas Smedegaard <jo...@jones.dk> writes:
>
> > I have so far worked the most on identifying and grouping source data,
> > putting only little attention (yet - but do dream big...) towards
> > parsing and processing debian/copyright files e.g. to compare and assess
> > how well aligned the file is with the content it is supposed to cover.
>
> > So if I understand your question correctly and you are not looking for
> > the output of `licensecheck --list-licenses`, then unfortunately I have
> > nothing exciting to offer.
>
> I think that's mostly correct. I was wondering what would happen if one
> ran licensecheck debian/copyright, but unfortunately it doesn't look like
> it does anything useful. I tried it on one of my packages (remctl) that
> has a bunch of different licenses, and it just said:
>
> debian/copyright: MIT License
>
> and apparently ignored all of the other licenses present (FSFAP, FSFFULLR,
> ISC, X11, GPL-2.0-or-later with Autoconf-exception-generic, and
> GPL-3.0-or-later with Autoconf-exception-generic). It also doesn't notice
> that some of the MIT licenses are variations that contain people's names.
>
> (I still put all the Autoconf build machinery licenses in my
> debian/copyright file because of the tooling I use to manage my copyright
> file, which I also use upstream. I probably should change that, but I
> need to either switch to licensecheck or rewrite my horrible script.)
>
> Also, presumably it doesn't know about copyright-format since it wouldn't
> be expecting that in source files, so it wouldn't know to include licenses
> referenced in License stanzas without the license text included.

Right. Licensecheck so far mostly scans for human prose stating "this
has been licensed as..." and "this is the license...", and rarely is
able to recognize "the default license of this project is..." or "that
folder over there is licensed as..." style prose.

That said, there is interest in covering that as well, and also interest
in improving on non-prose forms like "[this is YAML;] Copyright: ..." or
binary forms most commonly embedded in fonts and ICC data in images.

If you (i.e. anyone reading this) have a good (as in particularly
rich/tricky/peculiar) case, it would be helpful if you filed a bug
report pointing out that licensecheck fails to recognize it.

Also, I hadn't thought of there being interest in statistics - it should
not be too hard to spit out numbers for variation in licenses or
copyright holders once licensecheck has recognized the information.
Again, if someone has suggestions for formats they'd particularly like
such statistics to be served in by licensecheck then please file a
bug report.

Sorry this isn't helping anything for the topic being discussed.


- Jonas

--
* Jonas Smedegaard - idealist & Internet-arkitekt
* Tlf.: +45 40843136 Website: http://dr.jones.dk/
* Sponsorship: https://ko-fi.com/drjones

[x] quote me freely [ ] ask before reusing [ ] keep private

Hideki Yamane

Sep 12, 2023, 3:40:04 AM
Hi,

On Sun, 10 Sep 2023 18:29:36 +0200
Bill Allombert <ball...@debian.org> wrote:
> Or we could generate DEBIAN/copyright from debian/copyright using data in
> license-common-list at build time. So maintainers would not need to manage the copying
> themselves.

One problem is, that some software declares that they use some licenses
(e.g. MIT), but sometimes they modify the license term itself a bit.
So, there's a difference between words in the license list and some words
in the included license in such software.

It'd be better to find such software and ask upstream to fix it to use
proper license terms, by tagging it in the BTS. And it's NOT a
Debian-specific issue, so it may be better to ask folks to join such a
movement then, IMHO.


--
Hideki Yamane <hen...@iijmio-mail.jp>

Jonas Smedegaard

Sep 12, 2023, 5:00:03 AM
Quoting Hideki Yamane (2023-09-12 09:27:12)
I can only assume that the proposal for an automated DEBIAN/copyright
file is limited to source files *possible* to automatically process, and
consequently only relates to debian/copyright files written in the
machine-readable format.

The problem you describe about ambiguous MIT-derived licensing cannot,
in my understanding, occur using the machine-readable format - only with
less strictly structured debian/copyright files.

If you mean to say that ambiguous MIT declarations exist in
debian/copyright files written using the machine-readable format, then
please point to an example, as I cannot imagine how that would look.

Jonas Smedegaard

Sep 12, 2023, 1:30:05 PM
Quoting Russ Allbery (2023-09-12 18:15:27)
> Jonas Smedegaard <jo...@jones.dk> writes:
>
> > If you mean to say that ambiguous MIT declarations exist in
> > debian/copyright files written using the machine-readable format, then
> > please point to an example, as I cannot imagine how that would look.
>
> I can see it: people use License: Expat but then include some license that
> is essentially, but not precisely, the same as Expat. If we then tell
> people that they can omit the text of the license and we'll fill it in
> automatically, they'll remove the actual text and we'll fill it in with
> the wrong thing.
>
> This is just a bug in handling the debian/copyright file, though. If we
> take this approach, we'll need to be very explicit that you can only use
> whatever triggers the automatic inclusion of the license text if your
> license text is word-for-word identical. Otherwise, you'll need to cut
> and paste it into the file as always.

Ah, right. I see it now.

Strictly speaking it is not (as I was more narrowly focusing on) that
the current debian/copyright spec leaves room for *ambiguity*, but
instead that there is a real risk of making mistakes when replacing with
centrally defined ones (e.g. redefining a local "Expat" from locally
meaning "MIT-ish legalese as stated in this project" to falsely mean
"the MIT-ish legalese that SPDX labels MIT").

If you disagree, then please shout, as then I am still missing your
point here...

Russ Allbery

Sep 12, 2023, 2:00:03 PM
Jonas Smedegaard <jo...@jones.dk> writes:

> Strictly speaking it is not (as I was more narrowly focusing on) that
> the current debian/copyright spec leaves room for *ambiguity*, but
> instead that there is a real risk of making mistakes when replacing with
> centrally defined ones (e.g. redefining a local "Expat" from locally
> meaning "MIT-ish legalese as stated in this project" to falsely mean
> "the MIT-ish legalese that SPDX labels MIT").

Right, the existing copyright format defines a few standard labels and
says that you should only use those labels when the license text matches,
but it doesn't stress that "matches" means absolutely word-for-word
identical. I suspect, although I haven't checked, that we've made at
least a few mistakes where some license text that's basically equivalent
to Expat is labelled as Expat even though the text is not word-for-word
identical. Given that currently all labels in debian/copyright are
essentially local and the full text is there (except for common-licenses,
where apart from BSD the licenses normally are used verbatim), this is not
currently really a bug. But we could turn it into a bug quite quickly if
we relied on the license short name to look up the text.
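
To make "word-for-word identical" concrete, a check along the following
lines could gate any automatic lookup. This is only a sketch: the function
names are made up, and treating differences in line wrapping and
indentation as insignificant is an assumption, not something Policy or
copyright-format says.

```python
def normalize(text: str) -> str:
    """Collapse all runs of whitespace so only the words are compared."""
    return " ".join(text.split())


def is_verbatim(package_text: str, canonical_text: str) -> bool:
    """Return True only if both texts contain exactly the same words.

    Case, punctuation, and names all count as differences; only
    whitespace (wrapping, indentation, blank lines) is ignored.
    """
    return normalize(package_text) == normalize(canonical_text)
```

Anything for which this returns False, including a license that differs
only in the name of a copyright holder, would still need its full text
pasted into debian/copyright.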

To take an example that I've been trying to get rid of for over a decade,
many of the /usr/share/common-licenses/BSD references currently in the
archive are incorrect. There are a few cases where the code is literally
copyrighted only by the Regents of the University of California and uses
exactly that license text, but this is not the case for a lot of them. It
looks like a few people have even tried to say "use common-licenses but
change the name in the license" rather than reproducing the license text,
which I don't believe meets the terms of the license (although it's of
course very unlikely that anyone would sue over it).

A quick code search turns up the following examples, all of which I
believe are wrong:

https://sources.debian.org/src/mrpt/1:2.10.0+ds-3/doc/man-pages/pod/simul-beacons.pod/?hl=35#L35
https://sources.debian.org/src/gridengine/8.1.9+dfsg-11/debian/scripts/init_cluster/?hl=7#L7
https://sources.debian.org/src/rust-hyphenation/0.7.1-1/debian/copyright/?hl=278#L278
https://sources.debian.org/src/nim/1.6.14-1/debian/copyright/?hl=64#L64
https://sources.debian.org/src/yade/2023.02a-2/debian/copyright/?hl=78#L78

An example of one that probably is okay, although ideally we still
wouldn't do this because there are other copyrights in the source:

https://sources.debian.org/src/lpr/1:2008.05.17.3+nmu1/debian/copyright/?hl=15#L15

This problem potentially would happen a lot with the BSD licenses, since
the copyright-format document points to SPDX and SPDX, since it only cares
about labeling legally-equivalent documents, allows the license text to
vary around things like the name of the person you're not supposed to say
endorsed your software while still receiving the same label.

We therefore cannot use solely SPDX as a way of determining whether we can
substitute the text of the license automatically for people, because there
are SPDX labels for a lot of licenses for which we'd need to copy and
paste the exact license text because it varies. At least if I understand
what our goals would be.

(License texts that have portions that vary between packages they apply to
are a menace and make everything much harder, and I really wish people
would stop using them, but of course the world of software development is
not going to listen to me.)

Bill Allombert

Sep 12, 2023, 3:00:04 PM
On Tue, Sep 12, 2023 at 10:49:02AM -0700, Russ Allbery wrote:
> To take an example that I've been trying to get rid of for over a decade,
> many of the /usr/share/common-licenses/BSD references currently in the
> archive are incorrect. There are a few cases where the code is literally
> copyrighted only by the Regents of the University of California and uses
> exactly that license text, but this is not the case for a lot of them. It
> looks like a few people have even tried to say "use common-licenses but
> change the name in the license" rather than reproducing the license text,
> which I don't believe meets the terms of the license (although it's of
> course very unlikely that anyone would sue over it).

Note that my proposal makes the discrepancy more visible rather
than less, since you can compare the generated copyright file with
the actual license statement without chasing files.

Also, overengineering aside, the copyright generator could support
parameter substitution to accommodate small discrepancies in license.
For example an option to replace in /usr/share/common-licenses/BSD the
line
"Copyright (c) The Regents of the University of California."
by whatever is required when generating DEBIAN/copyright.
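
A minimal sketch of what such a substitution could look like, assuming the
generator is handed the template text as a string. The function name is
hypothetical, and the line being replaced is simply the one quoted above.

```python
REGENTS_LINE = "Copyright (c) The Regents of the University of California."


def substitute_copyright_line(template_text: str, replacement: str) -> str:
    """Replace the Regents copyright line with a package-specific one.

    Raises ValueError unless the line occurs exactly once, so a changed
    template is noticed instead of being silently left as-is.
    """
    if template_text.count(REGENTS_LINE) != 1:
        raise ValueError("expected exactly one Regents copyright line")
    return template_text.replace(REGENTS_LINE, replacement)
```

This only swaps the copyright line. A license that also names the holder
elsewhere in its text, such as in a non-endorsement clause, would need
further parameters.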

Cheers,
Bill