The nltk-2.0b7.zip and nltk-2.0b7.tar.gz files currently available for
download include a "yaml" directory, containing code from the PyYaml
project (http://pyyaml.org/). PyYaml is an open-source project under
the MIT license (http://pyyaml.org/browser/pyyaml/trunk/LICENSE), so
including part or all of its code in another project is perfectly
fine, except for one problem. If you read the MIT license, it requires
that "[t]he above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software."
Currently, the NLTK code does not include any LICENSE.txt file in the
yaml directory, nor does it acknowledge where the yaml code came from.
This means that it's not in compliance with the MIT license terms, and
if I were to create a .deb package of NLTK for inclusion in Ubuntu (or
Debian) as it currently stands, the package would be rejected on
legal-compliance grounds.
Fortunately, the solution is easy. Simply include a LICENSE.txt file
in the yaml directory (copying the
http://pyyaml.org/browser/pyyaml/trunk/LICENSE file verbatim would
probably be enough) and NLTK's code will again be in compliance with
PyYaml's license. No changes to the code will be needed. I'd still
recommend increasing the version number to 2.0b8, though: the
difference between compliance and non-compliance with a license is an
important enough change that it should be marked by a new version
number.
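For anyone who wants to check this mechanically before cutting a release, here is a minimal sketch. The helper name has_license_file and the list of accepted file names are my own assumptions, not anything NLTK or PyYaml provides.

```python
from pathlib import Path

# Assumed set of file names that count as "a license file"; adjust as needed.
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.TXT", "COPYING")


def has_license_file(directory):
    """Return True if `directory` directly contains a commonly named license file."""
    directory = Path(directory)
    return any((directory / name).is_file() for name in LICENSE_NAMES)
```

Calling has_license_file("yaml") from the top of the unpacked tarball should return True once the LICENSE file has been copied in. It only checks that a file exists, not that its contents match the upstream MIT license text.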
--
Robin Munn
Robin...@gmail.com
GPG key 0x4543D577
* nltk/model/api.py has no copyright or license listed, unlike most
other files in the NLTK source. A comment should probably be added
stating "Copyright (c) (dates) NLTK Project" and "For license
information, see LICENSE.TXT"
* nltk/book.py also has no copyright or license listed; same fix is recommended.
The following files have a reference to LICENSE.TXT, but no clear
copyright statement claiming copyright by the NLTK project. This would
not prevent their inclusion in Ubuntu or Debian on legal grounds, but
it's inconsistent with the rest of the NLTK source files and is
probably an easily-rectifiable omission:
nltk/inference/tableau.py - no copyright listed in file
nltk/parse/malt.py - no copyright listed
nltk/parse/util.py - no copyright listed
nltk/sem/chat80.py - no copyright listed
nltk/sem/drt_glue_demo.py - no copyright listed
nltk/sem/drt.py - no copyright listed
nltk/sem/drt_resolve_anaphora.py - no copyright listed
nltk/sem/glue.py - no copyright listed
nltk/sem/hole.py - no copyright listed
nltk/sem/lfg.py - no copyright listed
nltk/sem/linearlogic.py - no copyright listed
nltk/sem/logic.py - no copyright listed
nltk/sem/util.py - no copyright listed
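A list like the one above can be produced with a short script. This is only a rough sketch: the function names, the two regular expressions, and the choice to look at the first 30 lines of each file are my assumptions about what counts as a header, so treat the results as a starting point for manual review.

```python
import re
from pathlib import Path

# Assumed patterns: a "Copyright (C)" claim and a pointer to LICENSE.TXT.
COPYRIGHT_RE = re.compile(r"Copyright\s+\(C\)", re.IGNORECASE)
LICENSE_RE = re.compile(r"LICENSE\.TXT", re.IGNORECASE)


def audit_header(text, head_lines=30):
    """Return (has_copyright, has_license) for the first `head_lines` lines of `text`."""
    head = "\n".join(text.splitlines()[:head_lines])
    return bool(COPYRIGHT_RE.search(head)), bool(LICENSE_RE.search(head))


def audit_tree(root, pattern="*.py"):
    """Yield (path, has_copyright, has_license) for each file missing either item."""
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        has_copyright, has_license = audit_header(text)
        if not (has_copyright and has_license):
            yield path, has_copyright, has_license


if __name__ == "__main__":
    # "nltk" is the package directory inside the unpacked source tree.
    for path, has_copyright, has_license in audit_tree("nltk"):
        print(path, "copyright:", has_copyright, "license:", has_license)
```

The script only matches text patterns, so a file that names some other copyright holder would still pass; it cannot tell whether the claim is by the NLTK Project.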
Finally, none of the files in the nltk/test directory contain any
copyright or licensing statements. As they're doctest files, which are
a mixture of documentation and test code, it's unclear whether the
Apache license or the Creative Commons license would apply to them. A
simple comment at the top of each file clarifying its licensing terms
would be enough to fix this one.
None of these problems is a technical obstacle that would prevent my
working on an Ubuntu NLTK package. However, some of these problems
(especially the omission of the MIT license from the yaml directory)
would prevent my package being included in Ubuntu once it's completed,
on legal grounds. The deadline for getting new packages into Ubuntu to
be released with the next version ("Lucid Lynx", to be released in
April 2010) is February 18th (see
https://wiki.ubuntu.com/LucidReleaseSchedule for the details of the
schedule). I would like to get an NLTK package into Lucid, since it's
going to be the next Long-Term Support release of Ubuntu and therefore
will be on many people's desks for quite some time -- but that means
the licensing fixes I mentioned in this email and my previous one need
to be done soon, preferably within the next week so I have plenty of
time to go through Ubuntu's new-package review process.
So the sooner these issues are fixed (and all of them look like pretty
simple fixes), and a new 2.0b8 NLTK release is made, the better the
chances of my getting an NLTK package into Ubuntu Lucid this April.
Good to hear, thanks.
You're right that yaml doesn't really belong in NLTK packages if the
distribution has good dependency support, as Debian and Ubuntu both
do. My intention is indeed to remove it from the package and just list
a dependency on python-yaml -- but since it'll still be in the
original tarball, which the Debian/Ubuntu package system distributes
as "projectname_version.orig.tar.gz" in the source packages,
redistribution rights and licensing terms still apply even if the code
was removed from the binary package, hence my comment about needing to
include a copy of the MIT license.
An update on this issue: It's not possible to state "Both licenses
apply to the file," because the Apache license allows derivative
works, and the NLTK documentation (as per README.txt) is distributed
under a Creative Commons "No Derivative Works" license, so the
licenses are mutually exclusive and can't both apply.
Personally, I would consider doctests (especially if they're part of
the library's unit test framework) to be code rather than
documentation, since their primary purpose is to be executed as part
of a test framework. However, a lawyer *could* argue that they're
documentation. Since the Debian Free Software Guidelines were written
in part to make sure everything is nailed down legally and there's no
possibility of lawsuits, the nltk/test/* files, as they currently
stand, would be problematic for getting the package into Ubuntu and/or
Debian.
When a new (2.0b8) release of NLTK is made, the nltk/test/* files need
to have a copyright claim and license statement listed in them, and
the license needs to be Apache-2.0 (and NOT CC-by-nc-nd).
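As an illustration of what such a header could look like, here is a small sketch. It assumes the doctest files are reStructuredText, so that a line starting with ".. " is treated as a comment. The wording of the header, the with_header helper, and the years parameter are all hypothetical; the real dates and the exact statement are for the NLTK maintainers to decide.

```python
# Hypothetical header, written as reStructuredText comments (an assumption
# about the format of the nltk/test files).
HEADER_TEMPLATE = (
    ".. Copyright (C) {years} NLTK Project\n"
    ".. For license information, see LICENSE.TXT\n"
    "\n"
)


def with_header(text, years):
    """Return `text` with the header prepended, unless a claim is already near the top."""
    if "Copyright (C)" in text[:500]:
        return text
    return HEADER_TEMPLATE.format(years=years) + text
```

The function leaves a file unchanged if it already carries a copyright line near the top, so running it twice over the same file does not stack headers.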
I'm not certain that a clean install will work on other systems, so
would others please run a test and report back here (you might
need to remove multiple instances of NLTK from your machine first,
before doing an installation from the head revision in the
repository).
Thanks,
-Steven
Those changes look great. You missed a couple of files, though; as of
revision 8483, the following files still have issues:
* nltk/examples/pt.py (no copyright or license information listed in file)
* nltk/inference/prover9.py (license information but no explicit NLTK
copyright statement)
* nltk/sem/drt_resolve_anaphora.py (ditto: license information but no
explicit NLTK copyright statement)
Apart from those three files, this revision resolves all the potential
legal issues that would prevent the NLTK package getting adopted into
Ubuntu and Debian. Thanks!