Hi Tom, thanks for taking a look and being so positive :)
I'm friendly with the authors of both the dependencies, so I am
confident I can work out good terms to make sure everything is
perfectly above-board copyright-wise. But you're right, for that
reason I was lax about thinking about it much beforehand.

As I read it the canonical-greekLit license statement is "everything
is CC-BY-SA unless stated otherwise in individual files", and
grepping around I can't find any exceptions. The "personal use" part
of the readme doesn't use the phrase "personal use only", so I think
it's better read as a non-normative statement of the bare minimum
they intend to offer, rather than some exception
to the license. Is that how you read it too? If you think it's still
ambiguous I can contact the people in charge and ask if it can be
made clearer - I am confident their intention is that it is freely
usable for exactly this kind of application.

As for rigaudon, you're right that the license (GPLv2) is more
problematic to use in the Apache 2.0-licensed training, and moreover
the provenance of the dictionaries I'm getting from there is not at
all clear. I'll write to Bruce (who maintains rigaudon) to ask about
it. And yes, it is true that I am just incorporating the wordlist
from there quite directly, not extracting it like in the case of the
canonical-greekLit repository.

Word lists strike me as a good example of something that copyright
shouldn't (and, depending on jurisdiction, probably doesn't)
cover, as they are very much an aggregation of 'mere facts'. But
sadly I suppose it is sensible to work on the assumption that they
are copyrightable just in case, so we are safe.

> I think it's great that you've got the process pegged to specific commits in
> the dependencies so the results are reproduceable, but it'd be nice to get
> those commit IDs and repo URLs emitted somewhere (grc.config?) in the final
> data.

I think it's great too :) The commit ID of the grctraining
repository is included in the grc.config already, and from that it's
easy to find the commits of the dependencies that were used (as
they're pegged in the makefile, as you saw). I suppose including the
grctraining repo URL in the grc.config comment makes sense, good
idea. Do you think including the commits / URLs used for the
dependencies is useful there? I am inclined to think it is not, but
feel free to persuade me otherwise ;)

Thanks again for taking a look and offering such useful thoughts.

Nick