Would OpenTURNS' LGPL licence be an issue for use by statsmodels?


Michael Baudin

Feb 6, 2021, 9:30:11 AM
to pystatsmodels

Hi,

I develop the OpenTURNS library for uncertainty quantification. We recently had a discussion (at https://github.com/statsmodels/statsmodels/pull/7254) about using OpenTURNS features to provide copula algorithms in statsmodels.

It appears that the LGPL licence of OpenTURNS might be an issue. I do not understand this particular point: can't MIT-licensed software use an LGPL library?

If OpenTURNS were MIT, which features might be useful for statsmodels? I think that the probabilistic modeling (distributions, copulas, nonparametric methods, etc.) would be useful, and the higher-level algorithms as well (HDR, parameter estimation, etc.). Is this correct?

Best regards,

Michaël

Michael Baudin

Feb 6, 2021, 10:07:07 AM
to pystatsmodels
Sorry: this is a duplicate (I did not see the message...)

josef...@gmail.com

Feb 6, 2021, 10:17:30 AM
to pystatsmodels
Hi Michael,

First-time posters to the mailing list are moderated. Future messages
will go through without moderation.

On Sat, Feb 6, 2021 at 9:30 AM Michael Baudin <michael...@gmail.com> wrote:
>
>
> Hi,
>
> I develop the OpenTURNS library for uncertainty quantification. We recently had a discussion (at https://github.com/statsmodels/statsmodels/pull/7254) about using OpenTURNS features to provide copula algorithms in statsmodels.
>
> It appears that the LGPL licence of OpenTURNS might be an issue. I do not understand this particular point: can't MIT-licensed software use an LGPL library?

LGPL allows us to use it as a library, but not to read the code, and
not to copy parts of it.

It is a major advantage of the MIT/BSD-3 dominance in this area of
Python packages that we can read and copy each other's code.
Two examples:
Patsy has splines that can be used in formulas, but does not provide
the extra functionality for penalized splines, and the information in
the formula is too vague to work with directly.
When we added GAM, penalized splines for GLM, we copied part of
patsy's spline code, then adjusted and extended it to what we needed for
penalization.
It's still convenient for users to use patsy's splines in the formulas,
but for GAM we needed our own.

scipy and numpy.random are providing most of the distribution
functionality for us.
However, for models we need more, e.g. derivatives, or a different
parameterization. So for the core part of models, we have our own
version of Poisson, Binomial and similar, but delegate to scipy for
the rest (cdf, ppf, ...).
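
A minimal sketch of the kind of extra piece a model needs, assuming a plain i.i.d. Poisson sample rather than the regression models statsmodels actually fits. The function names are hypothetical and are not statsmodels or scipy API; the point is only that the log-likelihood comes with its derivative (the score), while generic functions such as cdf or ppf could still be taken from scipy.

```python
import math


def poisson_loglike(mu, counts):
    """Log-likelihood of i.i.d. Poisson counts with mean mu (hypothetical helper)."""
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in counts)


def poisson_score(mu, counts):
    """Derivative of the log-likelihood with respect to mu (hypothetical helper)."""
    return sum(k / mu - 1.0 for k in counts)


counts = [2, 0, 3, 1, 4]
mu_hat = sum(counts) / len(counts)  # for this model the MLE is the sample mean
print(poisson_score(mu_hat, counts))  # approximately zero at the MLE
```

An optimizer that is handed the score does not have to approximate it by finite differences, which is one reason a model-specific implementation can be worth keeping.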

On the other hand, numpy and pandas are used by statsmodels as
libraries because there is less direct overlap in functionality.
Still, it's sometimes easier to look at the code to see how something is
implemented, and it makes life easier for developers who work on
several packages if those are license compatible.


>
> If OpenTURNS were MIT, which features might be useful for statsmodels? I think that the probabilistic modeling (distributions, copulas, nonparametric methods, etc.) would be useful, and the higher-level algorithms as well (HDR, parameter estimation, etc.). Is this correct?

I'll try to answer later; I need to go offline soon.

Josef


josef...@gmail.com

Feb 6, 2021, 2:54:19 PM
to pystatsmodels
Difficult to answer.

Quick answer: If OpenTURNS were MIT/BSD-3 compatible, then I would copy some of the formulas from your copula files. Your C++ math is readable.

I never looked much at OpenTURNS. I have now browsed your code a bit and read the first part of your Handbook of Uncertainty Quantification chapter.

OpenTURNS is focused on simulation methods and is closer to Bayesian Monte Carlo methods than to what we have mostly been doing.
Most of statsmodels is targeted to estimating parametric models and associated post-estimation, prediction, forecasting and inference.

The overlap is in the basic methods that can be reused in different ways. There will be more overlap if or when statsmodels adds more simulation-based methods.

Python code versus library functions (especially in C/C++/Fortran):
Statsmodels is a heavy user of the linear algebra and special functions wrapped in scipy and numpy. Those are clearly targeted at generic use cases, and we copy code only in very limited cases (e.g. when we need an extra return value that is not returned by the numpy or scipy function).
On the other hand, scipy.interpolate is largely useless to us. For example, its spline implementations are targeted at efficient evaluation of the splines, but don't provide the parts needed for estimation. So Patsy's and our splines were written independently of scipy.interpolate.

I have not looked closely enough at OpenTURNS to tell how far it provides the basics for future applications that we will likely have.
My main target, following our pattern, would be maximum likelihood estimation of multivariate models (something closer to gamlss or vgam in R).

Statsmodels is missing simulation-based models.
So far we have concentrated on models and features that are analytically tractable, or that have an analytical approximation. Simulations are used in a few parts but not systematically.
This means that we are still missing features where an analytically tractable or simple numerical approximation doesn't exist.
For example, a long time ago we had a Google Summer of Code project for mixed, random-effects discrete choice models that used Halton sequences for integration. That project was never finished and never merged.
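
For readers unfamiliar with the technique, here is a minimal sketch of Halton-sequence integration, assuming a generic integrand on the unit square rather than the random-effects likelihood of that project. The function names are hypothetical; the sketch only shows how the deterministic points replace random draws.

```python
def halton(index, base):
    """Element number `index` (1-based) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """First n points of the Halton sequence, one coprime base per dimension."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]


def integrate_unit_square(f, n=2000):
    """Quasi-Monte Carlo estimate of the integral of f over [0, 1]^2."""
    return sum(f(x, y) for x, y in halton_points(n)) / n


estimate = integrate_unit_square(lambda x, y: x * y)
print(estimate)  # should be close to the exact value 0.25
```

In a mixed model the same points would typically be mapped through the inverse cdf of the random-effects distribution before evaluating the likelihood contribution.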

my background:
I worked a lot on distributions up to 8 years ago. Since then I haven't done much in this area.
Recently, I got back into distributions as part of goodness-of-fit and specification testing for our models with emphasis mainly on flexible 3 or 4 parameter distributions.
Resurrecting copulas came about because Pamphile asked about it in a scipy issue.

My work in statsmodels for the last few years has been more towards estimation when we don’t have a correctly specified distribution and likelihood, and we still want to estimate parameters and moment functions and do inference for them that is robust to distributional misspecification.

Josef
Chrome on my computer froze, finished on iPad :(




Michael Baudin

Feb 6, 2021, 4:23:41 PM
to pystatsmodels
Thank you for your excellent and thorough answer.

I admit that I did not imagine that copying the code would be an option. My preference would be to make using OpenTURNS as a dependency as simple as possible. We have already made deep changes in the library to make other projects possible, e.g. PERSALYS (https://persalys.fr). We have, for example, developed several specific algorithms for PERSALYS that were later moved into the core OpenTURNS engine to make maintenance easier and improve quality. If, however, copying and modifying the code is the most commonly used way to do the integration, then OpenTURNS indeed cannot be an option. However, this sounds surprising to me.

For example, OpenTURNS has a rather interesting set of optimization features, most of which are based on external dependencies: NLOpt, Dlib, Ceres, bonmin, Cobyla, etc. We would not have been able to redevelop these features without reusing the interfaces of these codes. However, plugging these libraries into OT gives us a full set of options for parameter estimation: maximum likelihood, method of moments, etc. We can, furthermore, estimate some parameters while others are known, add constraints if required, etc. Nonlinear least squares is a building block for several algorithms in OT, including Bayesian calibration.

When you say that "OpenTURNS is focused on simulation methods", that might indeed be the impression one gets from reading some of our papers. Note, however, that many features are based on exact calculations when possible, and on quadrature otherwise. For example, the library has an interesting feature that provides distribution arithmetic.
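
To make "estimate some parameters while others are known" concrete, here is a minimal sketch that uses only the Python standard library, not the OpenTURNS API or any of the optimizers listed above. It assumes a normal sample with a known mean and estimates the standard deviation by maximum likelihood with a bounded golden-section search; all names are hypothetical.

```python
import math


def normal_negloglike(sigma, data, mu):
    """Negative log-likelihood of a normal sample with known mean mu."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return n * math.log(sigma) + ss / (2.0 * sigma ** 2) + 0.5 * n * math.log(2.0 * math.pi)


def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi]; the bounds act as a simple constraint."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return 0.5 * (a + b)


data = [1.2, 0.7, 2.1, 1.9, 0.4, 1.5]
known_mu = 1.0  # treated as known, so only sigma is estimated
sigma_hat = golden_section_minimize(
    lambda s: normal_negloglike(s, data, known_mu), 1e-3, 10.0
)
closed_form = math.sqrt(sum((x - known_mu) ** 2 for x in data) / len(data))
print(sigma_hat, closed_form)  # the two values should agree closely
```

The closed-form estimator exists for this toy case and serves only as a check; a general-purpose optimizer matters when no such formula is available.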

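import openturns as ot
N = ot.Normal()   # standard normal distribution
U = ot.Uniform()  # uniform distribution with default bounds
P = N + U         # distribution of the sum of the two independent variables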
P.drawPDF()
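
To illustrate why no sampling is needed for such a sum, here is a minimal standard-library sketch. It is not how OpenTURNS implements the feature; it assumes N is standard normal, U is uniform on [-1, 1] (which, to my understanding, is the default of ot.Uniform()), and that the two are independent. The density of the sum is then a convolution integral, evaluated here by composite Simpson quadrature and compared with the closed form.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sum_pdf_quadrature(x, n=200):
    """PDF of N + U at x: integral of normal_pdf(x - u) * 0.5 for u in [-1, 1]."""
    a, b = -1.0, 1.0
    h = (b - a) / n  # n must be even for Simpson's rule
    total = normal_pdf(x - a) + normal_pdf(x - b)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * normal_pdf(x - (a + i * h))
    return 0.5 * total * h / 3.0  # 0.5 is the Uniform(-1, 1) density


def sum_pdf_exact(x):
    """Closed form of the same convolution, used only as a check."""
    return 0.5 * (normal_cdf(x + 1.0) - normal_cdf(x - 1.0))


for x in (-2.0, 0.0, 0.5, 3.0):
    print(x, sum_pdf_quadrature(x), sum_pdf_exact(x))
```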

This is combined with several tricks, e.g. the sum of two Gaussians is a Gaussian, etc. There is no simulation involved in this, only quadrature. More examples are provided here:


If you are interested in goodness-of-fit, you might be interested in this:


Regards,

Michaël



roy.pa...@gmail.com

Feb 8, 2021, 7:58:20 AM
to pystatsmodels
Hi,

I believe SciPy has some interesting information about this. See here: https://scipy.github.io/devdocs/hacking.html#license-considerations

The first link refers to this document from John Hunter: http://nipy.sourceforge.net/nipy/stable/faq/johns_bsd_pitch.html.
At the end he explicitly talks about LGPL and raises the example of ipython, which was LGPL and was re-released under BSD.

This answer from Stack Exchange is also interesting: https://softwareengineering.stackexchange.com/a/345318.
The obligation to allow the user to substitute the library with something compatible can be difficult to achieve in practice.

Cheers,

Pamphile 

josef...@gmail.com

Feb 8, 2021, 8:30:52 AM
to pystatsmodels
On Mon, Feb 8, 2021 at 7:58 AM roy.pa...@gmail.com <roy.pa...@gmail.com> wrote:
> Hi,
>
> I believe SciPy has some interesting information about this. See here: https://scipy.github.io/devdocs/hacking.html#license-considerations
>
> The first link refers to this document from John Hunter: http://nipy.sourceforge.net/nipy/stable/faq/johns_bsd_pitch.html.
> At the end he explicitly talks about LGPL and raises the example of ipython, which was LGPL and was re-released under BSD.
>
> This answer from Stack Exchange is also interesting: https://softwareengineering.stackexchange.com/a/345318.
> The obligation to allow the user to substitute the library with something compatible can be difficult to achieve in practice.

I think this would not affect us, because statsmodels is open source with a license under which users can replace or substitute any part.
We are not required to do it for them.

GPL has a stronger requirement, where we need to provide code that makes the GPL part non-essential or directly substitutable, or something like that.
For example, cvxopt just provides an additional optimizer; it is only an option to use it as an alternative to scipy.optimize.
We also only provide an interface to use cvxopt and do not include any cvxopt code, which makes our usage of it even weaker.
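
A minimal sketch of this "optional interface" pattern, assuming hypothetical helper names; it is not the actual statsmodels code. The optional package is imported only when it is installed, nothing from it is bundled, and a default backend is used otherwise.

```python
import importlib


def load_optional_backend(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def choose_solver(preferred="cvxopt", default="builtin"):
    """Name of the solver to use: the optional one if importable, else the default."""
    backend = load_optional_backend(preferred)
    return preferred if backend is not None else default


# A module name that is assumed not to exist, to show the fallback path.
print(choose_solver(preferred="hypothetical_missing_backend_xyz"))
```

Because the optional package is never required, users who do not install it lose only that alternative, not any core functionality.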

Josef

 

josef...@gmail.com

Feb 8, 2021, 8:47:13 AM
to pystatsmodels
On Mon, Feb 8, 2021 at 8:30 AM <josef...@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 7:58 AM roy.pa...@gmail.com <roy.pa...@gmail.com> wrote:
> > Hi,
> >
> > I believe SciPy has some interesting information about this. See here: https://scipy.github.io/devdocs/hacking.html#license-considerations
> >
> > The first link refers to this document from John Hunter: http://nipy.sourceforge.net/nipy/stable/faq/johns_bsd_pitch.html.
> > At the end he explicitly talks about LGPL and raises the example of ipython, which was LGPL and was re-released under BSD.
> >
> > This answer from Stack Exchange is also interesting: https://softwareengineering.stackexchange.com/a/345318.
> > The obligation to allow the user to substitute the library with something compatible can be difficult to achieve in practice.
>
> I think this would not affect us, because statsmodels is open source with a license under which users can replace or substitute any part.
> We are not required to do it for them.

It could be a problem for downstream bundling.
I never looked at that because it is currently not relevant; all of our code and required dependencies are BSD-3 compatible.

Josef