Design of Experiment and Sobol indices


roy.pa...@gmail.com

Oct 21, 2017, 5:42:03 AM
to pystatsmodels
Hi everyone,

Are you interested in implementing some design of experiments strategies?

Also, for statistical analysis, I saw that you have ANOVA. Would you be interested in Sobol' indices as well?

Cheers,

Pamphile

josef...@gmail.com

Oct 21, 2017, 11:28:33 AM
to pystatsmodels
Both are in scope of statsmodels.

Can you be more explicit about what you have in mind, or give a reference?

DOE is on my general wishlist, but I have no background in it, and there are already some Python packages.

I had never heard of Sobol' indices. After a brief search, the Wikipedia page https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis makes it sound more like functional ANOVA or functional decomposition.
That also sounds very interesting.

A Google search finds this package: https://pypi.python.org/pypi/SALib (MIT licensed)

Josef


roy.pa...@gmail.com

Oct 21, 2017, 3:53:29 PM
to pystatsmodels
For now, I am using the OpenTURNS package (LGPL), which implements all of this.
It is actively developed (I know the devs); it is C++ code with a Python interface (SWIG).

SALib is good, I guess, but IMO it is limited in terms of available DoE.

Long story short, Sobol' indices are used to assess the importance of each input variable in terms of its share of the output variance.
Example: take f(x1, x2). If the indices are Sx1 = 0.8 and Sx2 = 0.2, then 80% of the output variance of f(x1, x2)
is due to x1, whereas x2 is responsible for only 20% of it.
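
For reference, here is the usual definition of the first-order index, which is not spelled out in this thread, followed by a worked case that reproduces the 0.8 / 0.2 split. The model Y = 2 X1 + X2 with independent inputs uniform on [0, 1] is an assumption chosen only for illustration.

```latex
\begin{align*}
S_i &= \frac{\operatorname{Var}\bigl(\mathbb{E}[Y \mid X_i]\bigr)}{\operatorname{Var}(Y)},
  \qquad Y = f(X_1, X_2) \\
% Assumed example: Y = 2 X_1 + X_2, with X_1, X_2 independent and uniform on [0, 1]
\operatorname{Var}(Y) &= 4 \cdot \tfrac{1}{12} + \tfrac{1}{12} = \tfrac{5}{12} \\
S_{x_1} &= \frac{4/12}{5/12} = 0.8,
  \qquad S_{x_2} = \frac{1/12}{5/12} = 0.2
\end{align*}
```

The two indices add up to 1 here only because the assumed model is additive. With interactions between x1 and x2, the first-order indices sum to less than 1.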

The idea here was to either:

- Do nothing: users who want this can go with the available libraries,
- Wrap one of these,
- Code something in statsmodels, since OpenTURNS, for instance, is overkill just for that.

josef...@gmail.com

Oct 22, 2017, 3:46:43 PM
to pystatsmodels
My impression, based on a few hours of reading documentation and a bit of the SALib source code:

I don't think we want to wrap either of them, but we could include something like SALib in statsmodels.
That leaves options 1 and 3.

OpenTURNS is a large package, with 12 years of development by three large French companies. Nevertheless, it looks pretty modern and relatively easy to use. It is a large, self-contained package and, because of its C++/Boost background and its license, it doesn't match up well with our numpy/scipy/pandas integration. For example, it overlaps heavily with scipy, but it also has things that neither scipy nor statsmodels has, such as some of the distributions and copulas, besides the sensitivity parts.

Coding something into statsmodels sounds good to me.
Sensitivity analysis as in those packages would be one application, but it would also be a start in two directions. One is DOE, where we don't have anything yet (we have a Halton sequence generator in two PRs for numerical integration of random effects). The other is functional analysis, with functional ANOVA as a more general target, starting with Sobol' indices.

Disclaimer: this is as far as I currently understand it.

Josef

Pamphile Roy

Oct 25, 2017, 2:53:34 PM
to pystat...@googlegroups.com
I will have a look at these two PRs. I do not have any code for that, but I could write some (optimized LHS, for example, as this is the way to go now).
However, I already have a well-vectorized computation of the discrepancy (a metric for assessing the quality of a DoE).
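
To make the discrepancy idea concrete, here is a minimal sketch of a vectorized centered L2-discrepancy (Hickernell's formula) in NumPy. This is not the code mentioned above; the function name `centered_discrepancy` and the test designs are assumptions made for illustration.

```python
import numpy as np


def centered_discrepancy(sample):
    """Squared centered L2-discrepancy of points in [0, 1]^d (lower is better)."""
    sample = np.asarray(sample, dtype=float)
    n, d = sample.shape
    dist = np.abs(sample - 0.5)  # distance of each coordinate to the cube centre

    # Single sum over the points.
    single = np.prod(1 + 0.5 * dist - 0.5 * dist**2, axis=1).sum()

    # Double sum over all pairs of points, vectorized with broadcasting.
    pair = np.prod(
        1
        + 0.5 * dist[:, None, :]
        + 0.5 * dist[None, :, :]
        - 0.5 * np.abs(sample[:, None, :] - sample[None, :, :]),
        axis=2,
    ).sum()

    return (13 / 12) ** d - 2 / n * single + pair / n**2


rng = np.random.default_rng(0)
random_design = rng.random((64, 2))
clustered_design = 0.1 * random_design  # all points squeezed into one corner

# Midpoints of an 8 x 8 grid as a simple space-filling comparison.
grid_1d = (np.arange(8) + 0.5) / 8
grid_design = np.array([(a, b) for a in grid_1d for b in grid_1d])

for name, design in [("random", random_design),
                     ("clustered", clustered_design),
                     ("grid", grid_design)]:
    print(name, centered_discrepancy(design))
```

The pairwise term builds an (n, n, d) array, so memory grows quadratically with the number of points. That is fine for typical DoE sizes but would need chunking for very large samples.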

As for Sobol' indices, I coded the Saltelli (2010) estimators for the first-order and total indices. I still have to figure out how to do the second-order indices.
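
As an illustration of what such estimators look like, here is a rough sketch, not the code referred to above. It assumes inputs uniform on [0, 1], uses plain pseudo-random sampling instead of a low-discrepancy sequence, and pairs the first-order estimator from Saltelli et al. (2010) with the Jansen total-index estimator recommended in that paper. The names `sobol_indices` and `model` are hypothetical.

```python
import numpy as np


def sobol_indices(func, dim, n, seed=None):
    """Estimate first-order and total Sobol' indices for inputs uniform on [0, 1]."""
    rng = np.random.default_rng(seed)
    A = rng.random((n, dim))  # two independent sample matrices
    B = rng.random((n, dim))
    f_A, f_B = func(A), func(B)
    var = np.var(np.concatenate([f_A, f_B]))

    first = np.empty(dim)
    total = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]  # A with column i taken from B
        f_AB = func(AB)
        # First-order estimator from Saltelli et al. (2010).
        first[i] = np.mean(f_B * (f_AB - f_A)) / var
        # Total-index estimator (Jansen).
        total[i] = 0.5 * np.mean((f_A - f_AB) ** 2) / var
    return first, total


def model(x):
    # Assumed toy model: additive, so first-order and total indices coincide.
    return 2 * x[:, 0] + x[:, 1]


first, total = sobol_indices(model, dim=2, n=100_000, seed=0)
print(first, total)  # both should be close to 0.8 and 0.2 for this model
```

The cost is n * (dim + 2) model evaluations, which is why the choice of sampling design matters for expensive models.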


Pamphile

(@Tupui)

josef...@gmail.com

Oct 25, 2017, 3:14:38 PM
to pystatsmodels
The Halton sequence function is here

(I can't find the more recent PR; most likely I never pushed the code to a GitHub branch or PR, but it just uses a copy of that function.)

roy.pa...@gmail.com

Nov 4, 2017, 11:37:09 AM
to pystatsmodels
Hi,

Based on what I commented on the PR you linked, is this good for a PR?

Pamphile

josef...@gmail.com

Nov 4, 2017, 12:33:27 PM
to pystatsmodels
In general yes. I tried to read a bit, but I don't have much background in this.

You could add a new issue to keep the discussion in one place.
I don't know yet where we should put it. The current Halton sequence is in discrete
because that's where we had the application, but that's not a general location for it.
Maybe in statsmodels.tools.

My guess is that we need several versions of it.
E.g. I was skimming parts of
Ladislav Kocis and William J. Whiten, Computational Investigations of Low-Discrepancy Sequences,
which means we would have to be more careful with high-dimensional applications.
For the random effects application, it will be only low-dimensional in all cases I have seen.

Another thought was whether it is possible to yield new multivariate observations
if we don't know in advance how many we need. However, this is not possible in
either of the current implementations, as far as I have seen.
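
One way to get that behaviour is a generator that produces Halton points one at a time. The sketch below is only an illustration of the idea, not the statsmodels function; the names `van_der_corput` and `halton` and the default bases are assumptions.

```python
from itertools import count, islice


def van_der_corput(index, base):
    """Radical inverse of a positive integer in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value


def halton(bases=(2, 3)):
    """Yield Halton points one at a time, with no sample size fixed in advance."""
    for index in count(1):
        yield tuple(van_der_corput(index, base) for base in bases)


points = halton()
first_batch = list(islice(points, 4))
next_batch = list(islice(points, 2))  # continues where the first batch stopped
print(first_batch)
print(next_batch)
```

Because each point depends only on its index, the caller can keep drawing until some convergence criterion is met, and later points extend the same sequence rather than restarting it.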

Josef

Pamphile Roy

Nov 4, 2017, 1:42:11 PM
to pystat...@googlegroups.com
OK, then I will open an issue with a recap of this discussion.

Indeed, the basic implementation of Halton is only suitable when dim < ~10.
Otherwise, it is better to go with a Sobol' sequence, for instance. As a matter of fact, according to:

Damblin et al., Numerical studies of space filling designs: optimization of Latin Hypercube Samples and subprojection properties,
Journal of Simulation, 2013.

the Sobol' sequence performs well even in quite high dimensions (there are also some attempts by Sobol' himself, with a company called Broda,
to go to a huge number of dimensions, but the free version of the code only goes up to dim ~ 50).
But if you need even higher dimensions, you will need to go with optimized LHS, which is the state of the art.
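
For readers unfamiliar with LHS, here is a minimal sketch of a Latin hypercube sample with a very crude "optimization" step: draw several candidates and keep the one with the largest minimum pairwise distance (a maximin criterion). This random search is only a stand-in for the optimization schemes studied in the paper above; all function names and parameters are assumptions.

```python
import numpy as np


def latin_hypercube(n, dim, rng):
    """One random Latin hypercube sample of n points in [0, 1]^dim."""
    # Each column is an independent random permutation of the n strata.
    strata = np.column_stack([rng.permutation(n) for _ in range(dim)])
    # Jitter each point uniformly inside its stratum.
    return (strata + rng.random((n, dim))) / n


def min_pairwise_distance(sample):
    """Smallest Euclidean distance between any two distinct points."""
    diff = sample[:, None, :] - sample[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2))
    return dist[np.triu_indices(len(sample), k=1)].min()


def maximin_lhs(n, dim, n_candidates=50, seed=None):
    """Keep the candidate LHS whose closest pair of points is furthest apart."""
    rng = np.random.default_rng(seed)
    candidates = [latin_hypercube(n, dim, rng) for _ in range(n_candidates)]
    return max(candidates, key=min_pairwise_distance)


design = maximin_lhs(n=20, dim=15, seed=0)
print(design.shape, min_pairwise_distance(design))
```

Whatever the selection criterion, every candidate keeps the defining LHS property: each of the n strata along every dimension contains exactly one point.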

Pamphile

PS: thanks for the ref, I didn't know about it :)

roy.pa...@gmail.com

Nov 4, 2017, 2:55:13 PM
to pystatsmodels
I just filed an issue. Feel free to modify it.

https://github.com/statsmodels/statsmodels/issues/4103

Pamphile Roy

Jun 20, 2020, 4:39:46 PM
to pystatsmodels
Hi,

Long time no see :)

We've more or less settled the topic of DoE (and I hope the merge into SciPy will happen soon, as we're working on it with @balandat),
but there is still the question of sensitivity analysis. Is it still of interest here?


  • Sobol' (first order and total).
  • COSI indices.
  • CUSUNORO.
  • Moment independent measures.
  • Some visualization functions.

Let me know if there is interest in a PR and, if so, what I should include :)

Pamphile
@tupui

