Bootstrap confidence intervals


Kiko

Feb 18, 2015, 9:57:27 AM
to pystat...@googlegroups.com
Hi all,

I'm trying to fit some data to a distribution (genextreme) using scipy.stats and then I want to obtain confidence intervals for the estimators using bootstrap (scikits-bootstrap).

Using R I get 'coherent' values for the estimators; using scipy I sometimes get strange values for the shape parameter. R obtains the bootstrap confidence intervals reasonably fast, whereas scipy is quite slow because of the MLE estimation.

I have two problems:

- The first is that the default MLE fit method provided with the continuous distributions is slow (too slow if I want to run it 500 or 5000 times to obtain the bootstrap information).
- The second is that if I want to use another optimizer (a faster one such as scipy.optimize.fmin_bfgs or fmin_cg instead of fmin), I need to provide a first guess, and this affects the bootstrap confidence intervals. I'm using genextreme, and the shape parameter always comes out as 1 if I don't supply a first guess for the shape (a sketch of what I mean is below).
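For reference, a minimal sketch of what I'm doing (assuming scipy's rv_continuous.fit, which takes a shape guess as a positional argument after the data plus loc/scale keywords and accepts an `optimizer` callable; the simulated data is just a stand-in for my real sample):

import numpy as np
from scipy import stats, optimize

# Stand-in for the real sample of block maxima.
np.random.seed(0)
data = stats.genextreme.rvs(-0.2, loc=10.0, scale=2.0, size=200)

# Default fit: generic starting values and the (slow) fmin optimizer.
c_hat, loc_hat, scale_hat = stats.genextreme.fit(data)

# Fit with an explicit first guess for the shape (positional, after the data),
# loc/scale guesses, and a faster gradient-based optimizer.
c_hat2, loc_hat2, scale_hat2 = stats.genextreme.fit(
    data, -0.1, loc=data.mean(), scale=data.std(),
    optimizer=optimize.fmin_bfgs)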

So, my questions are:
Is there a better way to achieve this using statsmodels?
Are there other methods in statsmodels to obtain the parameter estimators (Generalized Method of Moments, ...)?

Thanks

P.S.: I'm not a statistician, so sorry if I'm asking something 'strange'.

Kind regards.

josef...@gmail.com

Feb 18, 2015, 11:23:53 AM
to pystatsmodels
It's a perfectly good question, but we don't have anything ready-made
specifically for the generalized extreme value distribution or
similar extreme value analysis.

If your initial estimate for your data is good, then I would use it as
the starting value for the optimization in the bootstrap samples. This
starts the optimization in the correct neighborhood, and fmin_bfgs
should work; in many cases fmin_l_bfgs_b is more stable even without
bounds.
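
Roughly what I have in mind, as an untested sketch: seed every bootstrap refit with the full-sample estimates, and plug fmin_l_bfgs_b in through a small adapter of my own (fit expects the optimizer to return just the parameter vector). The plain percentile intervals are only the simplest choice; scikits-bootstrap's BCa intervals would be more careful.

import numpy as np
from scipy import stats, optimize

def lbfgsb_optimizer(func, x0, args=(), disp=0):
    # Adapter: rv_continuous.fit expects the optimizer to return only the
    # parameter vector, while fmin_l_bfgs_b returns (x, f, info).
    x, _, _ = optimize.fmin_l_bfgs_b(func, x0, args=args, approx_grad=True)
    return x

# Stand-in for the real data.
np.random.seed(1)
data = stats.genextreme.rvs(-0.2, loc=10.0, scale=2.0, size=200)

# Full-sample fit; its estimates seed every bootstrap refit.
c0, loc0, scale0 = stats.genextreme.fit(data)

n_boot = 500
boot_params = np.empty((n_boot, 3))
for i in range(n_boot):
    resample = np.random.choice(data, size=len(data), replace=True)
    boot_params[i] = stats.genextreme.fit(
        resample, c0, loc=loc0, scale=scale0, optimizer=lbfgsb_optimizer)

# Simple 95% percentile intervals for (c, loc, scale).
ci_low, ci_high = np.percentile(boot_params, [2.5, 97.5], axis=0)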


The longer answer:

I haven't looked at this in several years.
(I stopped working on distributions, when, for a while, I didn't feel
like giving away all my code.)

So, if I remember correctly:

Estimating the parameters of the GEV by maximum likelihood can be
tricky. Many alternative estimators have been developed because MLE
had a bad reputation for this case. However, it's not so bad, but it
needs good starting values, especially if all parameters are estimated
and loc is not fixed.

IIRC, I got incorrect convergence if the sign of the starting values
was wrong, but relatively good convergence if the sign was right.

Based on a quick look at the documentation of two R packages: `evd`
starts with moment estimators in a reparameterized model, while
`extRemes` seems to start with L-moments.

Scipy has basin-hopping as a global optimizer, which could work quite
well, but I have never tried it for this case.
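
Untested sketch of what I mean by that, minimizing a hand-written negative log-likelihood (the neg_loglike helper and the crude starting point are mine, not something built into scipy):

import numpy as np
from scipy import stats, optimize

def neg_loglike(params, data):
    c, loc, scale = params
    if scale <= 0:
        return np.inf
    return -np.sum(stats.genextreme.logpdf(data, c, loc=loc, scale=scale))

# Stand-in for the real data.
np.random.seed(2)
data = stats.genextreme.rvs(-0.2, loc=10.0, scale=2.0, size=200)

# Crude start; basinhopping adds random jumps around repeated local minimizations.
x0 = np.array([0.0, data.mean(), data.std()])
result = optimize.basinhopping(
    neg_loglike, x0, niter=50,
    minimizer_kwargs={"args": (data,), "method": "Nelder-Mead"})
c_hat, loc_hat, scale_hat = result.x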


(I have maximum spacing, GMM based on quantiles or the cdf, and minimum
distance based on characteristic functions in my trial code; parts of
it are in the sandbox, but those are not in an easily digestible,
clean form, and I don't think they will work better than MLE with
reasonable starting values.)

Josef


Kiko

Feb 19, 2015, 2:43:51 AM
to pystat...@googlegroups.com
That's a pretty good idea :-)

> The longer answer:
>
> I haven't looked at this in several years.
> (I stopped working on distributions, when, for a while, I didn't feel
> like giving away all my code.)
>
> So, if I remember correctly:
>
> Estimating the parameters of the GEV by maximum likelihood can be
> tricky. Many alternative estimators have been developed because MLE
> had a bad reputation for this case. However, it's not so bad, but it
> needs good starting values, especially if all parameters are estimated
> and loc is not fixed.
>
> IIRC, I got incorrect convergence if the sign of the starting values
> was wrong, but relatively good convergence if the sign was right.
>
> Based on a quick look at the documentation of two R packages: `evd`
> starts with moment estimators in a reparameterized model, while
> `extRemes` seems to start with L-moments.

(Thanks for taking the time. I also looked at the code of extRemes, but I don't feel confident in my R understanding yet.)

I have implemented L-moments, so I can estimate the GEV parameters using L-moments (it is quite fast), use these values as a starting point for the MLE fit, and then use the fitted MLE estimates to obtain confidence intervals for c, loc, scale and the return levels via bootstrapping! You saved my [full of doubts] day. A rough sketch of the idea is below.
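
For the record, here is the sketch (sample L-moments via probability-weighted moments and Hosking's approximation for the GEV; the helper names are mine, the code is untested, and I still need to double-check the sign convention, but scipy's shape c should match Hosking's k):

import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_func

def sample_lmoments(x):
    # First two sample L-moments and the L-skewness (l1, l2, t3),
    # computed from unbiased probability-weighted moments.
    x = np.sort(x)
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1.0) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1.0) * (n - 2.0)) * x) / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2

def gev_lmoment_start(x):
    # GEV (c, loc, scale) from L-moments using Hosking's approximation
    # for the shape; scipy's c uses the same sign convention as Hosking's k.
    l1, l2, t3 = sample_lmoments(x)
    z = 2.0 / (3.0 + t3) - np.log(2.0) / np.log(3.0)
    c = 7.8590 * z + 2.9554 * z ** 2
    scale = l2 * c / ((1 - 2 ** (-c)) * gamma_func(1 + c))
    loc = l1 - scale * (1 - gamma_func(1 + c)) / c
    return c, loc, scale

# Stand-in for the real data.
np.random.seed(3)
data = stats.genextreme.rvs(-0.2, loc=10.0, scale=2.0, size=200)

# L-moment estimates as starting values, then the MLE fit seeded with them.
c0, loc0, scale0 = gev_lmoment_start(data)
c_mle, loc_mle, scale_mle = stats.genextreme.fit(data, c0, loc=loc0, scale=scale0)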
 

> Scipy has basin-hopping as a global optimizer, which could work quite
> well, but I have never tried it for this case.
>
> (I have maximum spacing, GMM based on quantiles or the cdf, and minimum
> distance based on characteristic functions in my trial code; parts of
> it are in the sandbox, but those are not in an easily digestible,
> clean form, and I don't think they will work better than MLE with
> reasonable starting values.)

Thanks for the info. I'm reading the mailing list and following the development, and I'm learning a lot.
 

> Josef

Thanks again for the help and the effort with statsmodels and scipy.
