One thing I think you need to consider, though: theory says that the tails of most distributions converge to a generalized Pareto distribution as one moves toward infinity. Of course our data do not extend to infinity, so we have to stop moving to the right before we run out of data. This raises the question of whether the GP approximates well the chunk of the tail we examine. So one issue in GP fitting is the choice of the threshold defining the tail: too high and you have too little data; too low and you are too far from the theoretical GP distribution.
So at a minimum it would be interesting to know what happens if you use the threshold I used, i.e., only fit to the data above 150. Certainly that choice is debatable, and I did not systematically explore alternatives. It appears that you used 100, right?
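For concreteness, the kind of sensitivity check I have in mind looks like the rough Python/SciPy sketch below. It is not the code either of us actually ran: the data file name is a placeholder, and the fit is an ordinary two-parameter ML fit to the exceedances.

import numpy as np
from scipy.stats import genpareto

# dst: one |Dst| magnitude (nT) per storm; the file name is made up for illustration.
dst = np.loadtxt("dst_storm_maxima.txt")

for u in (100.0, 150.0):
    exceed = dst[dst > u] - u                     # exceedances over threshold u
    xi, _, sigma = genpareto.fit(exceed, floc=0)  # fix loc=0: two-parameter GP (shape xi, scale sigma)
    # P(|Dst| > 850 per storm) = P(storm exceeds u) * P(GP exceedance > 850 - u)
    p850 = (exceed.size / dst.size) * genpareto.sf(850.0 - u, xi, loc=0, scale=sigma)
    print(f"u={u:.0f}  n={exceed.size}  xi={xi:.3f}  sigma={sigma:.1f}  P(>850 per storm)={p850:.2e}")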
Question: a subset of GP distributions (those with xi < 0) has a finite upper endpoint. So the posterior distribution could assign non-vanishing probability to the hypothesis that a Carrington-sized event is impossible. And yet it appears not to. Does the prior used above exclude the possibility of a finite maximum on geomagnetic storm size?
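To make the question concrete: for xi < 0 the fitted GP has an upper endpoint at threshold + sigma/(-xi), so one can simply count the posterior draws whose endpoint falls below 850. A sketch, assuming draws of xi and sigma sit in a CSV with those column names (both the file and the columns are placeholders of mine):

import numpy as np

u = 100.0                                             # threshold (nT), as an example
draws = np.genfromtxt("gpd_posterior_draws.csv", delimiter=",", names=True)
xi, sigma = draws["xi"], draws["sigma"]

# For xi < 0 the GP has a finite upper endpoint at u + sigma/(-xi); otherwise it is unbounded.
endpoint = np.full_like(xi, np.inf)
neg = xi < 0
endpoint[neg] = u + sigma[neg] / (-xi[neg])

print("P(xi < 0)                     =", np.mean(neg))
print("P(Carrington-size impossible) =", np.mean(endpoint < 850.0))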
No, I don’t see any strong advantage of the frequentist ML fit over the Bayesian approach, though my understanding of the latter is incomplete. My prior, as it were, is that the two should give similar answers if given similar inputs. Possibly Aki’s next answer will prove me wrong on that.
I think there’s a difference between effectively imposing P(Quebec)>0 and P(Carrington)>0 because the first bit of information comes embedded in a complete data series for several decades, chosen with minimal arbitrariness, whereas the latter is selected precisely for its extremity and so isn’t representative of the larger data range it is part of.
(In the bootstrapping method I use, it is of course possible for P(Quebec)=0 in some replications.)
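To illustrate that last point, here is a stripped-down version of the resampling I mean; the file name, threshold, and the 589 nT figure I use for the March 1989 Quebec storm are stand-ins, not the exact inputs of my own script:

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(1)
u, quebec = 100.0, 589.0                      # threshold and assumed |Dst| of the 1989 Quebec storm (nT)
dst = np.loadtxt("dst_storm_maxima.txt")      # placeholder data file
exceed = dst[dst > u] - u

B, n_zero = 2000, 0
for _ in range(B):
    boot = rng.choice(exceed, size=exceed.size, replace=True)
    xi, _, sigma = genpareto.fit(boot, floc=0)
    # This replication assigns P(Quebec) = 0 iff xi < 0 and the endpoint u + sigma/(-xi) falls below 589.
    if xi < 0 and u + sigma / (-xi) < quebec:
        n_zero += 1
print("fraction of replications with P(Quebec) = 0:", n_zero / B)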
--David
Thank you, Aki.
I did not intend to imply that I thought we ought to require Carrington/850 to be possible; I just asked about that to understand better what you did. I do think, however, that this is pretty much what Colin was looking for.
In my bootstrapping, the only constraint is sigma>0, except in the “filtered” regressions in the graph in my last message, where I additionally require 850 to be possible.
I tend to think now that just introducing this requirement introduces a kind of bias; maybe there is a name for this in Bayesian analysis. If I am interested in inference about a parameter k, and I introduce the prior k > kmin based on the historical record but choose not to impose a maximum based on the same record, then this in itself tends to raise my central estimate of k, right? Is there a name for this phenomenon?
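To make my suspicion concrete, here is a toy simulation (entirely my own sketch, with made-up parameter values, not anyone's actual analysis): repeatedly fit a GP by ML to simulated data, then keep only the replications whose fitted distribution allows an 850-sized event. The retained replications have a noticeably higher average shape estimate, which is the one-sided effect I am asking about.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
xi_true, sigma_true, n = -0.1, 50.0, 300      # toy values chosen only for illustration
u, big = 100.0, 850.0                         # threshold and the "must be possible" magnitude

xi_hat, allowed = [], []
for _ in range(2000):
    y = genpareto.rvs(xi_true, loc=0, scale=sigma_true, size=n, random_state=rng)
    xi, _, sigma = genpareto.fit(y, floc=0)
    xi_hat.append(xi)
    # The fitted GP allows an exceedance of (big - u) iff xi >= 0 or its endpoint sigma/(-xi) exceeds it.
    allowed.append(xi >= 0 or sigma / (-xi) > big - u)

xi_hat, allowed = np.array(xi_hat), np.array(allowed)
print("mean xi, all replications:    ", xi_hat.mean())
print("mean xi, '850 possible' only: ", xi_hat[allowed].mean())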
--David
From: stan-...@googlegroups.com [mailto:stan-...@googlegroups.com] On Behalf Of Aki Vehtari
Sent: Thursday, November 12, 2015 3:00 PM
To: Stan users mailing list <stan-...@googlegroups.com>
Subject: [stan-users] Re: applying Bayesian analysis to Extreme Value Theory (EVT)
The parameters had the following constraints
Now, looking at the predictive check, it's not hard to see an elbow that the generalized Pareto is too rigid to fit.
> My guess is that this kind of elbow is due to changes in the overall activation level in time and that should be modeled for the best accuracy (instead of using a more complex univariate distribution).
Sure, but then you’re just creating a generative process which is implicitly creating a more complex univariate distribution when you marginalize the nuisance parameters out.
> Discrepancy is not bad, it fits inside the uncertainty intervals.
Except that you’re ignoring correlations: the predictive distributions are largely random around the median, but the data deviate from it in a systematically correlated way.
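One way to make that visible is to push the comparison through a test quantity instead of eyeballing pointwise bands, e.g. how often consecutive order statistics sit on the same side of the pointwise predictive median. A rough sketch, with placeholder file names rather than the actual fit:

import numpy as np
from scipy.stats import genpareto

exceed = np.loadtxt("exceedances.txt")                      # observed exceedances (placeholder)
draws = np.genfromtxt("gpd_posterior_draws.csv", delimiter=",", names=True)
xi, sigma = draws["xi"], draws["sigma"]

n, rng = exceed.size, np.random.default_rng(2)
# One sorted replicated dataset per posterior draw.
reps = np.sort(genpareto.rvs(xi[:, None], loc=0, scale=sigma[:, None],
                             size=(xi.size, n), random_state=rng), axis=1)
lo, med, hi = np.percentile(reps, [5, 50, 95], axis=0)
obs = np.sort(exceed)
print("fraction of order statistics inside the pointwise 90% band:",
      np.mean((obs >= lo) & (obs <= hi)))

def run_stat(y):
    # Fraction of consecutive order statistics on the same side of the pointwise predictive median.
    s = np.sign(np.sort(y) - med)
    return np.mean(s[1:] == s[:-1])

t_rep = np.array([run_stat(r) for r in reps])
p_ppc = np.mean(t_rep >= run_stat(obs))
print("PPC p-value for the same-side-run statistic:", p_ppc)   # small => correlated, elbow-like misfit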
Hi Colin,
I think that summary is correct, but I would add that my simulations showed that if the last ~200 years are statistically identical to the observed 1957-2014 Dst history aside from Carrington, then estimates that factor in the Carrington occurrence and nothing else from pre-1957 will be biased upward. So I think we somehow need to build in some prior information about the pre-1957 Dst history.
Something that might help is the aa index (https://www.ngdc.noaa.gov/stp/geomag/aastar.html), which I believe is based on two antipodal, non-equatorial observatories and goes back to 1868. Maybe it can be used to estimate pre-1957 Dst values.
Aki, I’m not sure if this speaks to what you were thinking about exploring time dependency, but two cycles I would include are the semi-annual cycle, with peak Dst activity at the equinoxes, and the sunspot cycle, which I treat as taking 11 years and reaching a low on 1 Jan 2008.
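If it helps, covariates along those lines could be built with something like the sketch below. The date file is a placeholder; a sine/cosine pair at the semi-annual frequency lets the fit pick the equinox phase, and the solar-cycle term is just a fixed 11-year sinusoid with its minimum at 1 Jan 2008.

import numpy as np
import pandas as pd

t = pd.to_datetime(np.loadtxt("storm_dates.txt", dtype=str))   # one date per storm (placeholder)
doy = t.dayofyear.to_numpy()
yrs = ((t - pd.Timestamp("2008-01-01")) / pd.Timedelta(days=365.25)).to_numpy()

# Semi-annual cycle: two cycles per year; the sine/cosine pair lets the fit place the equinox peaks.
semi_sin = np.sin(4 * np.pi * doy / 365.25)
semi_cos = np.cos(4 * np.pi * doy / 365.25)
# Sunspot cycle: treated as a fixed 11-year sinusoid, at its minimum on 1 Jan 2008.
solar = -np.cos(2 * np.pi * yrs / 11.0)

X = np.column_stack([semi_sin, semi_cos, solar])   # e.g. covariates for log(sigma) of the GP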
--David
From: stan-...@googlegroups.com [mailto:stan-...@googlegroups.com] On Behalf Of Colin Rust
Sent: Sunday, November 22, 2015 10:19 AM
To: Stan users mailing list <stan-...@googlegroups.com>
Subject: [stan-users] Re: applying Bayesian analysis to Extreme Value Theory (EVT)
Thanks Aki, David, Michael, this has been a really interesting discussion.
A key issue for policy is the probability of a really extreme storm, like the Carrington event. Under the ML distribution, David found a probability of ~0.005% per solar storm, which works out to 0.33% per decade, or a 6.4% probability of at least one over a 200-year period. As David noted in his original post, those numbers seem low. Indeed, eyeballing Aki's charts, we see numbers roughly 20 times larger on a Bayesian analysis: a probability of >0.1% per storm, which works out to ~6% per decade, or a probability of roughly 73% over a 200-year period (computed as 1-(1-6.4%)^20 ≈ 73%).
That's a sufficiently large difference that I would describe it as qualitative. Under the ML distribution, one could describe Carrington as a freakishly extreme event that humanity was unlucky to have encountered in the period since the Industrial Revolution, and thus one that should have limited weight in policy planning. Under a Bayesian analysis that incorporates contributions from the non-ML distributions, we instead see Carrington as a magnitude of storm that might plausibly recur in the coming decades, and so one that is very important for planning.
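Spelling the compounding out (the ~66 storms per decade is simply what the 0.005%-per-storm / 0.33%-per-decade pair implies, so treat it as an assumption, and the 0.1% per storm is only eyeballed from the charts):

p_ml, p_bayes = 5e-5, 1e-3        # per-storm probabilities: ML fit vs. eyeballed Bayesian
storms_per_decade = 66            # implied by 0.005% per storm ~ 0.33% per decade

for label, p in (("ML", p_ml), ("Bayes", p_bayes)):
    p_decade = 1 - (1 - p) ** storms_per_decade
    p_200yr = 1 - (1 - p_decade) ** 20
    print(f"{label}: per decade {p_decade:.2%}, over 200 years {p_200yr:.1%}")
# prints roughly: ML: per decade 0.33%, over 200 years 6.4%
#                 Bayes: per decade 6.39%, over 200 years 73.3%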
Aki, I downloaded the aa index data from ftp://ftp.ngdc.noaa.gov/STP/GEOMAGNETIC_DATA/AASTAR/aaindex. A version that may be easier to work with is at http://1drv.ms/1T8Xfqi.
--David
From: stan-...@googlegroups.com [mailto:stan-...@googlegroups.com] On Behalf Of Colin Rust
Sent: Sunday, November 22, 2015 5:56 PM
To: Stan users mailing list <stan-...@googlegroups.com>