
chi2gof function issues


Bryce Cruey

Oct 13, 2010, 4:34:03 PM
Hello -

I am new to the MATLAB Statistics Toolbox and am trying to figure out how to use the chi2gof() function. I have a dataset to which I have fit a generalized extreme value (GEV) distribution. I want to test the goodness of fit by generating the test statistic for the chi-square GOF test and comparing it to the critical value. I think the chi2gof() function is the correct tool, but I am having trouble figuring it out. Here is my code:

x=[9.1490 8.3627 7.8294 8.0733 7.9666 8.2909 7.9900 8.6333 8.4093 9.0143 9.7085 8.2009 8.1845 8.3145 8.3151 8.6150 8.1795 8.1972 7.9855 8.3808 8.7189 8.4584 8.0842 8.4583 8.5127 9.2456 8.0110 8.0434 8.8320 8.2386 8.9114];

paramhat=gevfit(x);

a1=paramhat(1);
a2=paramhat(2);
a3=paramhat(3);

[h,p,st] = chi2gof(x,'cdf',@(z)gevcdf(z,a1,a2,a3),'nparams',3);

The above expression is not giving me an answer that I expect. Thank you in advance for any help/suggestions.

Peter Perkins

Oct 14, 2010, 2:45:42 PM
On 10/13/2010 4:34 PM, Bryce Cruey wrote:
> I am new to matlab statistics toolbox and am trying to figure out how to
> use the chi2gof() function. I have a dataset that I have fit a
> generalized extreme value distribution to (GEV). I want to test the
> goodness of fit by generating the test statistic for the chi-square gof
> test and comparing it to the critical value. I think the chi2gof()
> function is the correct tool but I am having trouble figuring it out.

Bryce, when I run your code, I get this:

paramhat =
    0.13364    0.28932    8.2206
h =
    0
p =
    NaN
st =
    chi2stat: 0.24968
          df: 0
       edges: [7.8294 8.2052 8.3931 9.7085]
           O: [12 6 13]
           E: [10.795 6.8514 13.354]

So the p-value is NaN. The df and edges fields of that stats structure
tell you what's going wrong: chi2gof tries to use 10 bins by default,
but pools bins at the extremes until the expected count in each bin is
at least 5. You've only got 31 observations, and what happens is that it
ends up pooling down to only three bins. The degrees of freedom are
(number of bins) - 1 - (number of estimated parameters), so with three
bins and three estimated parameters that is 3 - 1 - 3 = -1: you've got
(less than) no degrees of freedom left.

That explains the NaN. So what to do?

You can choose your own bin edges to override the automatic procedure.
But you don't have a lot of data, and you'll need at least 5 bins to get
d.f. > 0. You can have at most 6 bins and still have at least 5
observations in each bin. Something like

edges = gevinv([0 .2 .4 .6 .8 1],a1,a2,a3)

might be useful. There are statistical issues that I am not entirely up
on when you choose bin edges like this.
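
A minimal sketch of the custom-edges suggestion, using the data and fit from the original post. One assumption is labeled in the code: gevinv at 0 and 1 returns the ends of the fitted support (one of which is infinite here), so the sketch uses the data minimum and maximum as the outer edges instead, which is what the default edges in the output above appear to do. With 5 bins and 3 estimated parameters, the degrees of freedom work out to 5 - 1 - 3 = 1.

```matlab
% Data and GEV fit from the original post
x = [9.1490 8.3627 7.8294 8.0733 7.9666 8.2909 7.9900 8.6333 8.4093 9.0143 ...
     9.7085 8.2009 8.1845 8.3145 8.3151 8.6150 8.1795 8.1972 7.9855 8.3808 ...
     8.7189 8.4584 8.0842 8.4583 8.5127 9.2456 8.0110 8.0434 8.8320 8.2386 8.9114];
paramhat = gevfit(x);
a1 = paramhat(1); a2 = paramhat(2); a3 = paramhat(3);   % shape, scale, location

% Interior edges at the fitted 20/40/60/80% quantiles (5 equal-probability bins)
inner = gevinv([.2 .4 .6 .8], a1, a2, a3);

% Assumption: finite outer edges taken from the data range rather than gevinv(0) and gevinv(1)
edges = [min(x) inner max(x)];

[h,p,st] = chi2gof(x, 'cdf', @(z)gevcdf(z,a1,a2,a3), 'nparams', 3, 'edges', edges);
```

Check st.df, st.O and st.E afterwards to confirm that no bins were pooled and that the degrees of freedom are positive.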

You can specify a lower minimum expected bin count (the 'emin'
parameter). 5 is usually considered the smallest for anything like an
approximate chi-squared distribution for the test statistic.
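
A minimal sketch of lowering the minimum expected count through the 'emin' parameter of chi2gof. The value 2 is only an illustrative assumption, not a recommendation; as noted above, the chi-squared approximation becomes doubtful below 5, and fewer bins get pooled as the threshold drops.

```matlab
% Data and GEV fit from the original post
x = [9.1490 8.3627 7.8294 8.0733 7.9666 8.2909 7.9900 8.6333 8.4093 9.0143 ...
     9.7085 8.2009 8.1845 8.3145 8.3151 8.6150 8.1795 8.1972 7.9855 8.3808 ...
     8.7189 8.4584 8.0842 8.4583 8.5127 9.2456 8.0110 8.0434 8.8320 8.2386 8.9114];
paramhat = gevfit(x);
a1 = paramhat(1); a2 = paramhat(2); a3 = paramhat(3);   % shape, scale, location

% Assumption: emin = 2 is an arbitrary illustrative threshold (the default is 5)
[h,p,st] = chi2gof(x, 'cdf', @(z)gevcdf(z,a1,a2,a3), 'nparams', 3, 'emin', 2);

% Inspect how many bins survived pooling and the resulting degrees of freedom
disp(numel(st.O));
disp(st.df);
```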

You could use the K-S test, and ignore the fact that you're estimating
parameters and the K-S p-value is based on the assumption that you have
not estimated the parameters. If it rejected, you could take that as a
conservative test (i.e., the p-value will be too large, and so if it
rejects, you have even more evidence against the null than you would
ordinarily). But with your data, it won't reject.
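
A minimal sketch of the K-S option, assuming a release in which kstest accepts the hypothesized CDF as a two-column matrix through the 'CDF' name-value argument (older releases took the matrix as the second positional argument). The parameters are still estimated from the same data, so the p-value carries the caveat described above.

```matlab
% Data and GEV fit from the original post
x = [9.1490 8.3627 7.8294 8.0733 7.9666 8.2909 7.9900 8.6333 8.4093 9.0143 ...
     9.7085 8.2009 8.1845 8.3145 8.3151 8.6150 8.1795 8.1972 7.9855 8.3808 ...
     8.7189 8.4584 8.0842 8.4583 8.5127 9.2456 8.0110 8.0434 8.8320 8.2386 8.9114];
paramhat = gevfit(x);
a1 = paramhat(1); a2 = paramhat(2); a3 = paramhat(3);   % shape, scale, location

% Hypothesized CDF as [value, cdf(value)] pairs evaluated at the data points
xc = x(:);
cdfTable = [xc, gevcdf(xc, a1, a2, a3)];

% One-sample K-S test against the fitted GEV (p-value ignores the estimation step)
[h, p, ksstat] = kstest(xc, 'CDF', cdfTable);
```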

You could use DFITTOOL to look at a CDF plot, and see that the fit is
almost too good to be true (which makes me wonder if these data are
real), and skip a formal hypothesis test altogether.
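
For a non-interactive look at the same comparison, here is a minimal sketch that overlays the empirical CDF and the fitted GEV CDF with cdfplot. It is a stand-in for the DFITTOOL view, not a reproduction of it; the grid size and line style are arbitrary choices.

```matlab
% Data and GEV fit from the original post
x = [9.1490 8.3627 7.8294 8.0733 7.9666 8.2909 7.9900 8.6333 8.4093 9.0143 ...
     9.7085 8.2009 8.1845 8.3145 8.3151 8.6150 8.1795 8.1972 7.9855 8.3808 ...
     8.7189 8.4584 8.0842 8.4583 8.5127 9.2456 8.0110 8.0434 8.8320 8.2386 8.9114];
paramhat = gevfit(x);
a1 = paramhat(1); a2 = paramhat(2); a3 = paramhat(3);   % shape, scale, location

% Empirical CDF of the data
cdfplot(x);
hold on

% Fitted GEV CDF on a grid spanning the data range
xg = linspace(min(x), max(x), 200);
plot(xg, gevcdf(xg, a1, a2, a3), 'r-');

legend('Empirical CDF', 'Fitted GEV CDF', 'Location', 'best');
hold off
```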

Hope this helps. This kind of distribution GOF testing seems like a
simple thing, but is way more difficult theoretically than it seems.
