Skewness and kurtosis p-values

Cristiano

May 24, 2013, 1:39:15 PM

I calculate the skewness and the kurtosis from a set of real numbers
(distribution unknown) using the formulas:

http://mvpprograms.com/help/mvpstats/distributions/SkewnessCriticalValues

http://mvpprograms.com/help/mvpstats/distributions/KurtosisCriticalValues

I usually need to check whether the calculated skewness and kurtosis are
in good agreement with the expected values for a normal or uniform
distribution; I need a p-value.

I'm trying to replicate (via simulation) the p-values (alpha) presented
in that site, but I get different values. For example, for n= 7 and
alpha= 0.1, for the skewness I get 1.169 instead of 1.307.

For the skewness I do the following:
1) generate a random number x_i in N(0,1)
2) if x_i < 0 discard the number
3) for n= 7 I do the above steps until i = 1428571
4) calculate the 95th percentile (for alpha= 0.1) of the x's.

Does anybody know where I could be wrong?

Thank you
Cristiano

Rich Ulrich

May 24, 2013, 3:32:59 PM

On Fri, 24 May 2013 19:39:15 +0200, Cristiano <cris...@NSgmail.com>
wrote:

My tentative guess is that you cut-and-paste'd your
steps from some wrong source.

Discarding negative numbers has nothing to do with
computing skewness, so far as I can imagine.

Somewhere in the steps, you should "compute skewness."

1) Draw 7; compute skewness; save.
2) Repeat 100,000 times.
3) Show 5% and 95% points (should be nearly the same
absolute values).
4) Repeat 10 times.
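
The steps above can be sketched in Python as follows. This is only an illustrative sketch, not the code used by anyone in this thread: the function names are hypothetical, and it assumes the adjusted sample skewness G1 = n/((n-1)(n-2)) * sum(((x_i - mean)/s)^3), which may or may not be exactly the formula on the linked page.

```python
import math
import random

def sample_skewness(xs):
    """Adjusted sample skewness G1 (assumed definition), moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    s = math.sqrt(sum(d * d for d in devs) / (n - 1))  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in devs)

def simulate_skewness(n, reps, rng):
    """Draw `reps` normal samples of size n; return their skewness values, sorted."""
    return sorted(
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )

def percentile(sorted_values, p):
    """Simple empirical quantile: the value at rank floor(p * N)."""
    index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]

rng = random.Random(12345)  # fixed seed so that a run can be repeated
skews = simulate_skewness(n=7, reps=100_000, rng=rng)
low, high = percentile(skews, 0.05), percentile(skews, 0.95)
print(f"5% point: {low:.4f}   95% point: {high:.4f}")
```

Running it several times with different seeds shows how much the two points move from run to run.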

--
Rich Ulrich

Cristiano

May 24, 2013, 6:21:25 PM

On 24/05/2013 21:32, Rich Ulrich wrote:
> On Fri, 24 May 2013 19:39:15 +0200, Cristiano <cris...@NSgmail.com>
> wrote:
>
>> I calculate the skewness and the kurtosis from a set of real numbers
>> (distribution unknown) using the formulas:
>>
>> http://mvpprograms.com/help/mvpstats/distributions/SkewnessCriticalValues
>>
>> http://mvpprograms.com/help/mvpstats/distributions/KurtosisCriticalValues
>>
>> I usually need to check whether the calculated skewness and kurtosis are
>> in good agreement with the expected values for a normal or uniform
>> distribution; I need a p-value.
>>
>> I'm trying to replicate (via simulation) the p-values (alpha) presented
>> in that site, but I get different values. For example, for n= 7 and
>> alpha= 0.1, for the skewness I get 1.169 instead of 1.307.
>>
>> For the skewness I do the following:
>> 1) generate a random number x_i in N(0,1)
>> 2) if x_i < 0 discard the number
>> 3) for n= 7 I do the above steps until i = 1428571
>> 4) calculate the 95th percentile (for alpha= 0.1) of the x's.
>>
>> Does anybody know where I could be wrong?
>
> My tentative guess is that you cut-and-paste'd your
> steps from some wrong source.

I wrote a working C++ program; I "extracted" the steps from there.

> Discarding negative numbers has nothing to do with
> computing skewness, so far as I can imagine.

The steps are a bit inaccurate.
I meant that I discard the skewness < 0.

> Somewhere in the steps, you should "compute skewness."
>
> 1) Draw 7; compute skewness; save.
> 2) Repeat 100,000 times.
> 3) Show 5% and 95% points (should be nearly the same absolute values).
> 4) Repeat 10 times.

Yes, I do that, but to be more precise:

1) Draw 7; compute skewness;

2) if skewness < 0 discard the value, else save.
3) Repeat 100,000 times.
4) Show 95% points.
5) Repeat until the confidence limit is good.

The reason to discard skewness < 0 is that I need to calculate only a
critical value for the skewness (the distribution must be exactly
symmetrical); if I get 5th percentile = -0.123 and 95th percentile =
.124, which critical value should I take?

Cristiano

Rich Ulrich

May 24, 2013, 11:50:21 PM

On Sat, 25 May 2013 00:21:25 +0200, Cristiano <cris...@NSgmail.com>
wrote:

Depending on what you mean by "discard," this might
introduce some unknown bias. Do you keep the count?
There will never be *exactly* 50% of the sample with
skewness less than 0.

>3) Repeat 100,000 times.
>4) Show 95% points.
>5) Repeat until the confidence limit is good.

"good"? Mostly, I haven't seen formal statements for how the
precision was computed in similar MC studies. Often, people
show enough parallel results that the technical error is apparent,
but I like to look at the actual limits when I do the work.

>
>The reason to discard skewness < 0 is that I need to calculate only a
>critical value for the skewness (the distribution must be exactly
>symmetrical); if I get 5th percentile = -0.123 and 95th percentile =
>.124, which critical value should I take?

As you say, the distribution *ought* to be exactly symmetrical.

The lower limit provides a second value based on 100,000
replications. (1) Why ignore it? (2) If there were some bias
in your RNG that these computations brought out, it would be
important to know it. (3) When you compute 10 or 20 cut-offs,
you can compute a pragmatic standard error, to go along with
the theoretical one (based on ranks around the 5% cutoff).
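
As a rough sketch of point (3), the Python below repeats the simulation 10 times, reports a rank-based interval for each cut-off, and computes the run-to-run spread of the cut-offs. The function names, the seed, and the choice of 20,000 replications per run are assumptions made for illustration; the skewness definition (adjusted G1) is also an assumption.

```python
import math
import random
import statistics

def sample_skewness(xs):
    """Adjusted sample skewness G1 (assumed definition), moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    s = math.sqrt(sum(d * d for d in devs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in devs)

def cutoff_with_rank_interval(n, reps, p, rng):
    """Return (cutoff, lower, upper): the p-quantile and an approximate 95% interval
    read off the order statistics around rank p * reps (binomial argument)."""
    skews = sorted(
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    centre = p * reps
    half_width = 1.96 * math.sqrt(reps * p * (1.0 - p))  # spread of the rank itself
    lo_rank = max(int(math.floor(centre - half_width)), 0)
    hi_rank = min(int(math.ceil(centre + half_width)), reps - 1)
    return skews[min(int(centre), reps - 1)], skews[lo_rank], skews[hi_rank]

rng = random.Random(2013)
runs = [cutoff_with_rank_interval(n=7, reps=20_000, p=0.95, rng=rng) for _ in range(10)]
cutoffs = [cutoff for cutoff, _, _ in runs]

mean_cutoff = statistics.fmean(cutoffs)
run_sd = statistics.stdev(cutoffs)            # pragmatic standard error of one run
se_of_mean = run_sd / math.sqrt(len(cutoffs))  # standard error of the averaged cut-off

for cutoff, lower, upper in runs:
    print(f"cut-off {cutoff:.4f}   rank-based interval [{lower:.4f}, {upper:.4f}]")
print(f"mean {mean_cutoff:.4f}   run sd {run_sd:.4f}   se of mean {se_of_mean:.4f}")
```

If the run-to-run standard deviation and the width of the rank-based intervals disagree badly, that is a hint that something in the simulation deserves a closer look.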

Back when computers were 1000 times slower than today, I was
reading some computer science literature. Cutting an eight-hour
monte carlo job in half would have been a worth-while benefit of
using both ends of the distribution, even without the cross-check
on validity (from actually *looking* at both values). No reader
would have complained.

That all being said -- I don't know why your results don't agree
with the page you cite. Before I looked at the page, I
wondered at potential differences in definitions of "skewness".

However, they seem very explicit in what is being computed.
You *would* get slightly different results if you don't compute
the moments around the observed means for each set (but
assumed zero).
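
To see the effect of the centring, a sketch along these lines computes the same upper point twice, once with moments about each sample's own mean and once with moments about an assumed mean of zero. It uses the plain moment ratio m3 / m2^1.5 rather than any adjusted formula, and all names and the seed are hypothetical.

```python
import random

def moment_ratio_skewness(xs, centre):
    """Skewness as m3 / m2**1.5, with both moments taken about `centre`."""
    n = len(xs)
    devs = [x - centre for x in xs]
    m2 = sum(d * d for d in devs) / n
    m3 = sum(d ** 3 for d in devs) / n
    return m3 / m2 ** 1.5

def upper_point(n, reps, p, rng, use_sample_mean):
    """Empirical p-quantile of the skewness of normal samples of size n."""
    values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        centre = sum(xs) / n if use_sample_mean else 0.0  # 0.0 is the true mean here
        values.append(moment_ratio_skewness(xs, centre))
    values.sort()
    return values[min(int(p * reps), reps - 1)]

rng = random.Random(7)
crit_sample_mean = upper_point(n=7, reps=50_000, p=0.95, rng=rng, use_sample_mean=True)
crit_zero_mean = upper_point(n=7, reps=50_000, p=0.95, rng=rng, use_sample_mean=False)
print(f"95% point, moments about sample mean: {crit_sample_mean:.4f}")
print(f"95% point, moments about zero:        {crit_zero_mean:.4f}")
```

Whichever centring the tabulated values use, the simulation has to use the same one for the comparison to mean anything.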

--
Rich Ulrich

Cristiano

May 25, 2013, 6:43:38 AM

On 25/05/2013 5:50, Rich Ulrich wrote:
>> Yes, I do that, but to be more precise:
>> 1) Draw 7; compute skewness;
>> 2) if skewness < 0 discard the value, else save.
>
> Depending on what you mean by "discard,"

Uh? What I mean? Discard is discard; I mean discard.
You can take a look here:
http://www.thefreedictionary.com/discard
"To throw away; reject."

> this might introduce some unknown bias. Do you keep the count?
> There will never be *exactly* 50% of the sample with
> skewness less than 0.

Sure, but where's the problem?

>> The reason to discard skewness < 0 is that I need to calculate only a
>> critical value for the skewness (the distribution must be exactly
>> symmetrical); if I get 5th percentile = -0.123 and 95th percentile =
>> .124, which critical value should I take?
>
> As you say, the distribution *ought* to be exactly symmetrical.
>
> The lower limit provides a second value based on 100,000
> replications. (1) Why ignore it? (2) If there were some bias
> in your RNG that these computations brought out, it would be
> important to know it.

The RNG I use doesn't have any bias.
I checked that using properly designed tests and I check the simulation
using a properly designed generator.

> (3) When you compute 10 or 20 cut-offs,
> you can compute a pragmatic standard error, to go along with
> the theoretical one (based on ranks around the 5% cutoff).
>
> Back when computers were 1000 times slower than today, I was
> reading some computer science literature. Cutting an eight-hour
> monte carlo job in half would have been a worth-while benefit of
> using both ends of the distribution, even without the cross-check
> on validity (from actually *looking* at both values). No reader
> would have complained.

I don't have any problem in using both tails, but does it make any sense?
We already know that the critical values for the 5th and 95th percentile
*must* be exactly the same.
For example, using both tails I get:
0.05 -.82306 +/- 2.75e-4
0.95 .82311 +/- 2.73e-4
(+/- indicates the confidence interval)
The p-value has to come from a 2-sided test; there should be only one
critical value. Where's the sense in using -.82306 and .82311?

> That all being said -- I don't know why your results don't agree
> with the page you cite. Before I looked at the page, I
> wondered at potential differences in definitions of "skewness".

If it's not too much trouble, you just need to click the links to see
that the pages also show the formulas.

> However, they seem very explicit in what is being computed.
> You *would* get slightly different results if you don't compute
> the moments around the observed means for each set (but
> assumed zero).

I calculated the above critical values using mean= 0, while when I
calculate them using the sample mean I get:
0.05 -.81661 +/- 2.79e-4
0.95 .81637 +/- 2.74e-4
The two sets differ significantly, but both are very far from
the values tabulated on that site.
How can that be possible?
I'm not interested in using the values in the site, but I need to
understand whether my simulation works fine.

If someone can confirm that the following procedure is good, I can stop
asking and I can start the simulation:

1) Randomly draw N normally (or uniformly) distributed numbers
2) compute the skewness (or the kurtosis)
2a) [if skewness < 0 discard the value, else save]
3) Repeat many times
4) calculate the p-th percentile of the saved skewness (or kurtosis)
5) Repeat until the confidence interval for the p-th percentile is "good".

[I can calculate when "good" is good.]

Step 2a: for the kurtosis I need 2 critical values, but for the skewness
do I really need 2 critical values?

Cristiano

Cristiano

May 25, 2013, 7:57:37 AM

On 25/05/2013 12:43, Cristiano wrote:
> If someone can confirm that the following procedure is good, I can stop
> asking and I can start the simulation:

Better idea: I checked my simulation with a counter-simulation.
I count how many calculated skewness values fall beyond the critical values
calculated with my simulation. The results are very good.
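
A counter-simulation of this kind might look like the Python sketch below. It assumes a two-sided test on |skewness| with alpha = 0.1 and the adjusted G1 definition; the names and seeds are hypothetical. Note that both passes share the same skewness routine, so the check confirms the percentile logic but cannot reveal an error in the skewness calculation itself.

```python
import math
import random

def sample_skewness(xs):
    """Adjusted sample skewness G1 (assumed definition), moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    s = math.sqrt(sum(d * d for d in devs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in devs)

def abs_skewness_draws(n, reps, rng):
    """|skewness| of `reps` independent normal samples of size n."""
    return [
        abs(sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(reps)
    ]

alpha = 0.10
n = 7

# First pass: estimate the critical value as the (1 - alpha) quantile of |skewness|.
first = sorted(abs_skewness_draws(n, 50_000, random.Random(1)))
critical = first[min(int((1 - alpha) * len(first)), len(first) - 1)]

# Second pass, with an independent seed: count how often the critical value is exceeded.
second = abs_skewness_draws(n, 50_000, random.Random(2))
rejection_rate = sum(value > critical for value in second) / len(second)

print(f"critical value {critical:.4f}   observed rejection rate {rejection_rate:.4f}")
```

The observed rejection rate should land close to alpha if the percentile step is right.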

I really don't know how that site gets those critical values.

Cristiano

David Jones

May 25, 2013, 12:36:23 PM

"Cristiano" wrote in message news:knq8n3$gmc$1...@dont-email.me...

=======================================================

Probably, that site knows what a "two-sided test" means whereas, judging by
your description of simulation for the skewness, you do not. The simplest
change to your procedure would be to use the absolute value of the
calculated skewness, since that is the test statistic for a two-sided test
in this case. On the webpage, "alpha" is the total area of the two tails,
not just one tail.
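
A minimal Python sketch of that change: take the (1 - alpha) quantile of |skewness| as the single two-sided critical value, and read a Monte Carlo p-value off the same simulated values. The names `mc_p_value` and `abs_null`, the seed, and the adjusted G1 definition are assumptions made for illustration only.

```python
import bisect
import math
import random

def sample_skewness(xs):
    """Adjusted sample skewness G1 (assumed definition), moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    s = math.sqrt(sum(d * d for d in devs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in devs)

def mc_p_value(observed, abs_null_sorted):
    """Two-sided Monte Carlo p-value: share of null |skewness| values >= |observed|.
    The +1 terms count the observed value itself, so the result is never zero."""
    first_at_or_above = bisect.bisect_left(abs_null_sorted, abs(observed))
    exceed = len(abs_null_sorted) - first_at_or_above
    return (exceed + 1) / (len(abs_null_sorted) + 1)

rng = random.Random(99)
n, reps, alpha = 7, 50_000, 0.10

# Null distribution of the two-sided test statistic |skewness| for normal samples.
abs_null = sorted(
    abs(sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
    for _ in range(reps)
)
critical = abs_null[min(int((1 - alpha) * reps), reps - 1)]

print(f"two-sided critical value for alpha={alpha}: {critical:.4f}")
print(f"p-value for an observed skewness of -0.9: {mc_p_value(-0.9, abs_null):.4f}")
```

Because the statistic is the absolute value, a negative observed skewness is handled the same way as a positive one of equal size.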

You also said "Step 2a: for the kurtosis I need 2 critical values, but for
the skewness do I really need 2 critical values?". You do need two critical
values for the raw skewness, but for symmetric distributions you know that
these are related in a simple way. If you were working out a test of
skewness for some non-symmetric distribution, as is certainly possible,
there would be non-symmetric lower and upper limits for a two-sided test.

David Jones

Cristiano

May 25, 2013, 2:40:06 PM

On 25/05/2013 18:36, David Jones wrote:
> Probably, that site knows what a "two-sided test" means whereas, judging
> by your description of simulation for the skewness, you do not.

I know what a "two-sided test" means (I wrote some 2-sided tests to test
RNG's), but I could be a bit confused in writing a simulation for a
2-sided test. Anyway, I don't think that it is very important. Here I'm
just trying to understand how they get those critical values because I
need to be sure that my simulation works fine.

> The simplest change to your procedure would be to use the absolute
> value of the calculated skewness, since that is the test statistic for a
> two-sided test in this case. On the webpage, "alpha" is the total area
> of the two tails, not just one tail.

I know that (I saw the 2 red tails).

If I use the absolute value of the skewness calculated (many times) for
7 numbers in N(0,1) and I see that the 90th percentile is .8163, I would
argue that 90% of the time |skewness| <= .8163. Am I wrong?
If I'm right, .8163 should be the critical value for their alpha= 0.1.
Even if I didn't know anything about 2-sided tests, could someone tell
me, please, how on earth they get 1.307?

Cristiano

David Jones

May 25, 2013, 4:43:08 PM

"Cristiano" wrote in message news:knr09n$hha$1...@dont-email.me...

===========================================================

Have you tried finding an alternative source of critical values? Judging by
values in "Biometrika Tables" (which need to be adjusted for differences in
definition, and which give values only for n=25 upwards, and which may be
subject to some approximation error), the values for skewness on that
webpage seem right. You could check "Biometrika Tables" for details of how
they got their values (Pearson ES, Hartley HO (1969) Biometrika Tables for
Statisticians, Vol 1, 3rd Edition, Cambridge University Press), but it is
unlikely that the webpage used those methods.

Thus there is some evidence that the webpage is correct, at least
for n>=25, and therefore that your programming is wrong somewhere. You could try
testing against published values for the variance of the skewness, which
would potentially avoid doubts about those webpage tables.
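
One way to run such a check is sketched below in Python. It assumes the adjusted skewness G1, for which the variance under normality is 6n(n-1) / ((n-2)(n+1)(n+3)); if the program uses a different skewness definition, the reference formula has to be changed to match. The function names and the seed are hypothetical.

```python
import math
import random
import statistics

def sample_skewness(xs):
    """Adjusted sample skewness G1 (assumed definition), moments about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    s = math.sqrt(sum(d * d for d in devs) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in devs)

def theoretical_variance(n):
    """Variance of G1 for samples of size n from a normal distribution."""
    return 6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))

def simulated_variance(n, reps, rng):
    """Sample variance of G1 over `reps` simulated normal samples of size n."""
    values = [
        sample_skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.variance(values)

rng = random.Random(314159)
results = {}
for n in (7, 25):
    results[n] = (simulated_variance(n, 50_000, rng), theoretical_variance(n))
    simulated, expected = results[n]
    print(f"n={n:2d}   simulated {simulated:.4f}   theoretical {expected:.4f}")
```

A clear mismatch here would point to the skewness routine rather than to the percentile step, since no percentiles are involved.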

David Jones

Rich Ulrich

May 26, 2013, 7:07:30 PM

On Sat, 25 May 2013 12:43:38 +0200, Cristiano <cris...@NSgmail.com>
wrote:

>On 25/05/2013 5:50, Rich Ulrich wrote:
>>> Yes, I do that, but to be more precise:
>>> 1) Draw 7; compute skewness;
>>> 2) if skewness < 0 discard the value, else save.
>>
>> Depending on what you mean by "discard,"
>
>Uh? What I mean? Discard is discard; I mean discard.
>You can take a look here:
>http://www.thefreedictionary.com/discard
>"To throw away; reject."
>
>> this might introduce some unknown bias. Do you keep the count?
>> There will never be *exactly* 50% of the sample with
>> skewness less than 0.
>
>Sure, but where's the problem?

Do you count it? "Throw away; reject" implies that
you will sample 100k values that are all positive, which
is clearly wrong. If you adapted by sampling 50k positive,
you will be wrong by the fraction off from 50%.

>
>>> The reason to discard skewness < 0 is that I need to calculate only a
>>> critical value for the skewness (the distribution must be exactly
>>> symmetrical); if I get 5th percentile = -0.123 and 95th percentile =
>>> .124, which critical value should I take?

Technically, you were talking about a two-tailed test, and assuming
(very rationally) that it has symmetrical tails. The technically
correct answer to the two-tailed limit is the absolute value that
rejects a total of 10% of the trials. For a particular randomization,
there will be more from one end than from the other. Okay. You get
an improved answer by using both ends together, instead of using
either (or both) separately.

>>
>> As you say, the distribution *ought* to be exactly symmetrical.
>>
>> The lower limit provides a second value based on 100,000
>> replications. (1) Why ignore it? (2) If there were some bias
>> in your RNG that these computations brought out, it would be
>> important to know it.
>
>The RNG I use doesn't have any bias.

I expect that that is (nearly) true. But I expect that a professional
RNG creator/tester would never lay out that statement without
some qualification, such as, "that would be detected in an experiment
like this one."

>I checked that using properly designed tests and I check the simulation
>using a properly designed generator.
>
>> (3) When you compute 10 or 20 cut-offs,
>> you can compute a pragmatic standard error, to go along with
>> the theoretical one (based on ranks around the 5% cutoff).
>>
>> Back when computers were 1000 times slower than today, I was
>> reading some computer science literature. Cutting an eight-hour
>> monte carlo job in half would have been a worth-while benefit of
>> using both ends of the distribution, even without the cross-check
>> on validity (from actually *looking* at both values). No reader
>> would have complained.
>
>I don't have any problem in using both tails, but does it make any sense?
>We already know that the critical values for the 5th and 95th percentile
>*must* be exactly the same.
>For example, using both tails I get:
> 0.05 -.82306 +/- 2.75e-4
> 0.95 .82311 +/- 2.73e-4
>(+/- indicates the confidence interval)
>The p-value has to come from a 2-sided test; there should be only one
>critical value. Where's the sense in using -.82306 and .82311?

Here's a minor puzzle for me. Earlier, you were referring to the same
two-tailed limits (I think) as being about 1.2, not 0.82. Oh, well.

Anyway. As someone else suggests, the easiest check that
is readily available seems to be: Check what you get for N=25,
since that is where the usual tables start.

--
Rich Ulrich

Cristiano

May 27, 2013, 7:59:48 AM

On 27/05/2013 1:07, Rich Ulrich wrote:
> On Sat, 25 May 2013 12:43:38 +0200, Cristiano <cris...@NSgmail.com>
> wrote:
>
>> On 25/05/2013 5:50, Rich Ulrich wrote:
>>>> Yes, I do that, but to be more precise:
>>>> 1) Draw 7; compute skewness;
>>>> 2) if skewness < 0 discard the value, else save.
>>>
>>> Depending on what you mean by "discard,"
>>
>> Uh? What I mean? Discard is discard; I mean discard.
>> You can take a look here:
>> http://www.thefreedictionary.com/discard
>> "To throw away; reject."
>>
>>> this might introduce some unknown bias. Do you keep the count?
>>> There will never be *exactly* 50% of the sample with
>>> skewness less than 0.
>>
>> Sure, but where's the problem?
>
> Do you count it? "Throw away; reject" implies that
> you will sample 100k values that are all positive, which
> is clearly wrong. If you adapted by sampling 50k positive,
> you will be wrong by the fraction off from 50%.

When I use only skewness >= 0, the only difference I see is the speed
(there is no difference in the critical values, as expected for
symmetrical distributions).

>>> As you say, the distribution *ought* to be exactly symmetrical.
>>>
>>> The lower limit provides a second value based on 100,000
>>> replications. (1) Why ignore it? (2) If there were some bias
>>> in your RNG that these computations brought out, it would be
>>> important to know it.
>>
>> The RNG I use doesn't have any bias.
>
> I expect that that is (nearly) true. But I expect that a professional
> RNG creator/tester would never lay out that statement without
> some qualification, such as, "that would be detected in an experiment
> like this one."

I use the dSFMT PRNG (which comes with a sound quality proof) and I
checked its "randomness" using an improved version of RaBiGeTe.

>> I don't have any problem in using both tails, but does it make any sense?
>> We already know that the critical values for the 5th and 95th percentile
>> *must* be exactly the same.
>> For example, using both tails I get:
>> 0.05 -.82306 +/- 2.75e-4
>> 0.95 .82311 +/- 2.73e-4
>> (+/- indicates the confidence interval)
>> The p-value has to come from a 2-sided test; there should be only one
>> critical value. Where's the sense in using -.82306 and .82311?
>
> Here's a minor puzzle for me. Earlier, you were referring to the same
> two-tailed limits (I think) as being about 1.2, not 0.82. Oh, well.

I calculated those values using a "complicated" algorithm to reduce the
rounding errors, but I must have made a mistake when I wrote the C++ code
for that algorithm.
Now I use the straightforward algorithm to calculate the skewness and I
get results very similar to those presented on the site.

Cristiano