t distribution -- sample kurtosis

mmo...@yahoo.com

Nov 6, 2005, 12:26:21 AM

I was trying to do some Monte Carlo simulation in Excel, and decided to
verify that my random number generator was working properly. The
generator was set to a t distribution with 5 degrees of freedom (df=5).
I created a sample of size 30000, and was surprised that the sample
kurtosis was WAY off. Its theoretical value is 6 (for excess kurtosis),
while I was getting (in the first 10 trials): 3.16, 3.77, 5.31, 4.08,
4.50, 4.41, 4.14, 5.40, 3.44, 5.41.

Why is that? Is the sample kurtosis so skewed that I can see so many
numbers that average far below the population kurtosis? And is its
variance so large that with a sample of size 30000 I can't estimate it
even to 1 significant digit?...
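
The experiment described above can be reproduced outside Excel. What follows is a minimal Python sketch, not the poster's spreadsheet. It assumes the plain moment estimator m4/m2^2 - 3 (Excel's KURT applies a small-sample correction, which should be negligible at n = 30000), and the function names and the seed are illustrative choices.

```python
import math
import random


def student_t_sample(n, df, rng):
    """Draw n Student-t variates as Z / sqrt(V / df), with V ~ chi-square(df)."""
    return [
        rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        for _ in range(n)
    ]


def excess_kurtosis(xs):
    """Moment estimator of excess kurtosis: m4 / m2**2 - 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def theoretical_excess_kurtosis(df):
    """Excess kurtosis of Student's t, 6 / (df - 4), finite only for df > 4."""
    if df <= 4:
        raise ValueError("excess kurtosis is not finite for df <= 4")
    return 6.0 / (df - 4.0)


if __name__ == "__main__":
    rng = random.Random(20051106)  # arbitrary seed, chosen only for repeatability
    print("theoretical:", theoretical_excess_kurtosis(5))
    for trial in range(1, 11):
        sample = student_t_sample(30000, 5, rng)
        print(trial, round(excess_kurtosis(sample), 2))
```

Running the ten trials with different seeds gives a feel for how widely the sample values scatter around the theoretical 6.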

Thanks...

Art Kendall

Nov 6, 2005, 11:59:59 AM

Excel is a spreadsheet. It is well known (see the archives of the
sci.stat.* newsgroups for details) to have problems with its random
number generator among other things when trying to use it as a
statistical package.

Try using a stat package. SPSS, for example, has many random variable
functions.

The rv.t function "Returns a random value from a Student's t
distribution with specified degrees of freedom df."

Below the sig block there is a tested set of SPSS syntax that generates
300,000 draws from a t distribution with 5 df. You can adapt it
according to the embedded comments. If you are using one of these two
PRNGs you should get the same results with the same seed.

Save all your current work, then open a new instance of SPSS. Make sure
that warnings, etc. are written to the output file (menu: Edit >
Options > Viewer).

Cut-and-paste then run the syntax after the sig block.

The details of all the algorithms are on the SPSS site www.spss.com

Art

new file.
* comment out one PRNG or the other.
*SET RNG=MC. /* multiplicative congruential PRNG.
SET RNG=MT. /*long period RNG with Mersenne twister.
* this program generates 300000 cases of a t distributed variable.
set seed = 20051106. /* you can change the seed.
input program.
loop #i = 1 to 300000. /* you can change the number of cases.
compute t= rv.t(5). /* you can change the distribution and/or the df.
end case.
end loop.
end file.
end input program.
formats t(f18.16).
descriptives vars= t /statistics=all.

Herman Rubin

Nov 6, 2005, 5:22:43 PM

In article <1131254781.5...@o13g2000cwo.googlegroups.com>,

>Thanks...

The problem is in the distribution of the sample 4th moment.

As the 5th moment of the original distribution fails to exist,
the moment of order 1.25 of the 4th sample moment fails to
exist; I do not know the precision to be expected from a
sample of size N, but I would expect it to be possibly on
the order of N^{-1/8}, and I do not know the constant here.
The 8th root of 30000 is less than 4.

I believe the limiting distribution would be an extremal
stable law of exponent 1.25. A careful computation might
verify all this. Also, too much uniformity in the random
numbers would tend to give too low values, but the ones
you have do not seem that unusual.
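
The exponent 1.25 mentioned above follows from the tail of the t density. A short derivation is sketched here for illustration; the symbols \nu, Y, c and C are notation introduced for this sketch, not taken from the post.

```latex
f_\nu(x) \propto \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}
\sim c\,|x|^{-(\nu+1)} \quad (|x| \to \infty)
\;\Longrightarrow\;
P(|X| > x) \sim C\,x^{-\nu},
\qquad E|X|^k < \infty \iff k < \nu .

\text{With } \nu = 5 \text{ and } Y = X^4:\quad
P(Y > y) = P\!\left(|X| > y^{1/4}\right) \sim C\,y^{-5/4},
\qquad E\,Y^p < \infty \iff p < \tfrac{5}{4} .
```

So the fourth power of a t(5) variate has a finite mean (p = 1) but no finite variance (p = 2), which is why averages of it converge slowly.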

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

shil...@yahoo.com

Nov 6, 2005, 10:01:58 PM

Here are the results from SAS (program attached below).

Obs  trail  smplsize           x1_Kurtosis

  1      1      3000  4.252773920734340000
  2      2      3000  5.669705958642340000
  3      3      3000  7.832873142226450000
  4      4      3000  2.172606579341830000
  5      5      3000  7.294947522472560000
  6      6      3000  7.294632050653970000
  7      7      3000  4.534625069807540000
  8      8      3000  4.265996602412060000
  9      9      3000  3.494070058498530000
 10     10      3000  2.488952591207550000
 11     11      3000  3.689113894759370000
 12     12      3000  3.281766308803870000
 13     13      3000  17.20176982281560000
 14     14      3000  8.077827555003220000
 15     15      3000  3.022945068856240000
 16     16      3000  4.154642466365760000
 17     17      3000  2.482481208515620000
 18     18      3000  2.725227027697530000
 19     19      3000  4.020212545790490000
 20     20      3000  6.329420583871910000
 21     21      3000  1.496624552560120000
 22     22      3000  5.674072144048940000
 23     23      3000  2.585685606212640000
 24     24      3000  2.613533387284080000
 25     25      3000  2.739152515527020000

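data random;
   call streaminit(1234);      /* fixed seed so the run is repeatable */
   do trail=1 to 30;           /* one block of draws per trial */
      do i=1 to 3000;
         x1=RAND('T',5);       /* Student's t with 5 df */
         output;
      end;
   end;
run;

/* sample kurtosis and sample size for each trial */
proc univariate data=random noprint;
   class trail;
   var x1;
   output out=trail Kurtosis=x1_Kurtosis n=smplsize;
run;

proc print data=trail;
   format x1_Kurtosis 20.18;
run;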

mmo...@yahoo.com

Nov 6, 2005, 11:26:51 PM

Thanks so much for all the replies!

Yes,

Looking at the SAS output, I notice that the skewness of the sample
kurtosis is really high, just like I expected. And the average of the
25 runs is ~4.77, which is ~1.23 below the true value of 6. The
standard deviation of the sample kurtosis, estimated from the SAS run,
is somewhere around 3.37. So if I want to have precision of, say, 10%
(standard deviation of 0.6) in the estimate of the 4th moment of my
distribution, I'd need a huge sample size of ~(3.37/0.6)^8 * 3000 ~ 3 *
10^9. Impossible in Excel, and pushing the limits of my PC and my
patience =)
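
The arithmetic in the paragraph above can be rechecked with a short Python sketch. It uses only the 25 rows printed in the SAS listing (rounded to four decimals), so the recomputed mean and standard deviation may differ somewhat from the figures quoted in the post, whose program loops over 30 trials. The helper `required_sample_size` and its `rate_root` parameter are hypothetical names; the N^(-1/8) rate is the conjecture from the thread, not an established result.

```python
import statistics

# The 25 sample kurtosis values printed in the SAS output (rounded).
kurtosis_runs = [
    4.2528, 5.6697, 7.8329, 2.1726, 7.2949,
    7.2946, 4.5346, 4.2660, 3.4941, 2.4890,
    3.6891, 3.2818, 17.2018, 8.0778, 3.0229,
    4.1546, 2.4825, 2.7252, 4.0202, 6.3294,
    1.4966, 5.6741, 2.5857, 2.6135, 2.7392,
]

mean_k = statistics.mean(kurtosis_runs)
sd_k = statistics.stdev(kurtosis_runs)  # sample SD (n - 1 denominator)


def required_sample_size(current_sd, target_sd, current_n, rate_root=8):
    """Sample size needed if the error shrinks like N**(-1/rate_root).

    rate_root=8 encodes the conjectured N^(-1/8) rate; it is an assumption.
    """
    return current_n * (current_sd / target_sd) ** rate_root


print("mean of runs:", round(mean_k, 3))
print("SD of runs:", round(sd_k, 3))
print("shortfall vs 6:", round(6.0 - mean_k, 3))
# Same inputs as in the post: SD 3.37 at n = 3000, target SD 0.6.
print("required n:", f"{required_sample_size(3.37, 0.6, 3000):.3g}")
```

Because the rate enters as an eighth power, small changes in the assumed standard deviation move the required sample size by large factors.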

It's hard for me to guess how the precision of the sample kurtosis
estimate translates into the quality of the Monte Carlo simulation
based on this distribution. But it does feel they are closely related.
I am beginning to think in this case the risk analysis (probability of
investment going to zero within a certain time, etc.) might be better
done using even the most rough closed-form approximation, rather than
the Monte Carlo method.

Oh, and I spent like 6 hours fighting Excel because my first thought
was also that Excel is wrong. I downloaded the best random number
generators, and inverse functions, etc. I only made the post in this
group when I started to suspect it's my knowledge of statistics that's
malfunctioning rather than Excel =)

Thanks again!

Herman Rubin

Nov 7, 2005, 2:07:44 PM

In article <1131337611....@g43g2000cwa.googlegroups.com>,

<mmo...@yahoo.com> wrote:
>Thanks so much for all the replies!

>Yes,

>Looking at the SAS output, I notice that the skewness of the sample
>kurtosis is really high, just like I expected. And the average of the
>25 runs is ~4.77, which is ~1.23 below the true value of 6. The
>standard deviation of the sample kurtosis, estimated from the SAS run,
>is somewhere around 3.37. So if I want to have precision of, say, 10%
>(standard deviation of 0.6) in the estimate of the 4th moment of my
>distribution, I'd need a huge sample size of ~(3.37/0.6)^8 * 3000 ~ 3 *
>10^9. Impossible in Excel, and pushing the limits of my PC and my
>patience =)

Even more, the standard deviation of the sample kurtosis
does not exist. For it to exist, the 8th moment of the
original distribution would have to exist, and it does
not. My previous conjecture that the sample fourth moment
is in the domain of attraction of the extremal stable
distribution of exponent 1.25 is correct. I have not done
any calculations on this, and they are not particularly
easy.

>It's hard for me to guess how the precision of the sample kurtosis
>estimate translates into the quality of the Monte Carlo simulation
>based on this distribution. But it does feel they are closely related.
>I am beginning to think in this case the risk analysis (probability of
>investment going to zero within a certain time, etc.) might be better
>done using even the most rough closed-form approximation, rather than
>the Monte Carlo method.

If high kurtosis is important, you are almost certainly
correct. Monte Carlo is rarely better than direct
computation in any case.

David Jones

Nov 8, 2005, 5:11:16 AM

But it seems that estimating kurtosis is not important to the problem
being solved: "(probability of investment going to zero within a
certain time, etc.) ". If all the OP is trying to do is test that the
random numbers being generated do accord with the distribution he/she
is trying to use, then other tests can be used for this, including
chi-squared via binning, etc. If something specific about the "shape"
of the distribution is needed, then L-kurtosis (see references on
L-moments) is a possible statistic that does not suffer the same
restrictions for existence of moments as does kurtosis.
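
A chi-squared test via binning, as suggested above, might look like the following Python sketch. It is only an illustration under stated assumptions: the bin edges are arbitrary, the closed-form CDF is the standard arctangent expression for odd degrees of freedom specialized to df = 5, and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_left


def t5_cdf(t):
    """CDF of Student's t with 5 df (closed form specific to this df)."""
    theta = math.atan(t / math.sqrt(5.0))
    s, c = math.sin(theta), math.cos(theta)
    return 0.5 + (theta + s * (c + (2.0 / 3.0) * c ** 3)) / math.pi


def bin_probabilities(edges):
    """Probability of each bin (-inf, e0], (e0, e1], ..., (ek, +inf)."""
    cdf = [0.0] + [t5_cdf(e) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def chi_squared_statistic(sample, edges):
    """Pearson chi-squared statistic of the sample against t(5)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect_left(edges, x)] += 1
    n = len(sample)
    return sum(
        (obs - n * p) ** 2 / (n * p)
        for obs, p in zip(counts, bin_probabilities(edges))
    )


def student_t5(rng):
    """One t(5) draw as Z / sqrt(V / 5), with V ~ chi-square(5)."""
    return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(2.5, 2.0) / 5.0)


# Ten bins; the edges are an illustrative choice.
EDGES = [-3.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]

if __name__ == "__main__":
    rng = random.Random(12345)  # arbitrary seed
    sample = [student_t5(rng) for _ in range(30000)]
    stat = chi_squared_statistic(sample, EDGES)
    # With 10 bins and no fitted parameters the reference distribution is
    # chi-square with 9 df, whose upper 5% point is about 16.92.
    print("chi-squared:", round(stat, 2), "reject at 5%:", stat > 16.92)
```

To test a generator other than Python's own, replace the `sample` list with its output. Unlike the kurtosis check, this statistic depends only on bin counts, so it does not require high moments to exist.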

If the alternative to Monte Carlo that the OP has is only a "rough
closed-form approximation", then it is likely that he/she would need
to do at least some Monte Carlo simulations to check the adequacy of
the approximation.

David Jones

J. R. M. Hosking

Nov 9, 2005, 3:23:43 PM

Sample kurtosis is not a reliable estimator of population kurtosis
for distributions with heavy tails. You will get better results if
you use L-kurtosis, which is a kurtosis measure based on ratios of
linear combinations of order statistics and is less affected by
extreme outliers than is the usual kurtosis. See my paper in
J.R.Statist.Soc.B, 1990, vol. 52, pp.105-124.
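
For readers who want to try this, sample L-kurtosis can be computed from the ordered sample in a few lines. The Python sketch below uses the usual unbiased probability-weighted-moment form of the sample L-moments; the function names and the seed are hypothetical, and the code is an illustration rather than a copy of anything in the paper.

```python
import math
import random


def sample_l_kurtosis(data):
    """Sample L-kurtosis t4 = l4 / l2 from probability-weighted moments."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for i, v in enumerate(x):          # i is the zero-based rank
        w1 = i / (n - 1)
        w2 = w1 * (i - 1) / (n - 2)
        w3 = w2 * (i - 2) / (n - 3)
        b0 += v
        b1 += w1 * v
        b2 += w2 * v
        b3 += w3 * v
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l2 = 2.0 * b1 - b0
    l4 = 20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0
    if l2 == 0.0:
        raise ValueError("L-scale is zero (constant data)")
    return l4 / l2


def student_t_sample(n, df, rng):
    """Draw n Student-t variates as Z / sqrt(V / df), with V ~ chi-square(df)."""
    return [
        rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(2005)  # arbitrary seed
    for trial in range(1, 11):
        sample = student_t_sample(3000, 5, rng)
        print(trial, round(sample_l_kurtosis(sample), 4))
```

For reference, the L-kurtosis of the Normal distribution is about 0.1226, and heavier-tailed distributions give larger values.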

Sample L-kurtosis is approximately unbiased and Normally distributed
even for quite heavy-tailed distributions. There is an example
(t distribution with 4.65 degrees of freedom, sample size 1000)
in Figure 5 of my research report "L-moments and their applications
in the analysis of financial data", which you can find at

http://domino.research.ibm.com/library/cyberdig.nsf/papers?SearchView&Query=RC21466&SearchMax=100

J. R. M. Hosking
hos...@watson.ibm.com