I have the following two schemes for obtaining 95% confidence interval from
Monte Carlo simulations of f(x1, ..., xk) where x1, ..., xk are random
numbers.
Approach 1:
Simulate N=10000 times,
obtain f1, ..., f_N,
obtain the sample mean of f1, ..., f_N to be mean_f,
obtain the sample standard deviation of f1, ..., f_N to be std_f,
and the 95% CI is [mean_f - 1.96 * std_f /sqrt(N), mean_f + 1.96 * std_f
/sqrt(N)].
Approach 2:
Repeat the following procedure M=1000 times:
{
Simulate N=10000 times,
obtain f1, ..., f_N,
obtain the sample mean of f1, ..., f_N to be mean_f
}
After the above repetitions I will have mean_f_1, mean_f_2, mean_f_3, ... ,
mean_f_M,
then I obtain the sample mean of the above M means, to be mean_mean_f,
obtain the sample standard deviation of mean_f_1, ..., mean_f_M to be
std_mean_f,
and the 95% CI is [mean_mean_f - 1.96 * std_mean_f /sqrt(M), mean_mean_f
+ 1.96 * std_mean_f /sqrt(M)].
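For readers who want to try the two schemes side by side, here is a minimal Python sketch. The function `simulate_f` is a hypothetical stand-in for f(x1, ..., xk) (the sum of three uniform draws), and N and M are reduced so that it runs quickly; neither choice comes from the original question.

```python
import math
import random
import statistics


def simulate_f(rng):
    # Hypothetical stand-in for f(x1, ..., xk): the sum of three uniform draws.
    return rng.random() + rng.random() + rng.random()


def approach_1(rng, n):
    # One run of n simulations: mean_f +/- 1.96 * std_f / sqrt(n).
    f = [simulate_f(rng) for _ in range(n)]
    mean_f = statistics.fmean(f)
    half = 1.96 * statistics.stdev(f) / math.sqrt(n)
    return mean_f - half, mean_f + half


def approach_2(rng, n, m):
    # m runs of n simulations each: mean_mean_f +/- 1.96 * std_mean_f / sqrt(m).
    means = [statistics.fmean([simulate_f(rng) for _ in range(n)]) for _ in range(m)]
    mean_mean_f = statistics.fmean(means)
    half = 1.96 * statistics.stdev(means) / math.sqrt(m)
    return mean_mean_f - half, mean_mean_f + half


rng = random.Random(12345)
n, m = 1000, 200  # reduced from N=10000, M=1000 so the sketch runs quickly
ci_1 = approach_1(rng, n)
ci_2 = approach_2(rng, n, m)
width_ratio = (ci_1[1] - ci_1[0]) / (ci_2[1] - ci_2[0])
print("Approach 1:", ci_1)
print("Approach 2:", ci_2)
print("width ratio:", width_ratio, "sqrt(M):", math.sqrt(m))
```

In this sketch both intervals are centred on estimates of the same expectation of f. The second is narrower by roughly sqrt(M) because it rests on M*N draws in total.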
-------------------------------------
Which one of the above is a correct procedure?
Thanks a lot!
>
> Approach 2:
>
> Repeat the following procedure M=1000 times:
> {
> Simulate N=10000 times,
> obtain f1, ..., f_N,
> obtain the sample mean of f1, ..., f_N to be mean_f
> }
>
So mean_f will be the mean based on 1000*N simulations, rather than N
simulations.
> After the above repetitions I will have mean_f_1, mean_f_2, mean_f_3, ... ,
> mean_f_M,
> then I obtain the sample mean of the above M means, to be mean_mean_f,
> obtain the sample standard deviation of mean_f_1, ..., mean_f_M to be
> std_mean_f,
So, this is the standard deviation of the mean of f, when the mean of f
is estimated from N data points.
> and the 95% CI is [mean_mean_f - 1.96 * std_mean_f /sqrt(M), mean_mean_f
> + 1.96 * std_mean_f /sqrt(M)].
>
So this is the CI for the mean of f, not for f itself.
>
> -------------------------------------
>
> Which one of the above is a correct procedure?
>
Depends: do you want the CI for f, or for the mean of N samples of f?
If the former, then the first is correct.
Incidentally, I would simply use the quantiles of the N samples of f to
get the confidence interval. That way, if f is not normally
distributed, you still get results that make sense.
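A minimal sketch of the quantile idea, assuming a skewed f. The exponential distribution here is a made-up example, not something from the thread. It compares the empirical 2.5% and 97.5% quantiles of the simulated f values with the normal-theory limits mean_f +/- 1.96 * std_f.

```python
import math
import random
import statistics


def empirical_quantile(sorted_values, p):
    # Nearest-rank quantile of an already sorted list.
    index = min(len(sorted_values) - 1, max(0, math.ceil(p * len(sorted_values)) - 1))
    return sorted_values[index]


rng = random.Random(2007)
# Hypothetical skewed f: exponential with mean 1 (an assumption for illustration).
f = sorted(rng.expovariate(1.0) for _ in range(10000))

q_lo = empirical_quantile(f, 0.025)
q_hi = empirical_quantile(f, 0.975)

mean_f = statistics.fmean(f)
std_f = statistics.stdev(f)
normal_lo = mean_f - 1.96 * std_f
normal_hi = mean_f + 1.96 * std_f

print("quantile interval:     ", q_lo, q_hi)
print("normal-theory interval:", normal_lo, normal_hi)
```

For a positive, skewed f like this one, the normal-theory lower limit comes out negative, while the quantile limits stay inside the range of the simulated values.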
Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
This assumes that f is normally distributed, and the standard deviation is the standard deviation of f. Hence, the CI is the CI for f.
My response
ABSOLUTELY CORRECT what you wrote, Bob. A simple (unnecessary?) note:
__No need to order the values found (a very time-consuming job). EACH ONE must be CODED in order to be MEMORIZED. The last job is to sum the CONTENTS of the MEMORIES (from the extreme left memory position just until the desired quantile(s) is (are) obtained).
______licas (Luis A. Afonso)
********* Date: Mar 7, 2007 7:04 PM
Subject: Re: Jarque-Bera test: confidence intervals for normal data
These are not confidence intervals. Jack *********
My response
Jack Tomsky is SURELY NOT a Statistician.
______licas (Luis A. Afonso)
Mike,
With all due respect, I don't see the point to this. If your sample
size is N = 10,000 then what are you going to learn?
The way to Monte Carlo conf. intervals is to (1) generate a small
sample of "data"; (2) calculate appropriate statistics (average and
sample std. dev. if you are dealing with a conf. interval on a mean);
(3) calculate the proposed confidence interval; (4) ask whether this
interval does or does not encompass the "true" value of the parameter
(such as the true mean); keep a record of how many times you did
indeed capture or encompass the value of that parameter; (5) do this
many times (at least 10,000 times); and then (6) record the frequency or
percent of the time your intervals captured or encompassed the true
value of the parameter. If that frequency (percentage) is, say, "95% of
the time", then you are generating "95% confidence
intervals." From this you can conclude "The probability is 0.95 that
intervals calculated in this manner will encompass the true value of
the underlying parameter."
Ignore Afonso. He's the court jester around here. OMU
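Steps (1) to (6) above can be sketched as follows. This is only an illustration: the normal population, its mean and standard deviation, the sample size of 10 and the two critical values are assumptions chosen for the example (2.262 is the 0.975 quantile of Student's t with 9 degrees of freedom).

```python
import math
import random


def coverage(true_mean, true_sd, sample_size, repeats, crit, rng):
    # Fraction of simulated samples whose interval captures the true mean.
    hits = 0
    for _ in range(repeats):
        data = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        mean = sum(data) / sample_size
        var = sum((v - mean) ** 2 for v in data) / (sample_size - 1)
        half = crit * math.sqrt(var / sample_size)
        if mean - half <= true_mean <= mean + half:
            hits += 1
    return hits / repeats


rng = random.Random(42)
z_rate = coverage(10.0, 2.0, 10, 10000, 1.96, rng)   # normal critical value
t_rate = coverage(10.0, 2.0, 10, 10000, 2.262, rng)  # t critical value, 9 d.f.
print("coverage with 1.96: ", z_rate)
print("coverage with 2.262:", t_rate)
```

With a sample as small as 10, the 1.96 interval should capture the true mean noticeably less often than 95% of the time, while the t-based interval should land close to 95%.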
On Mar 24, 2:41 am, "Mike" <housing2...@gmail.com> wrote:
> Hi all,
> I have the following two schemes for obtaining 95% confidence interval from
> Monte Carlo simulations of f(x1, ..., xk) where x1, ..., xk are random
> numbers.
My response
1) OMU doesn't see what your goal is because he NEVER intended to work in MONTE CARLO. For him ALL the procedures start from a REAL SAMPLE, otherwise he claims that it is a fraud.
2) As I understood, Mike intends to find a CONFIDENCE INTERVAL for the mean of random numbers (from a UNIFORM distribution?).
If this is the matter,
a) A sample size, N, should be chosen,
b) Simulate a sample, sized N, and evaluate its MEAN value m, codify it by Int(1000*m + 0.5) and memorize it (add 1 to the memory with that code),
c) Repeat b), say, 1 million times.
After this is done:
d) SUM the contents of the memories from memory 0 upwards until the SUM JUST surpasses k * 1 million. Then, positively, you have got the k Quantile of the Distribution of the mean values.
Example: putting k(1)=0.025 and k(2)=0.975 you get a 95% CONFIDENCE INTERVAL with the left tail = right tail with 2.5% probability. Namely
______[ k(1), k(2)]
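Steps a) to d) can be sketched in Python as below. The name `binned_quantiles` and the reduced number of repetitions are choices made for this sketch only; the coding Int(1000*m + 0.5) is taken from step b).

```python
import random


def binned_quantiles(sample_size, repeats, probs, rng, bins=1000):
    # counts[code] plays the role of the "memories": one counter per coded mean.
    counts = [0] * (bins + 1)
    for _ in range(repeats):
        m = sum(rng.random() for _ in range(sample_size)) / sample_size
        counts[int(bins * m + 0.5)] += 1
    results = []
    for p in probs:
        running = 0
        # Sum the counters from code 0 upwards until the total passes p * repeats.
        for code, count in enumerate(counts):
            running += count
            if running > p * repeats:
                results.append(code / bins)
                break
    return results


rng = random.Random(1967)
lower, upper = binned_quantiles(10, 100000, [0.025, 0.975], rng)
print("size 10:", lower, upper)
```

No sorting is needed, at the price of a resolution fixed by the number of counters (three decimals here).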
In short
_____ OMU IS UNABLE TO SHOW US CONFIDENCE INTERVALS HE GOT BY MONTE CARLO, whereas I have shown several times in this Newsgroup that I got them (theoretically confirmed, where possible).
THIS CONFIRMS HE IS AN OLD CLOWN.
________________________
Mike, let's make an *experiment*
Firstly try to follow OMU's words and find the result you intend,
Then
Follow mine.
I will find the quantiles for 95% and 99% Confidence Intervals (allow a couple of days for me to post the results).
_______licas (Luis A. Afonso)
__Size___95% interval________99% interval
__10_____0.322, 0.678________0.270, 0.731
__11_____0.330, 0.670________0.280, 0.721
__12_____0.338, 0.663________0.289, 0.712
__13_____0.344, 0.656________0.297, 0.703
__14_____0.349, 0.651________0.304, 0.696
__15_____0.354, 0.646________0.310, 0.689
Thought
In order to assure the University's dignity, people to whom PhDs were granted (Stanford, Jack Tomsky) should periodically sit an exam in order to check that they are not senile and that they pay attention to what is happening in the literature.
*** Wagons do not stop at the dog’s barking. If reality makes you uncomfortable, better to see a doctor.***
_____licas (Luis A. Afonso)
If these are confidence intervals, what is the unknown parameter? What is the sample? A confidence interval is a pair of random numbers depending on the sample. Are your confidence intervals the same for all samples?
Jack
__Size___95% interval________99% interval
__15_____0.354, 0.646________0.310, 0.689
__16_____0.359, 0.641________0.316, 0.684
__17_____0.363, 0.637________0.322, 0.678
__18_____0.367, 0.633________0.327, 0.674
__19_____0.370, 0.630________0.331, 0.669
__20_____0.374, 0.626________0.335, 0.665
A *necessary* condition for a sample of size 20 to have originated from a uniform [0, 1] distribution is that its mean should be in [0.374, 0.626], at a significance level of 5%. If this is relaxed to 1% the interval would be [0.335, 0.665].
_____licas (Luis A. Afonso)
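As a rough cross-check of the size 20 row, one can use the normal approximation to the mean of n independent uniform [0, 1] values. This is an approximation added for illustration, not the simulation method used for the table:

```latex
\bar{X}_n \approx N\!\left(\tfrac{1}{2},\ \tfrac{1}{12n}\right), \qquad
n = 20:\quad \sqrt{\tfrac{1}{240}} \approx 0.06455
\\[4pt]
95\%:\quad 0.5 \pm 1.960 \times 0.06455 = 0.5 \pm 0.1265 \;\Rightarrow\; [0.3735,\ 0.6265]
\\[4pt]
99\%:\quad 0.5 \pm 2.576 \times 0.06455 = 0.5 \pm 0.1663 \;\Rightarrow\; [0.3337,\ 0.6663]
```

Both results are close to the tabulated [0.374, 0.626] and [0.335, 0.665].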
So these are simply probability intervals on the sample mean. They have nothing to do with confidence intervals.
Jack
__The null hypothesis for this test is that [the error] is normally distributed
__Like most statistical tests, this test of normality defines a criterion and gives its sampling distribution.
__and consequently Lilliefors decided to calculate approximately the sampling distribution by using [1967] the Monte-Carlo technique. Essentially the procedure consists of extracting a large number of samples from a Normal Population and computing the value of the criterion for each of these samples. The empirical distribution of the values of the criterion gives an approximation of the sampling distribution of the criterion under the null hypothesis.
And my comment
__ Since June 1967 (H. Lilliefors, On the Kolmogorov-Smirnov test for normality with mean and variance unknown) Jack Tomsky has not found time to read so important a paper, published in the ASA's Journal!!!__
______licas (Luis A. Afonso)
You have claimed that the numbers you presented are 95% and 99% confidence intervals. Alright, I'll take you at your word.
A confidence interval is calculated from a sample and has the property that it covers the true value of the parameter at least a specified percentage of the time (e.g., 95% or 99%) for all values of the parameter.
Let's take your "95% confidence interval" for N = 10. You reported this as 0.322 to 0.678 for all samples. If the value of the parameter is between 0.322 and 0.678, then your interval covers the true value 100% of the time.
If the value of the parameter is either less than 0.322 or greater than 0.678, then your interval covers the true value 0% of the time. Thus, your actual confidence level is 0%.
Lilliefors had the good sense to avoid referring to his tables as confidence intervals.
Jack
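The coverage argument above can be checked numerically. The sketch below assumes, purely for illustration, that the data are uniform on [theta - 0.5, theta + 0.5] with unknown mean theta. It compares the fixed interval [0.322, 0.678] with an interval of the same half-width 0.178 recentred on each sample mean. The function names are made up for this sketch.

```python
import random


def fixed_interval_coverage(theta, lo=0.322, hi=0.678):
    # The interval ignores the sample, so it either always or never covers theta.
    return 1.0 if lo <= theta <= hi else 0.0


def recentred_coverage(theta, rng, n=10, half_width=0.178, repeats=20000):
    # Interval xbar +/- half_width, recomputed from every simulated sample.
    hits = 0
    for _ in range(repeats):
        xbar = sum(theta - 0.5 + rng.random() for _ in range(n)) / n
        if xbar - half_width <= theta <= xbar + half_width:
            hits += 1
    return hits / repeats


rng = random.Random(7)
for theta in (0.5, 0.9):
    print(theta, fixed_interval_coverage(theta), recentred_coverage(theta, rng))
```

Under this assumed shift family, the fixed interval covers theta either always or never, while the recentred interval should cover it about 95% of the time whatever theta is.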
YOU ARE ABSOLUTELY WRONG, JACK, the interval was constructed such that 95% of all means are contained in it: CONSTRUCTED so that two 2.5% tails are left outside it, one at each side, which gives a 5% significance level. HOW MANY TIMES SHOULD I STATE THIS?
WHY do you not try to publish your *interpretation* against all that is stated in the literature since Lilliefors discovered that one could obtain a complete distribution by means of simulation methods (Monte Carlo)? Do it, it's the THIRD time I invite you.
A plethora of scientific papers were published in serious journals using his method in the last FORTY years. Were they ALL wrong, the authors and the referees as well? WERE THEY?
Why do you not be quiet?
________licas (Luis A. Afonso)
I constructed the interval such that:
___2.5% of the means belong to [0, 0.322)
___95% to [0.322, 0.678]
___2.5% to (0.678, 1]
.
Note that it is 0.500 +/- 0.178, all intervals are centred at 0.5.
________licas (Luis A. Afonso)
_____licas (Luis A. Afonso)
I'm only quoting Luis A. Afonso when he called these numbers "confidence intervals". Are you now saying that Luis A. Afonso was mistaken?
> WHY do you not try to publish your *interpretation*
> against all that is stated in the literature since
> Lilliefors discovered that one could obtain a complete
> distribution by means of simulation methods (Monte
> Carlo)? Do it, it's the THIRD time I invite you.
> A plethora of scientific papers were published in
> serious journals using his method in the last FORTY
> years. Were they ALL wrong, the authors and the
> referees as well? WERE THEY?
> Why do you not be quiet?
>
In 1967, I had my programmer develop a composite test of normality using the KS distance metric by Monte Carlo simulation in FORTRAN. We wrote up these tables as an internal company report. The approach was so trivial that the idea of getting it published in an outside journal would have been embarrassing.
If later, someone at one of the annual professional meetings had said to me, "Aren't you famous for the Tomsky normality test?", I would have said, "You must be thinking of someone else with the same name."
Jack
> ________licas (Luis A. Afonso)
Given a N(0,1) sample of size 1000 drawn at random, is there a 95% probability that its mean will belong to the interval
____[-sqrt(1000)*1.960, sqrt(1000)*1.960]____?
Suppose that on Mars this theorem is not known but they have a way to simulate normal samples.
They made 100 trillion samples of size 1000 and then they found the 0.025 quantile of the means and the 0.975 one. What would you expect the Martians to have found as interval bounds?
What is this useful for?
EVERY TIME they have a sample of 1000 items and they calculate its mean, if the value falls outside the indicated bounds it is suspected that some assumption of the model is violated. For example:
____the items are not independent
____the distribution, even if normal, doesn't have variance 1,
____the distribution is NOT normal
____or something else has happened that should be analyzed.
_______.licas (Luis A. Afonso)
*** In 1967, I had my programmer develop a composite test of normality using the KS distance metric by Monte Carlo simulation in FORTRAN. We wrote up these tables as an internal company report. The approach was so trivial that the idea of getting it published in an outside journal would have been embarrassing.
If later, someone at one of the annual professional meetings had said to me, "Aren't you famous for the Tomsky normality test?", I would have said, "You must be thinking of someone else with the same name." ***
My response
If the episode you tell us is in fact true (of which I am not sure) it tells us that at the time you were wise (however nowadays you have got worse).
Your confusion between the Kolmogorov-Smirnov and Lilliefors tests is very, very revealing of your inability to see the difference (unlearned and empty).
The difference is ... As all the Readers are fully aware, I will not explain it this time. I have already done so.
______licas (Luis A. Afonso)
Your probability of 95% is wrong. The limits you gave are +/- 61.98 and the probability of xbar being within these limits is close to 100%.
Jack
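The arithmetic behind this correction can be checked in a few lines. For a N(0,1) sample of size n, the mean has standard deviation 1/sqrt(n), so the 95% limits are +/- 1.96/sqrt(n) rather than +/- 1.96*sqrt(n). The helper name `prob_mean_within` is made up for this sketch.

```python
import math


def prob_mean_within(limit, n, sigma=1.0):
    # xbar ~ N(0, sigma^2 / n), so P(|xbar| <= limit) = erf(limit * sqrt(n) / (sigma * sqrt(2))).
    return math.erf(limit * math.sqrt(n) / (sigma * math.sqrt(2.0)))


n = 1000
wide = 1.96 * math.sqrt(n)    # about 61.98, the limits as posted
narrow = 1.96 / math.sqrt(n)  # about 0.06198, the limits with 1/sqrt(n)
print(wide, prob_mean_within(wide, n))
print(narrow, prob_mean_within(narrow, n))
```

The first probability is 1 to machine precision; the second is 0.95 to three decimals.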
You are very confused about the distance metric used in the Lilliefors test. It's a test for the composite null hypothesis that the true distribution belongs to the normal family of distributions. The metric is the KS distance metric between the empirical and the fitted normal cdfs. Read the Lilliefors paper again.
Jack
Under the Afonso theory of hypothesis testing, one is never allowed to accept either the null hypothesis or the alternative hypothesis. In particular, one is never allowed to accept Ho: 8/13 > 5/13. Never!
Jack
The test is absolutely DIFFERENT (the use of the same algorithm is unimportant).
The K-S test is constructed to compare data with a Normal Distribution that is DEFINED at the start, and is useless for a single unknown sample.
On the contrary, the Lilliefors test is truly useful: the parameters are found from the sample.
(I'm somewhat embarrassed to be forced to write this because it is ELEMENTARY)
_____licas (Luis A. Afonso)
Jack,
You are correct, of course.
But you-know-who will fight to death to be wrong.
I've not yet read the many posts beyond this point.
OMU
Luis... You are a very confused puppy. OMU
I am embarrassed that you were not able to recognize that Lilliefors used the KS metric for the difference between the empirical cdf and the fitted normal cdf; namely, the maximum absolute difference between the two cdfs. Therefore I'm not convinced that you ever read the paper.
Jack
The test is absolutely DIFFERENT (the use of the same algorithm is unimportant). ***
My response
1) Are you so senile that you do not understand your own NATIVE LANGUAGE (see above)?
2) I did read the paper since it appeared on ASA.
2a) A colleague and I made a program in order to have the CRITICAL VALUES for the sizes 10(1)30, when working in my SCIENTIFIC INSTITUTE.
2b) I repeated the evaluations, now as retired, and posted the results in this NEWS.
3) In view the *knowledge* you have shown I THINK that your Ph. D. thesis was purchased, copied, or simply made by a friend of yours.
Finally
4) The paper does not give the algorithm in mathematical formulation!!!!!!!!!!!
Jack's claim that I didn't read the paper is REALLY odd: to find out what the *distance* was I had to consult my own notes, and confirm it throughout the available Literature.
_______licas (Luis A. Afonso)
Are you saying that your distance metric was different from Lilliefors and Tomsky?
Jack
_______licas (Luis A. Afonso)
If you're incapable of recognizing that Lilliefors used the KS distance metric, it shows that you read the paper without understanding what you read. And that's your favorite paper in the literature. It makes one wonder if there is any paper that you understand.
Jack
*** If you're incapable of recognizing that Lilliefors used the KS distance metric, it shows that you read the paper without understanding what you read. And that's your favorite paper in the literature. It makes one wonder if there is any paper that you understand. Jack ***
MY RESPONSE
Check (below) what was the algorithm I used.
What is extraordinary is your claim that you
performed a similar job years ago. Have you changed
your mind since that date? If the values are worthless
why did you perform the calculations? This makes no
sense, AT ALL! (I am not surprised, not even a bit).
______licas (Luis A. Afonso)
REM "LiLLI"
CLS
PRINT " LILLI(EFORS) "
INPUT " n = "; n
INPUT " all = "; all
pi = 4 * ATN(1): c = 1 / SQR(2 * pi)
DIM x(n), xx(n), F(n)
DIM w(9001), cc(2)
REM fng: ratio of successive terms of the power series for FI(z) - .5
DEF fng (z, j) = -.5 * z ^ 2 * (2 * j + 1) / ((j + 1) * (2 * j + 3))
REM F(i) = i / n are the steps of the empirical cdf
F(0) = 0
FOR ji = 1 TO n: F(ji) = ji / n: NEXT ji
REM seed the generator once, before the simulation loop
RANDOMIZE TIMER
FOR k = 1 TO all
LOCATE 5, 50
PRINT USING "##########"; all - k
mmaior = -1: md = 0: soma2 = 0
REM n standard normal values by Box-Muller; 1 - RND avoids LOG(0)
FOR i = 1 TO n
a = SQR(-2 * LOG(1 - RND))
x(i) = a * COS(2 * pi * RND)
md = md + x(i) / n
soma2 = soma2 + x(i) * x(i)
NEXT i
REM sum of squares about the sample mean md; sd uses divisor n
sqd = soma2 - n * md ^ 2
sd = SQR(sqd / n)
REM standardize with the sample mean and sd
FOR ii = 1 TO n
x(ii) = (x(ii) - md) / sd
NEXT ii
REM sort by rank so that xx(1) <= ... <= xx(n)
FOR ii = 1 TO n: u = x(ii): w = 1
FOR jj = 1 TO n
IF x(jj) < u THEN w = w + 1
NEXT jj: xx(w) = u
NEXT ii
FOR tt = 1 TO n: z = xx(tt)
REM calcula FI(z): standard normal cdf by power series
IF z >= 0 THEN kw = 0 ELSE kw = 1
zu = ABS(z): s = c * zu: antes = c * zu
FOR j = 0 TO 1000
xx = antes * fng(zu, j)
s = s + xx
antes = xx
IF ABS(xx) < .00005 THEN GOTO 20
NEXT j
20 IF kw = 0 THEN ff = .5 + s
IF kw = 1 THEN ff = .5 - s
REM distance from FI(z) to the empirical cdf just before and at the step
b = ABS(ff - F(tt - 1))
bb = ABS(F(tt) - ff)
maior = b
IF bb > b THEN maior = bb
IF maior > mmaior THEN mmaior = maior
NEXT tt
REM code the statistic to 3 decimals and count it in memory w()
mm = INT(1000 * mmaior + .5)
IF mm > 9000 THEN mm = 9000
w(mm) = w(mm) + 1
REM every 50000 samples, print the running .95 and .99 quantiles
fff = INT(k / 50000): ff = k / 50000
IF ff <> fff THEN GOTO 1000
cc(1) = .95 * k: cc(2) = .99 * k
FOR iji = 1 TO 2
ciji = cc(iji): s = 0
FOR iij = 0 TO 9000
s = s + w(iij)
IF s > ciji THEN GOTO 100
NEXT iij
100 PRINT USING "##.### #.#### "; iij / 1000; s / k
NEXT iji
1000 NEXT k
END
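For comparison, here is a compact Python sketch of the same Monte Carlo idea: simulate normal samples, standardize each with its own mean and standard deviation, take the largest gap between the fitted normal cdf and the empirical cdf, and read critical values off the simulated distribution. It is not a line-by-line translation of the listing. It sorts the simulated statistics instead of using coded counters, uses `math.erf` for the normal cdf, and divides by n - 1 in the standard deviation, all of which are choices made for this sketch.

```python
import math
import random


def lilliefors_statistic(sample):
    # Max gap between the normal cdf fitted from the sample and the empirical cdf.
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    z = sorted((v - mean) / sd for v in sample)
    d = 0.0
    for i, zi in enumerate(z, start=1):
        phi = 0.5 * (1.0 + math.erf(zi / math.sqrt(2.0)))
        # Compare with the empirical cdf just before and at the i-th ordered value.
        d = max(d, abs(phi - (i - 1) / n), abs(i / n - phi))
    return d


def critical_values(n, repeats, levels, rng):
    # Empirical quantiles of the statistic under the normal null hypothesis.
    stats = sorted(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(repeats)
    )
    return [stats[min(repeats - 1, int(level * repeats))] for level in levels]


rng = random.Random(1967)
cv95, cv99 = critical_values(20, 20000, [0.95, 0.99], rng)
print("n = 20:", cv95, cv99)
```

The two comparisons inside the loop correspond to b and bb in the listing, and the running maximum to mmaior.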
See, you used the same distance metric that I and Lilliefors used. That shows up when you used max(b,bb) as your distance metric.
Jack
My response
***What a *BIG* surprise, INDEED!!!
_______licas (Luis A. Afonso)
With 1 million observations and z = 2.5 (i.e. F(z) = 98.76%):
Quantile 0.975
Confidence Interval __[0.9746, 0.9754]
Quantile 0.995
Confidence Interval__[0.9948, 0.9952]
(from Conover, Practical Nonparametric Statistics)
_____licas (Luis A. Afonso)
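The figures above appear consistent with the usual normal approximation to a binomial proportion, taking n = 1,000,000, z = 2.5 and p equal to the quantile level. This reading is an assumption; the post itself does not state the formula:

```latex
p \pm z\sqrt{\frac{p(1-p)}{n}}, \qquad 2\Phi(2.5) - 1 \approx 0.9876
\\[4pt]
p = 0.975:\quad 0.975 \pm 2.5\sqrt{\frac{0.975 \times 0.025}{10^{6}}} = 0.975 \pm 0.00039 \;\Rightarrow\; [0.9746,\ 0.9754]
\\[4pt]
p = 0.995:\quad 0.995 \pm 2.5\sqrt{\frac{0.995 \times 0.005}{10^{6}}} = 0.995 \pm 0.00018 \;\Rightarrow\; [0.9948,\ 0.9952]
```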
What is z? What is F? What is the sample size? What are the confidence levels?
Jack
I don't think you understood what you wrote. You just copied something out of a book. That's why you can't answer my simple questions.
Jack
______licas (Luis A. Afonso)
I don't think you know anything about confidence intervals. You can't even explain what you copied out of Conover's book.
You can't even explain Lilliefors' paper. You said that Lilliefors didn't use the KS distance metric, yet you copied his formulas in your code and the KS distance metric was right there in your code.
Jack
_______licas (Luis A. Afonso)
No, Luis, you are the one who is confused. And you are spreading
confusion to the poor souls who came here for advice.
"The more you stir it, the worse it stinks."