How to use Monte Carlo simulation to obtain a proportion confidence interval for Bernoulli events with unequal probabilities?

Lixiang Xiong

Jun 25, 2009, 6:32:01 AM

This question is probably not closely related to Matlab itself, but I know many talented mathematicians are here and I hope I can get some help from you.

I am working on calculating a confidence interval for a Poisson-binomial distribution, which is rarely mentioned even by professional statisticians. Suppose there are N independent Bernoulli trials, each with two possible results: "pass" or "fail". Suppose K of the N trials are observed as "pass". If every trial has the same "pass" probability, K follows the well-studied binomial distribution. If the "pass" probabilities differ between trials, K follows the lesser-known Poisson-binomial distribution. The binomial distribution is thus a simple special case of the Poisson-binomial distribution.
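To make the definition above concrete, here is a minimal Python sketch that computes the exact Poisson-binomial probabilities P(K=k) by adding one trial at a time. The function name `poisson_binomial_pmf` and the example probabilities are made up for illustration; nothing here comes from a library.

```python
import math

def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=N)] for independent Bernoulli trials
    whose "pass" probabilities are given in probs (illustrative helper)."""
    pmf = [1.0]  # with zero trials, K = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this trial fails: K unchanged
            nxt[k + 1] += mass * p      # this trial passes: K increases by 1
        pmf = nxt
    return pmf

# Hypothetical unequal "pass" probabilities for N = 3 trials.
pmf_unequal = poisson_binomial_pmf([0.1, 0.5, 0.9])
# pmf_unequal is approximately [0.045, 0.455, 0.455, 0.045]

# With equal probabilities the result should match the binomial distribution.
pmf_equal = poisson_binomial_pmf([0.3] * 4)
binomial = [math.comb(4, k) * 0.3**k * 0.7**(4 - k) for k in range(5)]
```

The double loop costs on the order of N^2 operations, which should be fine for moderate N. For very large N a simulation or an approximation may be more practical.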

I have now obtained a K value from a set of N independent Bernoulli trials with unequal "pass" probabilities (assume these probabilities are known in advance). As defined above, K follows a Poisson-binomial distribution. Because of limited resources, I cannot repeat this set of N trials. I therefore have only a single K value plus the N value, and I want to know the 95% confidence interval of the "pass" proportion (K/N).

So far I have done some literature review. I am aware that approximations to the Poisson-binomial distribution exist, such as those in [1] and [2]. However, it appears that such approximations only hold under certain restrictions.

Therefore, I wonder whether, and how, I can use Monte Carlo simulation to calculate the confidence interval for this Poisson-binomial distribution. I have some experience with Monte Carlo simulation, and I know I can generate a large number of simulated K values. But I really have no idea how to process these simulation results to obtain a confidence interval based on the single K value from the real trials.
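For reference, generating the simulated K values mentioned above might look like the following Python sketch. It assumes the "pass" probabilities are known; the function name `simulate_pass_counts`, the probabilities, and the seed are all hypothetical. It only produces the simulated proportions K/N and does not by itself settle how to turn them into an interval.

```python
import random

def simulate_pass_counts(probs, n_sims, seed=0):
    """Simulate n_sims replications of the N trials and return the
    number of passes K in each replication (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    return [sum(rng.random() < p for p in probs) for _ in range(n_sims)]

# Hypothetical known "pass" probabilities for N = 5 trials.
probs = [0.2, 0.4, 0.6, 0.8, 0.5]
counts = simulate_pass_counts(probs, n_sims=20000)

# Simulated "pass" proportions K/N, one per replication.
proportions = [k / len(probs) for k in counts]

# Sanity check: the average simulated proportion should be close to
# the mean of the individual probabilities.
mean_proportion = sum(proportions) / len(proportions)
expected_proportion = sum(probs) / len(probs)
```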

Any advice will be greatly appreciated.

Cheers

Lixiang

[1] Werner Ehm, Binomial approximation to the Poisson binomial distribution, Statistics & Probability Letters, vol. 11, issue 1 (1991), pages 7-16.
[2] Lucien Le Cam, An approximation theorem for the Poisson binomial distribution, Pacific Journal of Mathematics, vol. 10, no. 4 (1960), pages 1181-1197.

Lixiang Xiong

Jul 8, 2009, 2:35:04 AM

I think I have figured it out by myself.

The following Monte Carlo confidence interval method comes from Buckland's work (Buckland, S. T., "Monte Carlo methods for confidence interval estimation using the bootstrap technique", Journal of Applied Statistics, Vol. 10, Issue 2, 1983, pages 194-212).

Suppose n independent sets of Monte Carlo simulations are run, and accordingly n values of the parameter of interest p are obtained.

These n values of p are ordered from the smallest (p_1) to the largest (p_n). The 100(1-2a)% confidence interval is then obtained from
L=p_j, j=(n+1)a, (1)
U=p_k, k=(n+1)(1-a). (2)
Here L and U are the lower and upper bounds of the 100(1-2a)% confidence interval, respectively.

Note that the calculated j and k may not be integers. This can be handled by rounding them to the nearest integers.

Example: a total of 100 sets of Monte Carlo simulations are run and 100 values of p are obtained. For the popular 95% confidence interval, a=0.025, from 100(1-2a)%=95%.

According to equations (1) and (2), j and k can be calculated as
j=(100+1)*0.025=2.525, rounded to 3,
k=(100+1)*(1-0.025)=98.475, rounded to 98.

Thus, the 95% confidence interval is given by L=p_3 and U=p_98.
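The index rule in equations (1) and (2) can be sketched in Python as follows. This is only an illustration of the rule as described in this post, not code from Buckland's paper. The function name `percentile_interval` is made up, and clamping the indices to the range 1..n is my own assumption for small n.

```python
import math

def percentile_interval(values, a):
    """Return (L, U), the 100(1-2a)% interval from the ordered values,
    using j=(n+1)a and k=(n+1)(1-a) rounded to the nearest integer."""
    n = len(values)
    ordered = sorted(values)  # p_1 <= p_2 <= ... <= p_n
    j = math.floor((n + 1) * a + 0.5)        # round half up
    k = math.floor((n + 1) * (1 - a) + 0.5)  # round half up
    # Assumption (not in the description above): keep indices inside 1..n.
    j = min(max(j, 1), n)
    k = min(max(k, 1), n)
    return ordered[j - 1], ordered[k - 1]    # convert to 0-based indexing

# Stand-in data: 100 simulated values that happen to be 1..100, so the
# selected order statistics are easy to read off.
values = list(range(100, 0, -1))
lower, upper = percentile_interval(values, a=0.025)
# lower is p_3 and upper is p_98, matching the worked example.
```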

Peter Perkins

Jul 8, 2009, 10:37:14 AM

Lixiang Xiong wrote:
> I think I have figured it out by myself.
>
> The following Monte Carlo confidence interval method comes from Buckland's work (Buckland, S. T., "Monte Carlo methods for confidence interval estimation using the bootstrap technique", Journal of Applied Statistics, Vol. 10, Issue 2, 1983, pages 194-212).
>
> Suppose n independent sets of Monte Carlo simulations are run, and accordingly n values of the parameter of interest p are obtained.
>
> These n values of p are ordered from the smallest (p_1) to the largest (p_n). The 100(1-2a)% confidence interval is then obtained from
> L=p_j, j=(n+1)a, (1)
> U=p_k, k=(n+1)(1-a). (2)
> Here L and U are the lower and upper bounds of the 100(1-2a)% confidence interval, respectively.

If I'm reading your description correctly, this is known as the percentile method, and it is not considered a very good way to get confidence intervals. You might want to look into the BOOTCI function in the Statistics Toolbox, which implements a couple of algorithms that are considered much better.

Hope this helps.

tonerroc...@gmail.com

Apr 22, 2014, 6:09:42 PM