[R] qqplot for binomial distribution


Ashim Kapoor

Apr 17, 2017, 8:58:40 AM
to r-h...@r-project.org
Dear All,

set.seed(123)
qqplot(rbinom(n=100,size=100,p=.05), rbinom(n=100,size=100,p=.05) )

I expect to see one clear line, but I don't. What am I misunderstanding?

Best Regards,
Ashim


______________________________________________
R-h...@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Spencer Graves

Apr 17, 2017, 9:02:06 AM
to r-h...@r-project.org


On 2017-04-17 7:58 AM, Ashim Kapoor wrote:
> Dear All,
>
> set.seed(123)
> qqplot(rbinom(n=100,size=100,p=.05), rbinom(n=100,size=100,p=.05) )
>
> I expect to see one clear line, but I don't. What am I misunderstanding?


The distribution is discrete, and points are superimposed. Try
the following:


set.seed(123)
qqplot(jitter(rbinom(n = 100, size = 100, prob = 0.05)),
       jitter(rbinom(n = 100, size = 100, prob = 0.05)))
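
To see the superimposition directly, here is a minimal sketch. It relies on qqplot() invisibly returning the sorted coordinates it plots; the names x, y, qq and distinct_points are illustrative and not from the original post.

```r
set.seed(123)
x <- rbinom(n = 100, size = 100, prob = 0.05)
y <- rbinom(n = 100, size = 100, prob = 0.05)

# plot.it = FALSE skips drawing and just returns the sorted x and y values
qq <- qqplot(x, y, plot.it = FALSE)

# Each row is one distinct plotted position; many of the 100 points coincide
distinct_points <- unique(data.frame(x = qq$x, y = qq$y))
nrow(distinct_points)
```

Because both samples take only a handful of integer values, the 100 points collapse onto far fewer positions, which is why the unjittered plot looks sparse.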


Spencer Graves

Ashim Kapoor

Apr 17, 2017, 9:15:54 AM
to Spencer Graves, r-h...@r-project.org
Dear Spencer,

Okay. Many thanks. My next question is: how do I use qqline()?

When I try

> qqline(rbinom(n=100,size=100,p=.05))

I don't get the line in the right place.

Best Regards,
Ashim

Boris Steipe

Apr 17, 2017, 9:16:08 AM
to r-help
Moreover, setting the seed once and then calling rbinom() twice means that both samples come from the same distribution but contain different values. Outliers in the sparse tails of the distribution may lie well off the expected diagonal. Try

set.seed(123)
qqplot(rbinom(n = 1000, size = 1000, prob = 0.05),
       rbinom(n = 1000, size = 1000, prob = 0.05))

and you will find that you approximate the "1 clear line" quite well - for most of the values.
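
A reference diagonal may make this easier to judge. The sketch below assumes both samples have the same length, in which case qqplot() simply plots the sorted first sample against the sorted second one; the names x, y and qq are illustrative.

```r
set.seed(123)
x <- rbinom(n = 1000, size = 1000, prob = 0.05)
y <- rbinom(n = 1000, size = 1000, prob = 0.05)

qq <- qqplot(x, y)

# Identity line y = x: where the points would fall if the two samples
# had exactly the same quantiles
abline(a = 0, b = 1, col = "red")

# For equal-length samples the plotted coordinates are just the sorted values
same_as_sorted <- all(qq$x == sort(x)) && all(qq$y == sort(y))
same_as_sorted
```

Deviations from the red line should show up mainly at the two ends, where only a few observations determine the quantiles.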


B.

Ashim Kapoor

Apr 17, 2017, 9:26:30 AM
to Boris Steipe, r-help
Dear Boris,

Okay and Thanks.

Best,
Ashim


Boris Steipe

Apr 17, 2017, 10:21:32 AM
to Ashim Kapoor, r-h...@r-project.org
That's not how qqline() works. The line is drawn with respect to a _reference distribution_, which is the normal distribution by default. For the binomial distribution, you need to specify the distribution argument. The help page has an example that shows how this is done for qchisq(). For qbinom() it is:


set.seed(123)
qqplot(rbinom(n = 100, size = 100, prob = 0.05),
       rbinom(n = 100, size = 100, prob = 0.05))

qqline(rbinom(n = 100, size = 100, prob = 0.05),
       distribution = function(probs) { qbinom(probs, size = 100, prob = 0.05) },
       col = "red",
       lwd = 0.5)
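
For intuition, here is a rough sketch of where qqline() puts the line, following its documented default behaviour: the line passes through the first and third quartiles (probs = c(0.25, 0.75)) of the data and of the reference distribution. The variable names are illustrative, and the manual abline() call is only meant to mirror what qqline() does with datax = FALSE.

```r
set.seed(123)
ref <- rbinom(n = 100, size = 100, prob = 0.05)  # reference sample (x-axis)
y   <- rbinom(n = 100, size = 100, prob = 0.05)  # data sample (y-axis)

probs <- c(0.25, 0.75)
y_q <- quantile(y, probs, names = FALSE)          # sample quartiles of the data
x_q <- qbinom(probs, size = 100, prob = 0.05)     # quartiles of Binomial(100, 0.05)

# Straight line through (x_q[1], y_q[1]) and (x_q[2], y_q[2])
slope     <- diff(y_q) / diff(x_q)
intercept <- y_q[1] - slope * x_q[1]

qqplot(ref, y)
abline(intercept, slope, col = "blue", lty = 2)   # manual version of the line
qqline(y,
       distribution = function(p) qbinom(p, size = 100, prob = 0.05),
       col = "red", lwd = 0.5)                    # should coincide with the blue line
```

Since only the quartiles matter, the line says nothing about how well the tails agree.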




B.

Ashim Kapoor

Apr 18, 2017, 12:47:19 AM
to Boris Steipe, r-h...@r-project.org
Dear Boris,

Thank you for your reply.

> dput(count1_vector)
c(5, 6, 4, 4, 6, 5, 4, 5, 3, 7, 5, 5, 3, 4, 8, 6, 10, 2, 4, 6,
8, 4, 4, 6, 8, 5, 6, 3, 7, 9, 4, 7, 5, 7, 3, 4, 5, 2, 11, 7,
8, 5, 5, 6, 3, 2, 3, 5, 9, 6, 5, 6, 7, 3, 10, 7, 6, 4, 9, 5,
7, 3, 7, 3, 2, 3, 4, 5, 10, 4, 5, 5, 6, 7, 4, 8, 7, 5, 5, 4,
8, 7, 9, 4, 4, 4, 7, 5, 4, 10, 4, 5, 6, 1, 3, 5, 4, 7, 4, 6)

set.seed(123)
qqplot(count1_vector, rbinom(n = 100, size = 100, prob = 0.05))
qqline(count1_vector,
       distribution = function(probs) { qbinom(probs, size = 100, prob = 0.05) },
       col = "red",
       lwd = 0.5)

When I do this, the line does not pass through the center of my data. I do
expect count1_vector to be 100 samples from a binomial with n=100 and p=.05.

Any comments or suggestions for me?

Note: I built a 95% confidence interval for my data and counted how often,
out of 100 times, the data fell outside the CI. I expect this count to be
binomial with n=100, p=.05. I repeated this 100 times and obtained
count1_vector.
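
For readers who want to reproduce counts of this kind, here is a hypothetical sketch. The procedure behind count1_vector is not shown in the thread, so everything here is an assumption: normal data, a t-based interval for the mean, 30 observations per interval, and the names count_misses and sim_counts.

```r
set.seed(123)

# Assumed setup: build n_intervals 95% t-intervals for the mean of normal data
# and count how many of them fail to cover the true mean mu.
count_misses <- function(n_intervals = 100, n_obs = 30, mu = 0) {
  misses <- replicate(n_intervals, {
    x  <- rnorm(n_obs, mean = mu)
    ci <- t.test(x, conf.level = 0.95)$conf.int
    mu < ci[1] || mu > ci[2]   # TRUE when the interval misses mu
  })
  sum(misses)
}

# Repeat the whole experiment 100 times; each count is roughly Binomial(100, 0.05)
sim_counts <- replicate(100, count_misses())
table(sim_counts)
```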

Best Regards,
Ashim.



Boris Steipe

Apr 18, 2017, 10:27:01 AM
to Ashim Kapoor, R-help
As per the help pages, qqline() assumes by default (datax = FALSE) that the data sample is on the y-axis, so your data belong in the second argument of qqplot(), "y".

So try
qqplot(rbinom(n=100, size=100, p=0.05), count1_vector)

... and then plot your qqline()

Alternatively, try

qqline(count1_vector,
       distribution = function(probs) { qbinom(probs, size = 100, prob = 0.05) },
       datax = TRUE,  # <- logical. Should data values be on the x-axis?
       col = "red",
       lwd = 0.5)
... and use your original qqplot()
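
Putting the pieces together, here is a self-contained sketch with the data on the y-axis. Note one swapped-in technique: instead of a second random sample, the x-axis uses theoretical binomial quantiles computed with qbinom(ppoints(n), ...), which removes the sampling noise of the reference sample. This is not what the thread used; the names n and theo are illustrative.

```r
count1_vector <- c(5, 6, 4, 4, 6, 5, 4, 5, 3, 7, 5, 5, 3, 4, 8, 6, 10, 2, 4, 6,
                   8, 4, 4, 6, 8, 5, 6, 3, 7, 9, 4, 7, 5, 7, 3, 4, 5, 2, 11, 7,
                   8, 5, 5, 6, 3, 2, 3, 5, 9, 6, 5, 6, 7, 3, 10, 7, 6, 4, 9, 5,
                   7, 3, 7, 3, 2, 3, 4, 5, 10, 4, 5, 5, 6, 7, 4, 8, 7, 5, 5, 4,
                   8, 7, 9, 4, 4, 4, 7, 5, 4, 10, 4, 5, 6, 1, 3, 5, 4, 7, 4, 6)

n <- length(count1_vector)

# Theoretical quantiles of Binomial(100, 0.05) at n evenly spread probabilities
theo <- qbinom(ppoints(n), size = 100, prob = 0.05)

# Data on the y-axis, matching qqline()'s default datax = FALSE
plot(theo, sort(count1_vector),
     xlab = "Theoretical quantiles, Binomial(100, 0.05)",
     ylab = "Sample quantiles, count1_vector")

qqline(count1_vector,
       distribution = function(p) qbinom(p, size = 100, prob = 0.05),
       col = "red",
       lwd = 0.5)
```

With the data on the y-axis, the plot and the line use the same orientation, so the line should pass through the bulk of the points.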


B.

Ashim Kapoor

Apr 19, 2017, 3:03:16 AM
to Boris Steipe, R-help
Dear Boris,

Many thanks,
Ashim
