faceting distr in terms of percents (nominal values)


adam.l...@pnc.com

May 17, 2013, 2:42:12 PM
to ggp...@googlegroups.com
Hi,

A couple of weeks ago Dennis answered my question as to how to bin (and facet) an interval variable so that its distribution is in terms of the percent of total.

It involved multiplying the density by the binwidth. However, not surprisingly, this solution does not work when the x axis is a nominal variable and binning is not required.

Does anyone know a solution to this problem?

Below is an example in which I would like the y axis to be the % of total instead of the frequency. Thanks for your help.

ggplot(diamonds, aes(cut)) + geom_histogram() + facet_grid(clarity ~ color)


Adam Loveland




Dennis Murphy

May 18, 2013, 12:55:05 AM
to adam.l...@pnc.com, ggplot2
Hi Adam:

It appears that y = ..count../sum(..count..) works in this case. Compare

ggplot(diamonds, aes(x = cut, y = ..count..)) +
    geom_histogram() +
    facet_grid(clarity ~ color)

ggplot(diamonds, aes(x = cut, y = ..count../sum(..count..))) +
    geom_histogram() +
    facet_grid(clarity ~ color)

These look the same to me except for the y-axis scaling. This would
suggest that, when faceting is involved, the current behavior is:
** y = ..count../sum(..count..) works for bar charts (i.e., when the
x-variable is a factor)
** y = k * ..density.. works for histograms (where the bins are of
fixed width k)

The above lines of code work the same when geom_histogram() is
replaced by geom_bar(). I would suggest raising an issue about this
because the 'rules' shouldn't depend on whether or not one is
faceting.
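
This thread does not settle whether sum(..count..) is taken within each panel or over all panels, and "percent of total" can mean either. One way to sidestep the question is to compute the percentages before plotting. The following is a minimal sketch in base R, not the approach used in the thread; it uses the built-in mtcars data as a stand-in for diamonds, and the column names n, pct_total and pct_facet are made up for illustration.

```r
# Toy illustration with the built-in mtcars data (not the diamonds data
# from the thread): cyl plays the role of the nominal x variable,
# gear the role of the faceting variable.
counts <- as.data.frame(table(cyl = mtcars$cyl, gear = mtcars$gear),
                        responseName = "n")

# Share of the grand total: every bar in every panel is divided by the
# same N, so all bars across all panels together sum to 1.
counts$pct_total <- counts$n / sum(counts$n)

# Share within each facet: each bar is divided by its own panel's total,
# so the bars inside each panel sum to 1.
counts$pct_facet <- counts$n / ave(counts$n, counts$gear, FUN = sum)

print(counts)
```

Either column can then be mapped to y and drawn with geom_bar(stat = "identity") plus the usual facet_grid() call, which makes the choice of denominator explicit instead of leaving it to the stat.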

Technically, if you have a binwidth of 1 in a histogram, where x is
numeric, then a density histogram should be identical to a relative
frequency histogram. In ggplot2,

..density.. = ..count../(N * binwidth),

where N is the sample size ( = sum(..count..) ), and by default,
binwidth = (x_max - x_min)/30. Therefore, when you set the binwidth in
geom_histogram(), multiplying both sides of the above equation by
binwidth will give you the relative frequency. Moreover, when the
binwidth is 1, a density histogram is a relative frequency histogram.

This is easy to see by example:

g <- qplot(rnorm(100), geom = "histogram", binwidth = 1.5)
g2 <- ggplot_build(g)
g2$data

Pay attention to the count and density columns, and note that the
binwidths are 1.5.
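
The same identity can be checked numerically without ggplot2. This is a small sketch using base R's hist(), which is an assumption of convenience: its bins need not line up with the ones ggplot2 chooses, but for fixed-width bins it defines density the same way, as count / (N * binwidth). The seed and the variable names are arbitrary.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x  <- rnorm(100)
bw <- 1.5                        # same binwidth as in the qplot() call above

# Fixed-width breaks that cover the range of x
breaks <- seq(floor(min(x) / bw) * bw, ceiling(max(x) / bw) * bw, by = bw)
h <- hist(x, breaks = breaks, plot = FALSE)

N <- sum(h$counts)               # sample size
rel_freq <- h$counts / N         # relative frequency per bin

# density = count / (N * binwidth), so density * binwidth = relative frequency
print(all.equal(h$density, h$counts / (N * bw)))
print(all.equal(h$density * bw, rel_freq))
```

Setting bw to 1 makes h$density and rel_freq coincide, which is the special case mentioned earlier.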

In the case where x is a factor, the binwidth trick doesn't work as
pointed out in your example, but the relative frequency definition
does. Why this works for factor but not numeric x in the presence of
faceting I don't know; perhaps you should raise this as an issue to
the developers:

https://github.com/hadley/ggplot2/issues



Dennis

adam.l...@pnc.com

May 19, 2013, 10:41:42 PM
to Dennis Murphy, ggplot2
Thanks Dennis very much for the clarification! I don't feel bad asking the question, given the inconsistency you've pointed out.

I will raise the issue with the developers.





Adam Loveland





