Plotting within-group proportions of a contingency table

evan

Sep 27, 2009, 11:29:58 AM
to ggplot2
Hello, all:

I'm trying to visualize a 2-way contingency table by plotting multiple
lines (line colour = factor levels of variable Y) that show the
proportion of observations (y-axis) at each level of variable X.

Someone in the archives had a similar issue
(http://tolstoy.newcastle.edu.au/R/e5/help/08/10/5907.html), but it didn't
quite solve my problem. Here's an example of what I'm trying to do,
using the diamonds dataset:

qplot(cut, ..count../ sum(..count..), colour=color, data=diamonds,
stat="bin", geom="line", group=color)

However, this plot shows the proportion of observations relative to
the *total* observations [ y = ..count.. / sum(..count..) ]. I want to
plot the proportion of observations to the total obs. *within each
group* -- i.e., the margins of the contingency table rather than the
grand total. I assume this must happen through the 'y' parameter, but
I just can't think of how to do it.

Thanks,
Evan

p.s. - I realize there may be better ways to visualize contingency
tables, like ggfluctuation. I need a line plot, though.

hadley wickham

Sep 27, 2009, 2:05:44 PM
to evan, ggplot2
Try this:

qplot(cut, ..density.., colour=color, data=diamonds, geom="freqpoly",
group=color)

Hadley
--
http://had.co.nz/

evan

Sep 27, 2009, 4:10:35 PM
to ggplot2
Hadley, thanks. geom_freqpoly is more direct... However, this still
seems to use the wrong denominator. What I'm looking for is: among
Fair diamonds, what % are D/E/F/etc.; then among Good diamonds, what %
are D/E/F/etc. In other words, I'm trying to plot this table:

prop.table(table(diamonds$cut, diamonds$color), margin=2)

Evan

hadley wickham

Sep 27, 2009, 4:25:13 PM
to evan, ggplot2
> Hadley, thanks. geom_freqpoly is more direct... However, this still
> seems to use the wrong denominator. What I'm looking for is: among
> Fair diamonds, what % are D/E/F/etc.; then among Good diamonds, what %
> are D/E/F/etc. In other words, I'm trying to plot this table:

I think you mean
prop.table(table(diamonds$cut, diamonds$color), margin=1)?

df <- as.data.frame(prop.table(table(diamonds[c("cut", "color")]), margin=1))
qplot(cut, Freq, data = df, geom = "line", colour = color, group = color)

Hadley
--
http://had.co.nz/
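
[Editor's note] The margin=1 versus margin=2 distinction above is easy to mix up, so here is a minimal base R sketch of it. The `toy` data frame and its values are hypothetical and chosen only to keep the arithmetic visible; they are not taken from the diamonds dataset.

```r
# Hypothetical toy data: 3 "Fair" and 5 "Good" observations
toy <- data.frame(
  cut   = c("Fair", "Fair", "Fair", "Good", "Good", "Good", "Good", "Good"),
  color = c("D",    "D",    "E",    "D",    "E",    "E",    "E",    "E")
)

# Rows are cut levels, columns are color levels
tab <- table(toy$cut, toy$color)

# margin = 1: each cell divided by its ROW total
# (among Fair items, what share is D, E, ...)
by_cut <- prop.table(tab, margin = 1)

# margin = 2: each cell divided by its COLUMN total
# (among D items, what share is Fair, Good, ...)
by_color <- prop.table(tab, margin = 2)

print(by_cut)
print(by_color)

# Sanity checks: rows of by_cut and columns of by_color each sum to 1
print(rowSums(by_cut))
print(colSums(by_color))
```

With `table(cut, color)`, "among Fair diamonds, what % are D/E/F" corresponds to the row-wise version, which is why margin=1 is the one wanted here.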

evan

Sep 27, 2009, 6:00:48 PM
to ggplot2
Yes, I did mean margin=1. That's a nice solution, I hadn't thought of
using Freq. I'll assume there's no easier way to do that with just
ggplot2, without resorting to base R functions.

Thanks for all your great work.

Evan

hadley wickham

Sep 27, 2009, 6:44:05 PM
to evan, ggplot2
> Yes, I did mean margin=1. That's a nice solution, I hadn't thought of
> using Freq. I'll assume there's no easier way to do that with just
> ggplot2, without resorting to base R functions.

Generally, I think it's better to be explicit about data
manipulations, and there's no reason you should do everything
within ggplot2.

Hadley


--
http://had.co.nz/
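
[Editor's note] For readers on a recent ggplot2, where qplot() is deprecated (as of version 3.4.0), the same "manipulate first, then plot" approach might be sketched as follows. This is a translation of the solution given earlier in the thread to the ggplot() interface, not code posted by either participant; the y-axis label is an illustrative addition, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)  # provides the diamonds dataset and the plotting functions

# Step 1: explicit data manipulation in base R.
# Cross-tabulate cut by color, then convert counts to proportions
# within each cut (row margin).
counts <- table(diamonds[c("cut", "color")])
props  <- as.data.frame(prop.table(counts, margin = 1))
# props has columns: cut, color, Freq (Freq holds the within-cut proportion)

# Step 2: plot the precomputed proportions, one line per color.
p <- ggplot(props, aes(x = cut, y = Freq, colour = color, group = color)) +
  geom_line() +
  labs(y = "Proportion within cut")  # label text is an illustrative choice

print(p)
```

Keeping the proportions in a separate data frame also makes them easy to inspect or check (for example, that they sum to 1 within each cut) before plotting.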
