geom_dotplot + stat_summary: Have mean points centered on top of dot stacks?


stefan....@gmail.com

19.03.2015, 06:12:44
to ggp...@googlegroups.com
Dear all,

As indicated in the subject line, I would like to have a point geom represent means. The mean points should be centred on the corresponding bins (dot stacks) and positioned on top of the dot stacks:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_dotplot(binaxis = "y") + coord_flip() + stat_summary(fun.y=median, geom = "point", shape = 6, size = 4)

The above gives me mean points, but not positioned correctly (uncentered on bins and at their bottom).

Any help would be appreciated!


Stefan


P.S.: Apologies for cross-posting this here and on http://stackoverflow.com/questions/29097755/geom-dotplot-mark-means-in-dotplots-visually-using-arrows-or-the-like





Allen Bingham & Diana Rigg (gmail)

20.03.2015, 18:17:28
to stefan....@gmail.com, ggp...@googlegroups.com

Stefan,

 

The closest I could get to what (I think) you described is as follows:

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_dotplot(binaxis = "y") +
  coord_flip() +
  stat_summary(aes(x = factor(cyl + .5)), fun.y = mean, geom = "point", shape = 6, size = 4) +
  scale_x_discrete(breaks = seq(4, 8, 2))

 

Note three things: (1) I replaced your “fun.y=median” with “fun.y=mean”, since you stated you wanted the mean, not the median; (2) it doesn’t matter what value you add to “cyl” in the 4th line above; the “shape=6” symbols get plotted half-way between the factored “cyl” values in the original data; and (3) the “scale_x_discrete” call needs to be added to keep the axis from labelling the extra ‘factor’ levels such as “6.5”.

 

Not sure if this actually gets you what you want.

 

Maybe someone else here can suggest something that works more elegantly than my solution.

 

Hope this helps-Allen

______________________________________

Allen Bingham

Bingham Statistical Consulting

aebin...@gmail.com

LinkedIn Profile: www.linkedin.com/pub/allen-bingham/3b/556/325


stefan....@gmail.com

26.03.2015, 05:04:02
to ggp...@googlegroups.com, stefan....@gmail.com
Allen,

Thanks for your solution sketch (and for pointing out my copy & paste errors)!

Your sketch is close to my intention, but I want the mean values to be binned themselves, i.e. centred on their corresponding bins.

In the meantime, I came up with the following:

library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_dotplot(binaxis = "y", binwidth = 1)
g <- ggplot_build(p)
mtcars[order(mtcars$cyl), ]$bcenter <- g$data[[1]]$y
m$y <- aggregate(mpg ~ cyl, mtcars, function(x) {
    a <- mean(x)
    mtcars$bcenter[which.min(abs(mtcars$bcenter - a))]
})[, 2]
p + geom_point(aes(x = factor(cyl + .5), y = y), data = m, color = "red")


In essence, I figured that I can access the plot geometry using ggplot_build to get the computed bin centers (g$data[[1]]$y) and then find the closest matching bin centers for the mean values.
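The "closest matching bin center" step can be looked at in isolation. What follows is a minimal base-R sketch, not the code from this thread: the data frames `dots` and `means`, their column names, and the helper `snap_to_nearest` are all hypothetical, and the numbers are made up for illustration. It assumes the bin centers have already been extracted (for example from `ggplot_build`) together with the group each dot belongs to, and it only considers the centers of a group's own stack when snapping that group's mean.

```r
# Illustrative data only: one row per dot, with the group it belongs to and
# the centre of the bin it was stacked in (column names are hypothetical).
dots <- data.frame(
  grp    = c("4", "4", "4", "6", "6", "8", "8", "8"),
  center = c(22, 26, 30, 18, 21, 11, 15, 18),
  stringsAsFactors = FALSE
)

# Illustrative group means (made up, not computed from real data).
means <- data.frame(
  grp  = c("4", "6", "8"),
  mean = c(26.7, 19.7, 15.1),
  stringsAsFactors = FALSE
)

# Return the element of `centers` that is closest to `value`.
snap_to_nearest <- function(value, centers) {
  centers[which.min(abs(centers - value))]
}

# Snap each group's mean to the nearest bin centre within the same group.
means$snapped <- mapply(
  function(g, m) snap_to_nearest(m, dots$center[dots$grp == g]),
  means$grp, means$mean,
  USE.NAMES = FALSE
)

means
```

Restricting the candidates per group is a design choice: searching over all centers at once could place a mean on a bin that only exists in a different stack.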

This gives me mean points centred on bins. However, there is a major flaw: using ggplot_build, I can only access the centers of bins that are actually filled (non-empty). If the mean value fell into an empty bin range, the above would result in a misleading visualisation :/

Any ideas how to get centers of *all* (empty and non-empty) bins?
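One possible direction, offered as a hedged sketch rather than a confirmed answer: if the bins sit on a regular grid, the centers of the empty bins can be reconstructed from the filled ones by stepping from the smallest to the largest filled center in increments of the bin width. As far as I understand the geom_dotplot documentation, that regular-grid assumption applies to fixed-width binning (method = "histodot"); with the default dot-density binning the bin positions are determined by the data, so there is no fixed grid of empty bins to recover. The values in `filled_centers` are made up for illustration, and `snap_to_grid` is a hypothetical helper.

```r
# Illustrative centres of the non-empty bins (not taken from a real plot).
# Assumption: the bins lie on a regular grid with spacing `binwidth`.
filled_centers <- c(10.5, 11.5, 14.5, 15.5, 19.5)
binwidth <- 1

# Rebuild the full regular grid, which also covers the empty bins that lie
# between the smallest and the largest filled bin.
all_centers <- seq(min(filled_centers), max(filled_centers), by = binwidth)

# Return the element of `centers` that is closest to `value`.
snap_to_grid <- function(value, centers) {
  centers[which.min(abs(centers - value))]
}

# A value that falls into an empty bin range:
nearest_filled <- snap_to_grid(12.8, filled_centers)  # nearest non-empty bin
nearest_any    <- snap_to_grid(12.8, all_centers)     # nearest bin, empty or not

c(nearest_filled = nearest_filled, nearest_any = nearest_any)
```

The two results differ whenever the value lies in a gap between filled bins, which is exactly the case described above as misleading.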

Cheers,
Stefan

stefan....@gmail.com

26.03.2015, 05:09:12
to ggp...@googlegroups.com, stefan....@gmail.com
Allen,

I posted an incomplete code sample above. Here is the complete one:

library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_dotplot(binaxis = "y", binwidth = 1)
g <- ggplot_build(p)
mtcars[order(mtcars$cyl), ]$bcenter <- g$data[[1]]$y

m <- aggregate(mpg ~ cyl, mtcars, mean)

m$y <- aggregate(mpg ~ cyl, mtcars, function(x) {
    a <- mean(x)
    mtcars$bcenter[which.min(abs(mtcars$bcenter - a))]
})[, 2]

p + geom_point(aes(x = factor(cyl + .5), y = y), data = m, color = "red") +
    scale_x_discrete(breaks = seq(4, 8, 2))

Stefan

Allen Bingham & Diana Rigg (gmail)

26.03.2015, 13:05:08
to stefan....@gmail.com, ggp...@googlegroups.com

Stefan,

 

Likely can’t help you anymore … you’ve gone beyond what I know about ggplot2, etc.

 

… that said, the code you posted did NOT work for me. I get the following error:

Error in data.frame(x = 1:3, y = list(`1` = NULL, `12` = NULL, `19` = NULL),  :
  arguments imply differing number of rows: 3, 0

 

Sorry can’t help more-Allen


stefan....@gmail.com

26.03.2015, 18:12:09
to ggp...@googlegroups.com, stefan....@gmail.com
Allen,

My apologies, here comes the checked listing:

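library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_dotplot(binaxis = "y", binwidth = 1)
# Build the plot to access the computed bin centers
g <- ggplot_build(p)

# Copy the computed bin centers into a new column, taking rows in cyl order
mtcars$bcenter <- NA
mtcars[order(mtcars$cyl), ]$bcenter <- g$data[[1]]$y

# Mean mpg per cyl group
m <- aggregate(mpg ~ cyl, mtcars, mean)

# Replace each group mean by the closest bin center found in the plot
m$y <- aggregate(mpg ~ cyl, mtcars, function(x) {
    a <- mean(x)
    mtcars$bcenter[which.min(abs(mtcars$bcenter - a))]
})[, 2]

# Overlay the snapped means as red points
p + geom_point(aes(x = factor(cyl + .5), y = y), data = m, color = "red") +
    scale_x_discrete(breaks = seq(4, 8, 2))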



Stefan