ggplot2: boxplot, with observation scatter plot.


Akiyuki Suzuki

May 25, 2012, 9:42:25 AM
to ggplot2
Dear ggplot2 users,
 
I am trying to prepare the boxplot shown below with a scatter plot of the observations overlaid.
Does anyone know what command I should use to add the scatter plot?
 
ggplot(aes(y = boxthis, x = f2, fill = f1), data = df) + geom_boxplot() 


 

Thank you in advance.

Best regards,

Akiyuki Suzuki

R. Michael Weylandt

May 25, 2012, 11:10:15 PM
to Akiyuki Suzuki, ggplot2
I think you want to add the points "geom" (appropriately given by geom_point()), but with categorical variables that's often prone to overplotting, so I think the standard recommendation is geom_jitter().

With a built-in data set, something like:

 library(ggplot2)
 ggplot(aes(y = price, x = cut, fill = color), data = diamonds) + geom_boxplot() + geom_jitter(alpha = 1/100)

I added alpha because this data set has so many points that it's necessary; whether you need it is a judgement call on your part.
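A minimal sketch of the same plot with the jitter spelled out, assuming a later ggplot2 release in which geom_jitter() accepts `width` and `height` directly (not the 0.9.x version discussed in this thread). Setting `height = 0` keeps each price at its true value, so only the horizontal position is randomised; 0.2 is an arbitrary starting width.

```r
library(ggplot2)

# Same boxplot as above, with the jitter made explicit:
# width = 0.2 spreads points horizontally around each cut,
# height = 0 leaves price unjittered so every point sits at its real value.
p <- ggplot(diamonds, aes(y = price, x = cut, fill = color)) +
  geom_boxplot() +
  geom_jitter(width = 0.2, height = 0, alpha = 1/100)
p
```

Note that the points here are still spread around the centre of each cut, not around the individual coloured boxes; that is the issue taken up later in this thread.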

Hope this helps, 
Michael

--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: https://github.com/hadley/devtools/wiki/Reproducibility
 
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Dennis Murphy

May 26, 2012, 2:23:29 AM
to Akiyuki Suzuki, ggplot2
Hi:

Michael's advice is sound when you have a large data frame. If you have a relatively small one, an alternative to jittering is geom_dotplot(), new in version 0.9.0. Here's an example, taken from the 0.9.0 transition guide:

library(ggplot2)
ggplot(mtcars, aes(x = factor(vs), y = mpg)) +
   geom_boxplot(aes(fill = factor(vs)), alpha = 0.3,
                outlier.color = NA) +
   geom_dotplot(binaxis = "y", stackdir = "center",
                position = "dodge", colour = "blue", fill = "blue") +
   labs(x = "vs", y = "mpg", fill = "vs")

If your boxplot contains outside values that are plotted individually, it's not a particularly good idea to simply add jittered points, because each outside value will be drawn twice (once by the boxplot, once as a jittered point), making them appear twice as numerous as they really are. One workaround is to 'blank out' the boxplot's outside values with 'outlier.color = NA', as done above, keeping in mind that any value beyond the whiskers in either direction is an outside value.
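A small sketch of the same 'blank out' idea combined with jittering, assuming a recent ggplot2 where the geom_boxplot() documentation suggests `outlier.shape = NA` for hiding outliers when overlaying raw points. The choice of `factor(cyl)` and the 0.15 jitter width are illustrative only.

```r
library(ggplot2)

set.seed(1)  # make the jitter reproducible

# outlier.shape = NA stops the boxplot from drawing its own outside values,
# so every observation (outliers included) is drawn exactly once, by the
# jittered point layer. height = 0 keeps mpg values exact.
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, height = 0, colour = "blue")
p
```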

geom_dotplot() will also work with large data, but you will probably need to be careful about the widths of the stacked points as well as their size. Start with the defaults, as they are true to the intentions of the developer of the method; if they need to be modified, then consult the help page, which has a number of good examples.
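As a rough sketch of the knobs mentioned above, the example below applies geom_dotplot() to a larger sample of diamonds. `binwidth`, `dotsize` and `stackratio` are real geom_dotplot() arguments, but the values chosen here (500, 0.3, 0.8) are guesses for illustration, not recommendations; as suggested above, start from the defaults and adjust.

```r
library(ggplot2)

set.seed(1410)
# A larger sample than the 100-row dsmall used elsewhere in this thread.
dbig <- diamonds[sample(nrow(diamonds), 1000), ]

# binwidth: bin size along the y axis, in price units.
# dotsize:  dot diameter relative to binwidth (smaller dots for dense data).
# stackratio: spacing between stacked dots (< 1 packs them tighter).
p <- ggplot(dbig, aes(x = cut, y = price)) +
  geom_dotplot(binaxis = "y", stackdir = "center",
               binwidth = 500, dotsize = 0.3, stackratio = 0.8)
p
```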

HTH,
Dennis


Akiyuki Suzuki

May 26, 2012, 8:54:54 AM
to Dennis Murphy, ggplot2
Dear Dennis and Michael,
I really appreciate your advice.
I am glad to join this group.
 
Dear Dennis,
Unfortunately, our office controls which package versions we can install.
If version 0.9.0 becomes available, I will try it.
I am looking forward to using geom_dotplot().
 
Dear Michael and all
 
We usually use
I tried running the script below.

library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
dia.DE <- dsmall[dsmall$color %in% c("D", "E") &
                 dsmall$cut %in% c("Ideal", "Premium", "Very Good"), ]
ggplot(aes(y = price, x = cut, fill = color), data = dia.DE) + geom_boxplot() + geom_jitter()
 
The result is attached.
Each category has two boxes.
The observation points should sit on their own box, with jitter.
However, the points are not placed on the appropriate box; they are spread across both.
 
If you do not mind, could you please give me some advice?
 
Thank you in advance.
 
Best regards,
Akiyuki Suzuki
 
 
 
 
2012/5/26 Dennis Murphy <djm...@gmail.com>
Rplot02.pdf

R. Michael Weylandt

May 26, 2012, 5:45:45 PM
to Akiyuki Suzuki, Dennis Murphy, ggplot2
Hi, 

If I understand your concern, it's shown more clearly by this:

ggplot(aes(y = price, x = cut, fill = color), data = dia.DE) +   geom_boxplot() + geom_jitter(aes(color = color))

which shows that some of the blue dots wind up over the red boxplot and vice versa [see attached graph1.pdf].

To get rid of this, you can reduce the width of the jittering to zero like this: 

ggplot(aes(y = price, x = cut, fill = color), data = dia.DE) +   geom_boxplot() + geom_jitter(aes(color = color), position = position_jitter(width = 0))

[Producing graph2.pdf]

By default, ggplot2 doesn't seem to line up the points with the center of their own boxplot (if it did, you could just apply a slight jitter no wider than the box and there wouldn't be any "traitors"), but I don't know how to make it do so. Hopefully a ggmaster can chime in about how; it does seem like a reasonable thing to ask.

Hope this helps, 
Michael
graph2.pdf
graph1.pdf

R. Michael Weylandt

May 26, 2012, 5:47:09 PM
to Akiyuki Suzuki, ggplot2
Oh, and note that I think mapping the dots onto the same colors as the boxplot fills is a really bad idea, because you lose sight of many of them. It was just a device to illustrate what I think was troubling you. Definitely not recommended for production graphics.

Best, 
Michael

Dennis Murphy

May 26, 2012, 11:24:09 PM
to Akiyuki Suzuki, R. Michael Weylandt, ggplot2
Hi:

This type of graphic, which has come up in this group several times in the past, illustrates one of the limitations of the grammar of graphics. In this example, boxplots are dodged by color within cut by default (the geom_boxplot() help page documents that mapped aesthetics, when factors, are 'automatically dodged'; see its examples section). The desire is to dodge the points in the same way *and* to jitter them. The problem is that both dodging and jittering are position adjustments, and a layer can use only one. In other words, it's a nontrivial balancing act to get the two working in concert, and it depends on the configuration of the graph. What works in one example may not carry over to another.

In a fairly simple case, you can get away with using position_jitter(); here's an example from the 0.9.0 transition guide that should work in 0.8.9 as well:


ggplot(mtcars, aes(x = factor(vs), y = mpg)) +
  geom_boxplot(aes(fill = factor(vs)), alpha = 0.3,
               outlier.colour = NA) +
  geom_point(position = position_jitter(width = 0.05),
             colour = "blue", fill = "blue") +
  labs(x = "vs", y = "mpg", fill = "vs")

However, if you translate this to the present example, it doesn't work because the points are jittered relative to the levels of cut, not to the levels of color nested within cut. The 'automatic' dodging of the levels of color within cut that takes place in geom_boxplot() does not transfer to geom_point(). A (single) position adjustment of points is allowed; one that works (with a little tweaking of the width) is the following:


library(ggplot2)
set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
dia.DE <- dsmall[dsmall$color %in% c("D", "E") &
                 dsmall$cut %in% c("Ideal", "Premium", "Very Good"), ]

# Dodge the points by color with the same width the boxes use (0.75).
ggplot(dia.DE, aes(y = price, x = cut, fill = color)) +
    geom_point(aes(colour = color), position = position_dodge(width = 0.75)) +
    geom_boxplot(alpha = 0.2, outlier.colour = NA)

If you replace geom_point() with geom_jitter(), the same plot should obtain, as geom_jitter() will ignore the position = position_dodge() argument. You can dodge or jitter, but you can't dodge _and_ jitter.
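For readers on later ggplot2 releases (1.0.0 and up): position_jitterdodge() was added to do both adjustments in one step, dodging the points by the grouping aesthetic and then jittering them inside each dodged slot. The sketch below assumes such a release; its `dodge.width` is set to 0.75, the value boxplots dodge within, and the jitter width is left at its default.

```r
library(ggplot2)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
dia.DE <- dsmall[dsmall$color %in% c("D", "E") &
                 dsmall$cut %in% c("Ideal", "Premium", "Very Good"), ]
# Drop the colour/cut levels left unused by the subset.
dia.DE <- droplevels(dia.DE)

# position_jitterdodge(): dodge each point into its colour's box slot
# (dodge.width = 0.75), then add a small horizontal jitter within that slot.
p <- ggplot(dia.DE, aes(x = cut, y = price, fill = color)) +
  geom_boxplot(alpha = 0.2, outlier.shape = NA) +
  geom_point(aes(colour = color),
             position = position_jitterdodge(dodge.width = 0.75))
p
```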

In 0.9.0, geom_dotplot() has a dodge = argument that allows one to specify the dodging variable; see the examples in section 3.3 of the transition guide for some illustrations as well as the geom_dotplot() help page:
http://cloud.github.com/downloads/hadley/ggplot2/guide-col.pdf

One approach to this problem using geom_dotplot() is the following:

# 0.9.0+ only:
ggplot(dia.DE, aes(y = price, x = cut, fill = color)) +
     geom_boxplot(alpha = 0.2, outlier.colour = NA) +
     geom_dotplot(aes(colour = color), binaxis = 'y',
                  stackdir = 'centerwhole', position = 'dodge')

Unfortunately, the center of the point stacks doesn't align perfectly with the whiskers of the box plots in this example. It's possible that the default dodging width differs between geom_boxplot() and geom_dotplot(). Winston or someone else might be able to provide a better selection of arguments that gets the alignment right; I haven't been able to figure it out yet despite several guesses that didn't work.
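One guess worth trying, sketched here under an assumption that has not been checked against 0.9.0: boxes for a discrete x are 0.75 units wide and are dodged within that width, while geom_dotplot()'s `width` argument (the spacing of the dot stacks for dodging when `binaxis = "y"`) defaults to 0.9. If that mismatch is the cause, giving the dot layer an explicit `position_dodge(width = 0.75)` should put the stacks on the box centers.

```r
library(ggplot2)

set.seed(1410)
dsmall <- diamonds[sample(nrow(diamonds), 100), ]
dia.DE <- dsmall[dsmall$color %in% c("D", "E") &
                 dsmall$cut %in% c("Ideal", "Premium", "Very Good"), ]

# Assumption: boxes dodge within 0.75 of each category, dot stacks default
# to 0.9. Forcing the dotplot to dodge with width 0.75 is meant to align
# each colour's stack with its box.
p <- ggplot(dia.DE, aes(y = price, x = cut, fill = color)) +
  geom_boxplot(alpha = 0.2, outlier.shape = NA) +
  geom_dotplot(aes(colour = color), binaxis = "y",
               stackdir = "centerwhole",
               position = position_dodge(width = 0.75))
p
```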

If some enterprising soul is looking for a geom to create, this would be a good candidate :)

Dennis