how to use geom_text in combination with geom_violin, when position="dodge"?


Giovanni Marco Dall'Olio

Oct 22, 2012, 9:08:29 AM
to ggplot2
Hello,
I have a violin plot where, for each x value, there are several violins, coloured according to another variable and positioned using position="dodge".
This is easier to explain with some code:

   # reduce the size of the diamonds dataset, for faster debugging
   diamonds.short = subset(diamonds, cut == "Premium" & x > 7)
   # plot violin plots
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color))


On this plot, I want to add some text labels, showing the price of all the diamonds that have a value of "x" higher than 9.
This is how it should look:
- http://bioevo.upf.edu/~gdallolio/images/ggplot_question_wanted.png
However, this is what I have obtained so far:
- http://bioevo.upf.edu/~gdallolio/images/ggplot_question_obtained.png

Complete code:

   library(ggplot2)
   diamonds.short = subset(diamonds, cut == "Premium" & x > 7)
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat))

Each text label should be shown inside the violin it belongs to. For example, if the diamond labelled 2.71 has color "E", its label should be shown inside the E violin.

I suspect that the solution lies in giving the correct value of "position" to geom_text, but so far I have not found the right one.
This is what I have tried so far; none of these works:
   library(ggplot2)
   diamonds.short = subset(diamonds, cut == "Premium" & x > 7)
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat, color=color), position="dodge")
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat, color=color, position=color))
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat, color=color), position=position_dodge(width=0.9))
   ggplot(data=diamonds.short, aes(x=cut, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat, color=color), position=position_identity(width=0.9))


The best I have managed so far is to paste cut and color into a new column, and use that as the x variable:
    diamonds.short$cut_color = paste(diamonds.short$cut, diamonds.short$color, sep='_')
    ggplot(data=diamonds.short, aes(x=cut_color, y=price)) + geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9), aes(label=carat, color=color))

The problem is that now I don't know how to remove the extra labels on the X axis. Also, I think there should be a better way to do this, using position.

What is the best ggplot2-ish way to do this? How can I plot the labels on the correct violin?
Thanks,
Gio
--
Giovanni Dall'Olio, phd student
IBE, Institut de Biologia Evolutiva, CEXS-UPF (Barcelona, Spain)

My blog on bioinformatics: http://bioinfoblog.it

Brian Diggs

Oct 22, 2012, 4:17:22 PM
to dalloliogm-Re5JQ...@public.gmane.org, ggplot2
The problem you are running into might be a bug, but I can explain what
is happening. First, this is what should work:

ggplot(data=diamonds.short, aes(x=cut, y=price, color=color)) +
    geom_violin() +
    geom_text(data=diamonds.short[diamonds.short[["x"]] > 9,],
              aes(label=carat),
              position=position_dodge(width=0.9))

It doesn't, and the reason is that no diamond with color=="D" also has
x>9. That combination is empty in your subset, so the labels get spread
across 6 groups/positions while the violins are spread across 7. You can
see this by lowering the threshold so that a D diamond meets the
criterion; the labels then align correctly:

ggplot(data=diamonds.short, aes(x=cut, y=price, color=color)) +
    geom_violin() +
    geom_text(data=diamonds.short[diamonds.short[["x"]] > 8.98,],
              aes(label=carat),
              position=position_dodge(width=0.9))
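
One way to check which color groups are missing from the label data is to tabulate both datasets. This is only a minimal sketch; the names `bigs` and `missing.colors` are introduced here for illustration, and it assumes the `diamonds` dataset that ships with ggplot2.

```r
library(ggplot2)

# Same subsets as in the question
diamonds.short <- subset(diamonds, cut == "Premium" & x > 7)
bigs <- subset(diamonds.short, x > 9)

# Counts per color level in the violin data and in the label data.
# subset() keeps unused factor levels, so an empty group shows up as 0.
table(diamonds.short$color)
table(bigs$color)

# Color levels that are drawn as violins but have no label rows
missing.colors <- setdiff(unique(as.character(diamonds.short$color)),
                          unique(as.character(bigs$color)))
missing.colors
```

If the explanation above holds, "D" should be listed in `missing.colors` for the x > 9 threshold.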

A workaround is to create a separate annotation dataset which has
dummy entries (missing everywhere else) for all the color levels:

# rbind.fill() comes from the plyr package
bigs <- diamonds.short[diamonds.short[["x"]] > 9,]
bigs <- plyr::rbind.fill(bigs, data.frame(color = levels(bigs$color),
                                          cut = "Premium"))

I have to include color, since I need one dummy row per level of it, and
cut because that is the x variable. Now I can use this dataset in the
geom_text call and it aligns correctly:

ggplot(data=diamonds.short, aes(x=cut, y=price, color=color)) +
    geom_violin() +
    geom_text(data=bigs, aes(label=carat),
              position=position_dodge(width=0.9))
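
A possible alternative that avoids building a padded dataset, sketched here under the assumption that the only requirement is for every color group to be present in the label layer: keep the full data in geom_text and use an empty string as the label for rows that should not be annotated. The column name `big_label` is made up for this example.

```r
library(ggplot2)

diamonds.short <- subset(diamonds, cut == "Premium" & x > 7)

# Label only the large diamonds; every other row gets an empty label,
# so all color groups still take part in the dodging.
diamonds.short$big_label <- ifelse(diamonds.short$x > 9,
                                   as.character(diamonds.short$carat),
                                   "")

p <- ggplot(data = diamonds.short, aes(x = cut, y = price, color = color)) +
    geom_violin() +
    geom_text(aes(label = big_label),
              position = position_dodge(width = 0.9))
p
```

This draws one (mostly invisible) text element per row, so it may be noticeably slower than the padded dataset on large data.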

> The best I have managed so far is to paste cut and color into a new column,
> and use that as the x variable:
> diamonds.short$cut_color = paste(diamonds.short$cut,
> diamonds.short$color, sep='_')
> ggplot(data=diamonds.short, aes(x=cut_color, y=price)) +
> geom_violin(aes(color=color)) + geom_text(data=subset(diamonds.short, x>9),
> aes(label=carat, color=color))
>
> The problem is that now I don't know how to remove the extra labels on the
> X axis.

You could take this approach by specifying the breaks and their
associated labels, so that only one tick is drawn on the x axis:

+ scale_x_discrete(breaks="Premium_G", labels="Premium")
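
Put together with the code from the question, the whole plot could look roughly like this. It is a sketch only: it assumes a single cut ("Premium") as in the example, and the choice of "Premium_G" as the one visible break is just a way to put the tick near the middle of the seven violins.

```r
library(ggplot2)

diamonds.short <- subset(diamonds, cut == "Premium" & x > 7)

# Combined cut/color column, used as the x variable instead of dodging
diamonds.short$cut_color <- paste(diamonds.short$cut, diamonds.short$color,
                                  sep = "_")

p <- ggplot(data = diamonds.short,
            aes(x = cut_color, y = price, color = color)) +
    geom_violin() +
    geom_text(data = subset(diamonds.short, x > 9), aes(label = carat)) +
    # Show a single tick, relabelled with the plain cut name
    scale_x_discrete(breaks = "Premium_G", labels = "Premium")
p
```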

> Also, I think there should be a better way to do this,
> using position.
>
> What is the best ggplot2-ish way to do this? How can I plot the labels on
> the correct violin?
> Thanks,
> Gio

--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University