How do I get ggplot to stop hijacking the order of my data?


Steven Vannoy

Feb 11, 2015, 8:01:43 AM
to ggp...@googlegroups.com
I frequently run into this issue. Usually, after trying several variations, I give up and live with the order that ggplot forces onto my plots, but I'm hoping to learn how to override it.

Here is some example code to illustrate my problem:

library(dplyr)
library(ggplot2)

# Create a demonstration data frame
demoDat <- data.frame(id=seq(1, 6, 1), label=c("john", "bob", "mark", "sue", "jane", "mary"),
                      measure=c(27, 32, 28, 30, 29, 33), group=factor(rep(c("married", "single"),3)))

# Just confirm that I created what I meant it to
demoDat

# Create a straightforward plot of my data
ggplot(data=demoDat, aes(y=measure, x=id, color=group)) +
  geom_point(size=4)
# That worked just fine

# Now I'd like the x-axis to use my labels. I don't need to set breaks yet, but I do so in anticipation of the next change
ggplot(data=demoDat, aes(y=measure, x=id, color=group)) +
  geom_point(size=4)+
  scale_x_continuous(breaks=demoDat$id, labels=demoDat$label)
# That worked too

# I'd like to display my data in descending order, not ascending (or alphabetical), so I arrange it that way
demoDat <- arrange(demoDat, desc(id))

# But when I plot it, ggplot orders my x-axis in ascending order
ggplot(data=demoDat, aes(y=measure, x=id, color=group)) +
  geom_point(size=4)+
  scale_x_continuous(breaks=demoDat$id, labels=demoDat$label) 

# What I really want is my x-axis ordered by group so I order my data that way
demoDat <- arrange(demoDat, group)

# And again ggplot ignores the order of my data and just plots it by ascending id
ggplot(data=demoDat, aes(y=measure, x=id, color=group)) +
  geom_point(size=4)+
  scale_x_continuous(breaks=arrange(demoDat, group)$id, labels=arrange(demoDat, group)$label)

I am aware of the trans="reverse" argument I could pass to scale_x_continuous, but that doesn't really solve my problem, because I want control over the order of my data.

I've also tried doing an "embedded" arrange within the aes() call to sort the id there, but then the labels don't match, so I end up sorting all over the place within the ggplot chain of commands and just begging for errors.

It seems to me, in my naiveté, that I ought to be able to tell ggplot to just plot the data in the order I provide it, but I haven't figured out how to do that.

Can anyone explain whether this is possible and how one goes about it? Or, if there are good reasons why it should not work that way, what they are? I'm open to anything that will either help me do what I want to do or convince me that I shouldn't be doing it.

Thanks

Chris Neff

Feb 11, 2015, 8:39:00 AM
to Steven Vannoy, ggp...@googlegroups.com
Use factors instead of character vectors. For example, if you make an id vector like this:

id=factor(c("Second", "First", "Third"), levels=c("First", "Second", "Third"))

Then ggplot will always plot it in the order in which the levels were specified (First, Second, Third).
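A quick way to see this (a minimal base R sketch reusing Chris's example vector) is to inspect the levels and the integer codes stored in the factor. The declared levels, not the order of the values, are what a discrete axis follows.

```r
# Chris's example: values in one order, levels declared in another
id <- factor(c("Second", "First", "Third"), levels = c("First", "Second", "Third"))

# The declared levels are the order used on a discrete axis
print(levels(id))      # "First" "Second" "Third"

# Each value is stored as its position within the levels
print(as.integer(id))  # 2 1 3
```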

Ben Bond-Lamberty

Feb 11, 2015, 9:30:22 AM
to ggplot2
To add on to Chris's response: by design, ggplot should *not* care
about the order of the rows in your data frame; if it ever does,
that's a bug. Use factor levels to force an ordering.
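When you don't supply levels yourself, factor() sorts the unique values, which is why character labels usually come out alphabetical. A small base R sketch with the names from the question (the variable names here are just for illustration):

```r
people <- c("john", "bob", "mark", "sue", "jane", "mary")

# Default: levels are the sorted unique values (alphabetical here)
default_levels <- levels(factor(people))
print(default_levels)     # "bob" "jane" "john" "mark" "mary" "sue"

# Explicit: keep the order in which values first appear in the data
appearance_levels <- levels(factor(people, levels = unique(people)))
print(appearance_levels)  # "john" "bob" "mark" "sue" "jane" "mary"
```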

Ben

Willem

Feb 11, 2015, 10:11:07 AM
to ggp...@googlegroups.com
As mentioned, you should use factors if you want to impose an ordering.
In your case, you could do something like this:

# Collect the id values (not row positions) group by group, in the current row order
idxOrder <- unlist(lapply(levels(demoDat$group), function(x) demoDat$id[demoDat$group == x]))
demoDat$id2 <- factor(demoDat$id, levels = idxOrder) # Sets the order of the factor

ggplot(data=demoDat, aes(y=measure, x=id2, color=group)) +
    geom_point(size=4)+
    scale_x_discrete(breaks=arrange(demoDat, group)$id, labels=arrange(demoDat, group)$label)

Then plot using a discrete scale.
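To check which levels this approach produces, here is a base R sketch that rebuilds the question's data in its final arranged state (base order() standing in for dplyr::arrange()) and collects the id values group by group. It uses id values rather than row positions because, after the arrange() calls in the question, row numbers no longer match id.

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps label as character on older R
demoDat <- data.frame(id = 1:6,
                      label = c("john", "bob", "mark", "sue", "jane", "mary"),
                      measure = c(27, 32, 28, 30, 29, 33),
                      group = factor(rep(c("married", "single"), 3)),
                      stringsAsFactors = FALSE)

# Same row order the question ends up with: by group, then descending id
demoDat <- demoDat[order(demoDat$group, -demoDat$id), ]

# Collect id values (not row positions) group by group, in row order
idxOrder <- unlist(lapply(levels(demoDat$group),
                          function(g) demoDat$id[demoDat$group == g]))
print(idxOrder)              # 5 3 1 6 4 2

demoDat$id2 <- factor(demoDat$id, levels = idxOrder)
print(levels(demoDat$id2))   # "5" "3" "1" "6" "4" "2"
```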

Kind regards,

Willem

On Wednesday, 11 February 2015 14:01:43 UTC+1, Steven Vannoy wrote:

Steven Vannoy

Feb 11, 2015, 10:36:10 AM
to ggp...@googlegroups.com
Thank you all for your replies; they are helpful in furthering my understanding. I see why ggplot might not want to honor the order of rows in a data frame when it is doing the aggregating, such as with stat="bin". But with stat="identity", I've already aggregated and ordered my data in the way that makes sense, and that's how I want it ordered. Of course, I don't always get what I want :)

I was just sitting here playing around with factors and realizing I'm going to have to get better at manipulating them, so Willem's example is very helpful. It also demonstrates how clumsy it is to manipulate factors in a robust way (i.e. not hand-coding the order but deriving it automatically from dynamic criteria).
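One way to derive the order from the data instead of typing it is to sort with base order() and feed the result to levels. A sketch assuming the goal from the question (group first, then descending id within each group); the column name label_ordered is hypothetical:

```r
demoDat <- data.frame(id = 1:6,
                      label = c("john", "bob", "mark", "sue", "jane", "mary"),
                      measure = c(27, 32, 28, 30, 29, 33),
                      group = factor(rep(c("married", "single"), 3)),
                      stringsAsFactors = FALSE)

# Row positions sorted by group, then by descending id; change the keys to change the order
ord <- order(demoDat$group, -demoDat$id)

# Labels in that order become the factor levels (labels must be unique)
demoDat$label_ordered <- factor(demoDat$label, levels = demoDat$label[ord])
print(levels(demoDat$label_ordered))  # "jane" "mark" "john" "mary" "sue" "bob"
```

The rows of demoDat are never rearranged here; mapping x = label_ordered in aes() should give a discrete x-axis in that order without separate breaks/labels bookkeeping.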

Chris Neff

Feb 11, 2015, 10:49:47 AM
to Steven Vannoy, ggp...@googlegroups.com
factor(ids, levels=ordered.ids) isn't that hard. If you already have them ordered in some data.frame DF, you can just say:

DF$id.ordered <- factor(DF$ids, levels=DF$ids)
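Applied to the question's labels, a base R sketch (assuming the rows have already been sorted the way you want, here by group; label_ordered is a hypothetical column name). One caveat: factor levels must be unique, and current R versions stop with an error on duplicated levels, so wrapping the column in unique() is safer when a value can repeat.

```r
demoDat <- data.frame(id = 1:6,
                      label = c("john", "bob", "mark", "sue", "jane", "mary"),
                      measure = c(27, 32, 28, 30, 29, 33),
                      group = factor(rep(c("married", "single"), 3)),
                      stringsAsFactors = FALSE)

# Sort the rows however you like first (here: by group; order() keeps ties in their original order)
DF <- demoDat[order(demoDat$group), ]

# Freeze the current row order into the factor levels; unique() guards against repeats
DF$label_ordered <- factor(DF$label, levels = unique(DF$label))
print(levels(DF$label_ordered))  # "john" "mark" "jane" "bob" "sue" "mary"
```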

Mark Lyman

Feb 11, 2015, 3:05:24 PM
to Steven Vannoy, ggp...@googlegroups.com
See ?reorder for some dynamic ordering of levels.
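A minimal sketch of reorder() (base R, stats package) with the names and measures from the question. It orders the levels of the first argument by a summary of the second (mean by default, ascending); since each label has a single measure here, negating measure puts the largest value first. Variable names are illustrative.

```r
label <- c("john", "bob", "mark", "sue", "jane", "mary")
measure <- c(27, 32, 28, 30, 29, 33)

# Levels sorted by mean(-measure) per label, ascending, i.e. by measure descending
by_measure <- reorder(label, -measure)
print(levels(by_measure))  # "mary" "bob" "sue" "jane" "mark" "john"
```

Inside a plot, the same call can go straight into the aesthetic, e.g. aes(x = reorder(label, -measure), y = measure).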
