Listing and description of all formatter options for scale_x_discrete, etc.


Jason Rupert

Sep 29, 2009, 4:55:34 PM
to ggp...@googlegroups.com, jasonk...@yahoo.com
I searched http://had.co.nz/ggplot2/ and http://had.co.nz/ggplot2/scale_discrete.html for a listing and description of all the options for formatter, but evidently I need to refine my search skills.

By any chance can someone provide a link to such a list and description, or provide such a list and description?

Thanks again for all the help and any feedback.

Jason



Jason Rupert

Sep 30, 2009, 7:52:18 AM
to ggp...@googlegroups.com, jasonk...@yahoo.com
I have a bunch of dates (e.g., 1979, 1980, etc.) on the scale_x_discrete x-axis of my ggplot2-produced plot. Right now there are too many dates shown on the x-axis and they all overlap to produce an ugly jumbled mess. I am afraid that even after abbreviating (using the formatter option) the ugly jumbled mess will remain. Are there typical ggplot2 approaches for working around such an issue? For example, are there ways to reduce the font size or to list only every other year on the scale_x_discrete x-axis?

Thanks again for any insights.

Jason




Learning R Blog

Sep 30, 2009, 9:08:54 AM
to ggplot2
ggplot2 usually deals pretty well with dates, so why do you use
discrete scales if your values are dates?

To answer your question: you would need to calculate and set the
breaks / labels manually.

Consider the following example, and compare plots p and p1. In the
case of p1, the breaks/labels are set at every third year:
library(ggplot2)

DF <- data.frame(year = as.character(1980:2000), value = 1980:2000)
DF$id <- seq_along(DF$year)

labels <- DF$year[DF$id %% 3 == 1]

p <- ggplot(DF, aes(year, value)) + geom_point()

p1 <- p + scale_x_discrete(breaks = labels, labels = labels)


--
http://learnr.wordpress.com

baptiste auguie

Sep 30, 2009, 9:10:36 AM
to Jason Rupert, ggp...@googlegroups.com
Hi,

Having a smaller size for every other label seems a lot trickier than
I first hoped. Here's a kludge that seems to work, yet makes little
sense to me,


even.tiny <- function(x) {

  format <- function(ind) {
    if (ind %% 2) {
      bquote(~.(x[ind]))
      ## why ~ is needed is beyond me!
    } else {
      bquote(~scriptstyle(.(x[ind])))
    }
  }

  e <- lapply(seq_along(x), format)
  as.expression(e)
}

qplot(1:10, 1:10) + scale_x_continuous(formatter=even.tiny)

Now this cannot be called a solution, but I spent enough time fiddling
with the expression that I want to post it anyway.


Best,

baptiste

James Howison

Sep 30, 2009, 9:48:18 AM
to ggplot2

On Sep 30, 2009, at 09:08, Learning R Blog wrote:

>
> ggplot2 usually deals pretty well with dates, so why do you use
> discrete scales if your values are dates?

Jason, Have you tried telling R that your date column is, in fact, a
date?

e.g.:
df$date <- as.Date(df$date, format = "%d-%b-%y")
as.POSIXct works too.

That will allow ggplot2 to do more sensible things with the labels
etc, I think.

--J
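
For anyone reading this with a current ggplot2, here is a minimal sketch of the Date approach suggested above. The data frame `df`, its columns, and the 5-year spacing are made up for illustration; `date_breaks` and `date_labels` are arguments of scale_x_date() in recent ggplot2 releases and were not available when this thread was written.

```r
library(ggplot2)

# Hypothetical data: one value per year, with the year stored as an integer
df <- data.frame(year = 1980:2000, value = rnorm(21))

# Pin each year to 1 January so the Date is fully specified
df$date <- as.Date(paste0(df$year, "-01-01"))

# Let the date scale choose a label every 5 years, printed as the year only
p_date <- ggplot(df, aes(date, value)) +
  geom_point() +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y")
```

Printing `p_date` draws the plot. Building the Date from an explicit "-01-01" avoids depending on the day the code is run.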

Jason Rupert

Sep 30, 2009, 10:03:27 AM
to Learning R Blog, ggplot2
I tried to put together an example of the type of data I am working with in the code snippet below, but unfortunately it fails with an error I don't understand:
> sample_size<-20000
>
> Home_SqFootage<-sample(1200:3600, size=sample_size, rep=T)
> Home_Year_Built<-sample(1989:2008, size=sample_size, rep=T)
> Home_Year_Sold<-sample(1989:2008, size=sample_size, rep=T)
>
> Home_DF<-data.frame(SqFootage=Home_SqFootage, YearBuilt=Home_Year_Built, YearSold=Home_Year_Sold)
>
> qplot(YearBuilt, data = Home_DF, binwidth = 1, fill = factor(Home_DF$SqFootage)) +
+                 scale_fill_discrete("Sq Footage") +
+                 scale_y_continuous("Counts") +
+                 scale_x_discrete("Year Built")
Error in unit(at, "native") : 'x' and 'units' must have length > 0
 
Running the actual data does not crash.  The dates being smooshed on top of each other may be related to the fact that I am using histograms to display the data instead of a scatter plot. 
 
Thank you again for all your help.
 
Jason

Luciano Selzer

Sep 30, 2009, 10:17:11 AM
to Jason Rupert, Learning R Blog, ggplot2
I think you're getting that error because you are specifying scale_x_discrete while YearBuilt is numeric, so ggplot2 treats the x-axis as continuous.
Luciano
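
A rough sketch of the two consistent pairings, written against a current ggplot2 rather than the 2009 release. The data frame `homes` and the two-year break spacing are invented for illustration: either the year stays numeric and gets a continuous scale, or it becomes a factor and gets a discrete scale.

```r
library(ggplot2)

set.seed(1)  # only so the made-up sample is reproducible
homes <- data.frame(YearBuilt = sample(1989:2008, size = 200, replace = TRUE))

# Option 1: keep the year numeric and pair it with a continuous scale
p_cont <- ggplot(homes, aes(YearBuilt)) +
  geom_histogram(binwidth = 1) +
  scale_x_continuous("Year Built", breaks = seq(1990, 2008, by = 2))

# Option 2: turn the year into a factor and pair it with a discrete scale;
# breaks then names the factor levels that should receive a tick label
even_years <- as.character(seq(1990, 2008, by = 2))
p_disc <- ggplot(homes, aes(factor(YearBuilt))) +
  geom_bar() +
  scale_x_discrete("Year Built", breaks = even_years)
```

In both versions the `breaks` argument thins the labels to every second year without dropping any bars.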



hadley wickham

Sep 30, 2009, 10:46:48 AM
to Jason Rupert, ggp...@googlegroups.com
Hi Jason,

On Tue, Sep 29, 2009 at 3:55 PM, Jason Rupert <jasonk...@yahoo.com> wrote:
>
> I searched http://had.co.nz/ggplot2/ and http://had.co.nz/ggplot2/scale_discrete.html for a listing and description of all the options for formatter, but evidently I need to refine my search skills.
>
> By any chance can someone provide a link to such a list and description, or provide such a list and description?

The formatters are: comma, dollar, percent, scientific and precision.

Hadley

--
http://had.co.nz/
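
In later ggplot2 releases (0.9.0 onward) the formatter argument was replaced by labels, and the formatting functions moved to the scales package. A minimal sketch under that assumption, with a made-up data frame `sales_df`:

```r
library(ggplot2)
library(scales)

# Hypothetical data spanning several orders of magnitude
sales_df <- data.frame(x = 1:5,
                       sales = c(1200, 25000, 310000, 4500000, 52000000))

# The formatters are ordinary functions from numbers to character labels
comma_labels  <- comma(sales_df$sales)
dollar_labels <- dollar(sales_df$sales)

# Pass a formatter to the scale through the labels argument
p_comma <- ggplot(sales_df, aes(x, sales)) +
  geom_point() +
  scale_y_continuous(labels = comma)
```

`dollar`, `percent` and `scientific` can be passed to `labels` in the same way.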

Jason Rupert

Sep 30, 2009, 10:49:30 AM
to Luciano Selzer, Learning R Blog, ggplot2
Okay.  Let me start over.  I had to use as.character() on the year information in order to place the year label directly below the column of data when using ggplot2 to produce a histogram.
 
This worked well for small numbers of years of data, but now here is what I am seeing:
 
sample_size<-20000  # same sample size as in my earlier message
Home_SqFootage<-sample(1200:3600, size=sample_size, rep=T)
Home_Year_Built<-sample(1989:2008, size=sample_size, rep=T)
Home_Year_Sold<-sample(1989:2008, size=sample_size, rep=T)
Home_DF<-data.frame(SqFootage=Home_SqFootage, YearBuilt=as.character(Home_Year_Built), YearSold=as.character(Home_Year_Sold))
qplot(YearBuilt, data = Home_DF, binwidth = 1, fill = factor(Home_DF$SqFootage)) +
                scale_fill_discrete("Sq Footage") +
                scale_y_continuous("Counts") +
                scale_x_discrete("Year Built")
 
The as.character is used intentionally because it seemed like the only way to get the year label to show up directly below the data column on the histogram.
 
Now, is there a command within ggplot2 to reduce the number of years that appears and still have the year label appear directly below the data column? 

Luciano Selzer

Sep 30, 2009, 12:03:19 PM
to Jason Rupert, Learning R Blog, ggplot2
OK, I tried to reproduce your example as it is, but with 20000 rows it takes too much time, and even 2000 is slow (I have a Core 2 Duo at 1.5 GHz with 2 GB; is anyone else having the same issue?). I did it with 200 rows and it plots correctly without any errors. Anyhow, another suggestion is to change the angle of the year labels in the plot so they won't overlap.

James Howison

Sep 30, 2009, 12:23:22 PM
to ggplot2
I'm not entirely sure what you are going for, but I think this will
help:

sample_size<-100 # More is a problem for the SqFootage
Home_SqFootage<-sample(1200:3600, size=sample_size, rep=T)
Home_Year_Built<-sample(1989:2008, size=sample_size, rep=T)
Home_Year_Sold<-sample(1989:2008, size=sample_size, rep=T)

# Now convert to semantic types
Home_SqFootage <- factor(Home_SqFootage)
Home_Year_Sold <- as.Date(as.character(Home_Year_Sold), "%Y")
Home_Year_Built <- as.Date(as.character(Home_Year_Built), "%Y")
# Note that this creates Date types for these columns; with only "%Y"
# given, as.Date() fills in the current month and day for each year

# Add to data frame
Home_DF<-data.frame(SqFootage=Home_SqFootage,
YearBuilt=Home_Year_Built, YearSold=Home_Year_Sold)

# Now plot
qplot(YearBuilt, data = Home_DF, binwidth = 365, fill = SqFootage)
# Don't use binwidth = 1 with fill: it seems to take forever, because
# binwidth for Date scales appears to be in days.

Seems like you'd want to convert SqFootage into some ranges, e.g.
Large, Small

library(car)  # recode() comes from the car package
Home_DF$SqFootage_Label <- recode(Home_SqFootage,
    "1200:2400='Small'; 2401:3600='Large'",
    as.factor.result=T, levels=c("Large","Small"))
qplot(YearBuilt, data = Home_DF, binwidth = 365, fill = SqFootage_Label)

As I say, not quite sure what you are trying to analyze with the
graph, if you let us know that we can help more.

--J

Jason Rupert

Sep 30, 2009, 12:31:39 PM
to Luciano Selzer, Learning R Blog, ggplot2
I would be very interested in knowing how to adjust the angle of the year or if that is even possible when using the as.character, which seems to be mandatory in order to get the year marker to line up directly below the data column. 
 
Until then here is the work around (hack) I came up with to skip every other year:
 
sample_size<-200
Home_SqFootage<-sample(1200:3600, size=sample_size, rep=T)
Home_Year_Built<-sample(1989:2008, size=sample_size, rep=T)
Home_Year_Sold<-sample(1989:2008, size=sample_size, rep=T)
Home_DF<-data.frame(SqFootage=Home_SqFootage, YearBuilt=as.character(Home_Year_Built), YearSold=as.character(Home_Year_Sold))

label_length<-length(unique(Home_DF$YearBuilt))
all_labels<-sort(unique(Home_DF$YearBuilt))
new_label<-NULL
for(ii in 1:label_length)
{
        if(ii%%2 == 0)
        {
                new_label_tmp<-as.character(all_labels[ii])
        } else {
                new_label_tmp<-c("")
        }      
       
        new_label<-c(new_label, new_label_tmp)
}
qplot(YearBuilt, data = Home_DF, binwidth = 1, fill = factor(Home_DF$SqFootage)) +
                     scale_fill_discrete("Sq Footage") +
                     scale_y_continuous("Counts") +
                     scale_x_discrete("Year Built", labels=new_label)
 
 
There is probably a much more elegant way to do this, so any feedback on that front is greatly appreciated.  Note that I tried abbreviating with the "formatter" option, but that did not appear to have any effect on the years displayed.
 
Thank you again for all your great feedback and insights.
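
One possible tidier version of the loop above, using only base R. It is a sketch, not a tested drop-in: `all_labels` here is a stand-in built directly from the 1989:2008 range instead of from Home_DF.

```r
# Stand-in for sort(unique(Home_DF$YearBuilt)) in the message above
all_labels <- as.character(1989:2008)

# Keep the label at every even position and blank out the others,
# which is what the for loop does one element at a time
new_label <- ifelse(seq_along(all_labels) %% 2 == 0, all_labels, "")
```

The resulting `new_label` vector can be passed to scale_x_discrete("Year Built", labels=new_label) exactly as in the loop version.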

Luciano Selzer

Sep 30, 2009, 3:34:43 PM
to Jason Rupert, Learning R Blog, ggplot2
With your example:

qplot(YearBuilt, data = Home_DF, binwidth = 1, fill = factor(Home_DF$SqFootage)) +
                scale_fill_discrete("Sq Footage") +
                scale_y_continuous("Counts") +
                scale_x_discrete("Year Built")+
                opts(axis.text.x = theme_text(angle=45, hjust=1))
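
opts() and theme_text() were later replaced by theme() and element_text(). For readers on a current ggplot2, a rough equivalent of the call above might look like this; the data frame `homes` is made up for illustration and the fill aesthetic is left out to keep the sketch short.

```r
library(ggplot2)

# Hypothetical stand-in for Home_DF, with the year stored as a factor
homes <- data.frame(
  YearBuilt = factor(sample(1989:2008, size = 200, replace = TRUE))
)

# Rotate the x-axis labels by 45 degrees and right-align them so that
# the end of each label sits under its bar
p_angle <- ggplot(homes, aes(YearBuilt)) +
  geom_bar() +
  scale_x_discrete("Year Built") +
  scale_y_continuous("Counts") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```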