Re: deprecated GGplot code affecting date data?


Brandon Hurr

Jan 15, 2013, 3:25:39 AM
to Mark Turner, ggplot2
Mark, 

Can you supply a fully working example with a sample dataset? I imagine it has to do with the format that your date values are stored in, but it's really hard to say. 
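
One way to put such an example together is sketched below. The data frame is a made-up stand-in (the column names follow the original post, the values are invented), so treat it as an illustration rather than the actual data:

```r
# Hypothetical stand-in for the real data; the values are invented.
sample_data <- data.frame(
  date = c("7-12-2012", "7-01-2013", "7-02-2013"),
  cancellations = c(120, 95, 110),
  stringsAsFactors = FALSE
)

# str() shows how each column is stored (character, factor, Date, ...).
str(sample_data)

# dput() prints R code that recreates the object exactly,
# so the output can be pasted straight into a message.
dput(head(sample_data, 3))

# Round trip: capture the dput() text and rebuild the object from it.
rebuilt <- eval(parse(text = paste(capture.output(dput(sample_data)),
                                   collapse = "\n")))
```

Posting the dput() output for a few rows, together with the output of str(), usually tells readers how the date column is stored.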

B


On Mon, Jan 14, 2013 at 9:14 PM, Mark Turner <nzwe...@gmail.com> wrote:
Hi all,
I am a complete newbie to R, having picked up a load of R code from the previous analyst at my new position. I'm trying to get the gist of it, but this new development has me stumped.
It all worked fine until I updated R and ggplot2 (now R version 2.15.2 and ggplot2_0.9.3). [I also had to manually load a bunch of other libraries, so there is a chance that I missed something...]
Now I get a load of 'deprecated code' warnings, and one issue that is an actual problem:
date-related code that worked before now doesn't.
My understanding is that lubridate [version: lubridate_1.2.0] handles the date functions, along with scales (scales_0.2.3).
 
So if existing code says something like:

data1$datemonth <- dmy(data1$date)

then when you run it, it always prints something like:

80 parsed with %d-%b-%y

or

data2$monthdate <- dmy(data2$date)

shows in red:

120 parsed with %d-%b-%y

This is new but seems OK. But then it flips the date around from ‘7-12-2012’ to something like 2012-12-07, and after you actually run the code it says in red:

Error: Discrete value supplied to continuous scale

(which I can only assume applies to the date data...)
 
 
below is the code I ran to create the graph:
 
[I've left out the directory commands]
 
# load the necessary libraries/packages
library(ggplot2)  # This command looks in the library folder in the R program files (C:) to load the package ggplot2, which you had to install there
library(lubridate) # Same thing for the package 'lubridate', which is a set of 'easy' date-related functions
library(scales)  # Needed for date formatting
 
dataint1<-read.csv("F_EW1A_INT1.csv")
dataint1$monthdate <- dmy(dataint1$date)
dataint1$area <- factor(dataint1$area, levels = c("New Zealand", "Canterbury" ))
 
setwd(kCommonDir)
ylabel <- ylab("Total cancellations, monthly")
xlabel <- xlab("")
titleint1 <-  opts(title = comment(dataint1), plot.title = theme_text(size=6, colour="grey", hjust=1))
 
plotint1 <- ggplot()+
  theme_bw() +
  geom_line(data = dataint1, size=1, aes(x = monthdate, y = cancellations, colour=benefit_type)) +
  scale_colour_manual (values=colours) +
  yfont +
  ylabel +
  xlabel +
  legend +
  titleint1 +
  facet_grid(area ~ ., scale="free_y")
plotint1
 
Thanking you all in anticipation. If you need anything extra in order to help, please let me know how to obtain that info (what code to run or whatever).
 
kind regards,
 
 
Mark Turner

Social and Cultural Recovery

Canterbury Earthquake Recovery Authority (CERA)

Private Bag 4999, Christchurch 8140

 

--
You received this message because you are subscribed to the ggplot2 mailing list.
Please provide a reproducible example: https://github.com/hadley/devtools/wiki/Reproducibility
 
To post: email ggp...@googlegroups.com
To unsubscribe: email ggplot2+u...@googlegroups.com
More options: http://groups.google.com/group/ggplot2

Mark Turner

Jan 15, 2013, 5:37:38 PM
to ggp...@googlegroups.com, Mark Turner
Hi Brandon
Thanks for your reply.
I have had a further play with the code and found that when I comment out the vertical lines (marking the 2 time points when Christchurch was hit by major earthquakes), the code runs. So, at the risk of getting this completely wrong (and ignoring your request above, sorry Brandon), I am posting the code that is used to make the vertical lines below. If this is commented out, the graph runs fine.
 
this is the code to load the EQ data:
 
# Load EQ and colours
load("colourEQ.RData", .GlobalEnv)
#this is code in the ggplot statement
geom_vline(data = EQ, colour = colourEQ, linetype = "dashed", aes(xintercept = start )) +
 
plot1 <- ggplot()+
  theme_bw() +
  #geom_rect(data = EQmonth3roll, fill = colourEQ , aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf)) +
  geom_vline(data = EQ, colour = colourEQ, linetype = "dashed", aes(xintercept = start )) +
  geom_line(data = data1, colour=coloursingle , size=1, aes(x = datemonth, y = count) ) +
  ylabel1 +
  yfont +
  xlabel +
  title1 +
  legend +
  facet_grid(area ~., scale="free_y")
plot1
 
 
It is stored in a common file that is loaded by:
# Load EQ and colours
load("colourEQ.RData", .GlobalEnv)
 
 
#code from common file#
#
# Set up earthquake data
#
#
##### SET UP #####
rm(list=ls(all=TRUE))
# load the necessary libraries
library(lubridate)
# Set paths and macros
kBaseDir   <- file.path("//corp.ssi.govt.nz/shared/CERA/Community Wellbeing/Social Outcomes Framework/Indicators/Common")
#Change to the new directory
setwd(kBaseDir)
 
##### BUILD EARTHQUAKE DATASETS #####
# Dates (one day)
EQ <- read.csv("EQ_dates.csv")
EQ$start <- dmy(EQ$start)
EQ
# Monthly
EQmonth <- read.csv("EQ_dates_monthly.csv")
EQmonth$start <- dmy(EQmonth$start)
EQmonth$end <- dmy(EQmonth$end)
EQmonth
# Monthly, 3 month rolling (ie quarter ending month)
EQmonth3roll <- read.csv("EQ_dates_monthly3roll.csv")
EQmonth3roll$start <- dmy(EQmonth3roll$start)
EQmonth3roll$end <- dmy(EQmonth3roll$end)
EQmonth3roll

# Quarterly
EQquarter <- read.csv("EQ_dates_quarterly.csv")
EQquarter$start <- dmy(EQquarter$start)
EQquarter$end <- dmy(EQquarter$end)
EQquarter
# Monthly - for bar charts
EQmonth_bar <- read.csv("EQ_dates_monthly_bar.csv")
EQmonth_bar
# Yearly
EQyear <- read.csv("EQ_dates_yearly.csv")
EQyear$start <- dmy(EQyear$start)
EQyear$end <- dmy(EQyear$end)
EQyear
# Yearly, year ending June
EQyearendjune <- read.csv("EQ_dates_year_end_june.csv")
EQyearendjune$start <- dmy(EQyearendjune$start)
EQyearendjune$end <- dmy(EQyearendjune$end)
EQyearendjune

# Yearly, year ending each quarter
EQyearendquarter <- read.csv("EQ_dates_year_end_quarter.csv")
EQyearendquarter$start <- dmy(EQyearendquarter$start)
EQyearendquarter$end <- dmy(EQyearendquarter$end)
EQyearendquarter
 
##### SAVE DATA WITH EARTHQUAKE INFO#####
save(colourEQ,coloursingle,colours,EQ, EQmonth,EQquarter,EQmonth_bar, EQyear,EQyearendjune,EQyearendquarter,EQmonth3roll,  file= "colourEQ.RData")
# To load that set of saved data
# rm(list=ls(all=TRUE))
# load("colourEQ.RData", .GlobalEnv)

# END OF FILE
 
As I said, this is the code in the plot code:
 
# Load EQ and colours
load("colourEQ.RData", .GlobalEnv)
geom_vline(data = EQ, colour = colourEQ, linetype = "dashed", aes(xintercept = start )) +
this code in the plot calls up vertical lines indicating dates of quakes so viewers can gauge activity before and after the 2 main quakes. I left this code out of the original post to avoid complicating an already complicated question, but now it seems to be the issue.
The data 'EQ' is a CSV file that contains the following:
start        event
4/09/2010    1
22/02/2011   2
----
 
geom_vline(data = EQ, colour = colourEQ, linetype = "dashed", aes(xintercept = start )) +
if the above code is commented out, the graph is fine (minus the vertical 'EQ' lines of course)  but if included, the following error message is displayed:
 
Error: Discrete value supplied to continuous scale
 
 
hope this makes some kind of sense?
 
regards,
Mark

Brandon Hurr

Jan 16, 2013, 2:23:17 AM
to Mark Turner, ggplot2
Mark, 

I would say that it's the format of your date column in the EQ data frame. If it is being stored as a character (or in this case a factor) and you are trying to align it with other data that is stored as a Date (via as.Date()), then ggplot2 will not know what to do. Make sure the date column has the same format in EQ and in your other datasets.
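
A minimal sketch of that check, using base R only. The two data frames are hypothetical stand-ins built from the values quoted in the thread, and it assumes the dates are day/month/year as in the CSV shown:

```r
# Hypothetical copy of the EQ data as read.csv() would deliver it when
# strings are turned into factors (the default before R 4.0).
EQ <- data.frame(start = c("4/09/2010", "22/02/2011"),
                 event = 1:2,
                 stringsAsFactors = TRUE)
class(EQ$start)   # "factor": ggplot2 treats this as discrete

# Convert via character, stating the day/month/year layout explicitly.
EQ$start <- as.Date(as.character(EQ$start), format = "%d/%m/%Y")
class(EQ$start)   # "Date": continuous, like the x values of the line

# The column in the main data must be the same class before plotting.
# (Invented values; only the class matters here.)
data1 <- data.frame(datemonth = as.Date(c("2010-08-01", "2011-03-01")))
stopifnot(identical(class(EQ$start), class(data1$datemonth)))
```

Running class() on the real EQ$start and data1$datemonth after load() would show whether the saved .RData file still holds the unconverted column. The same comparison applies whichever date class the other data uses.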

Brandon

Raphael Mazor

Jan 16, 2013, 2:38:44 PM
to ggp...@googlegroups.com
Sorry. Here's a reproducible example:



mydata<-data.frame(SiteSet=rep(c("SetA","SetB","SetC"), each = 200),
                   Metric=rep(c("Metric1","Metric2"), each=100))
mydata$Observed<-ifelse(mydata$Metric=="Metric1", runif(100,min=0,max=1), runif(100,min=0,max=100))
mydata$Expected<-ifelse(mydata$Metric=="Metric1", runif(100,min=0,max=1), runif(100,min=0,max=100))

ggplot(data=mydata, aes(x=Expected,y=Observed))+
  geom_point()+
  facet_grid(Metric~SiteSet, scales="free")

I get the following graph:


If I flip facet_grid as follows, I get a different, but also wrong result:
ggplot(data=mydata, aes(x=Expected,y=Observed))+
  geom_point()+
  facet_grid(SiteSet~Metric, scales="free")



If I use facet_wrap, the scales are correct, but I lose the gridded structure. For this example, it's not a problem, but my real data is a bit more complex:

mydata$MetricSiteSet<-paste(mydata$Metric,mydata$SiteSet, sep=".")
ggplot(data=mydata, aes(x=Expected,y=Observed))+
  geom_point()+
  facet_wrap(~MetricSiteSet, scales="free")


Raphael Mazor
Freshwater Biologist
Southern California Coastal Water Research Project
3535 Harbor Blvd, Suite 110
Costa Mesa, CA 92626

www.sccwrp.org
tel: 714-755-3235
fax: 714-755-3299

Raphael Mazor

Jan 16, 2013, 2:39:38 PM
to ggp...@googlegroups.com
My apologies, but I sent this last email with the incorrect subject

Raphael Mazor

Dennis Murphy

Jan 16, 2013, 7:06:29 PM
to Raphael Mazor, ggp...@googlegroups.com
Hi:

Thanks for the example. It makes the problem much clearer to see.

The problem is that faceted 2D plots by facet_grid() cannot have separate, "free" x and y axis pairs in each panel, as you discovered - the y-axis in each row needs to be consistent and the x-axis in each column needs to be the same. You have such a structure marginally - i.e., you have consistent x/y axes for Metric1 and the same for Metric2 - but the x/y-axes differ for Metric1 and Metric2.
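
To see this in numbers, here is a rough sketch in base R that rebuilds the Expected column from the example (set.seed() is added only so the run is repeatable) and compares the x-range each column of panels has to share with the range each metric needs:

```r
set.seed(1)  # only to make the sketch repeatable
mydata <- data.frame(SiteSet = rep(c("SetA", "SetB", "SetC"), each = 200),
                     Metric  = rep(c("Metric1", "Metric2"), each = 100))
mydata$Expected <- ifelse(mydata$Metric == "Metric1",
                          runif(100, min = 0, max = 1),
                          runif(100, min = 0, max = 100))

# Upper end of the x-range that each *column* of panels shares under
# facet_grid(Metric ~ SiteSet): every SiteSet holds both metrics.
by_siteset <- tapply(mydata$Expected, mydata$SiteSet, max)

# Upper end of the x-range per Metric: what each panel would need.
by_metric <- tapply(mydata$Expected, mydata$Metric, max)

print(by_siteset)
print(by_metric)
```

Every SiteSet column has to stretch far enough to hold the Metric2 values, so the Metric1 points, which never exceed 1, end up squeezed against the left edge of their panels.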

I think there is a compromise solution that is fairly easy to produce using the gridExtra package - I don't know whether or not you'll find it satisfactory for your 'real' problem, though.

library(ggplot2)

mydata<-data.frame(SiteSet=rep(c("SetA","SetB","SetC"), each = 200),
                   Metric=rep(c("Metric1","Metric2"), each=100))
mydata$Observed<-ifelse(mydata$Metric=="Metric1", runif(100,min=0,max=1), runif(100,min=0,max=100)) 
mydata$Expected<-ifelse(mydata$Metric=="Metric1", runif(100,min=0,max=1), runif(100,min=0,max=100))

# Split into two data frames, one for Metric1 data and another for Metric2.
# Redefine the factors so that the resulting Metric factor has one level.
metric1 <- subset(mydata, Metric == "Metric1")
metric1$Metric <- factor(metric1$Metric)
metric2 <- subset(mydata, Metric == "Metric2")
metric2$Metric <- factor(metric2$Metric)

Each of these produces a three row, one column grid of plots:

p1 <- ggplot(metric1, aes(x = Expected, y = Observed)) +
         geom_point() +
         facet_grid(SiteSet ~ Metric)

# same plot code as p1, just substituting the input data frame
p2 <- p1 %+% metric2 

# Put the two graphs side-by-side:
library(gridExtra)
grid.arrange(p1, p2, ncol = 2)

The result is two grids instead of one, but the scales are consistent within each graph.

If you want to get rid of the y-axis label in p2, use

p2 <- p1 %+% metric2 + ylab(NULL)


HTH,
Dennis