Interpolate values across dates


ArjunaCap

Jul 23, 2015, 6:50:42 PM
to manipulatr
I'm wondering about my options for interpolation of data among dates.

Suppose I have some data that is averaged/aggregated by month:

rain_per_month <- c("jan" = 1.5, "feb" = 2.2,
                    "mar" = 3.0, "apr" = 4.3,
                    "may" = 5.75, "jun" = 6.2,
                    "jul" = 6.1, "aug" = 5.8,
                    "sep" = 4.7, "oct" = 3.7,
                    "nov" = 2.75, "dec" = 2.2)


Suppose, moreover, that I wish to assume the data is smooth and continuous.

My Questions

Suppose I wished to assume the rain_per_month values fell in the middle of each month.

If I create a daily date vector, like so:

df <- data.frame(my_dates  = seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by="days"))


1. How could I populate my data frame with a new variable holding the sparse, mid-month values from the rain_per_month vector?

2. What are the options for applying an interpolation technique at a daily granularity, and how could this be done?

Thank you for any help

Hadley Wickham

Jul 23, 2015, 7:03:58 PM
to ArjunaCap, manipulatr
My only hint is that interpolation implies a model, and R has lots and
lots and lots of tools for doing modelling...
Hadley
> --
> You received this message because you are subscribed to the Google Groups
> "manipulatr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to manipulatr+...@googlegroups.com.
> To post to this group, send email to manip...@googlegroups.com.
> Visit this group at http://groups.google.com/group/manipulatr.
> For more options, visit https://groups.google.com/d/optout.



--
http://had.co.nz/

Michael Cawthon

Jul 23, 2015, 7:17:51 PM
to Hadley Wickham, manipulatr
Fair enough, and it's an honor to be answered by the master himself. My question is really motivated first by the nitty-gritty details of step 1, populating the node points. I was hoping for a "here's a neat trick using native date functions or lubridate" or "write a for loop creating the dates in text and then left_join on the dates".

I'm getting a little hung up on the syntactical details, but I'll try a bit harder.

Thanks again
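
For the node-point step asked about above, here is a minimal base R sketch of the "join on the dates" idea. It assumes the "middle" of each month is the 15th and uses the 2015 daily vector from the original question; `mid_dates` and `rain_sparse` are names made up for this illustration.

```r
# Monthly values from the original question.
rain_per_month <- c("jan" = 1.5, "feb" = 2.2, "mar" = 3.0, "apr" = 4.3,
                    "may" = 5.75, "jun" = 6.2, "jul" = 6.1, "aug" = 5.8,
                    "sep" = 4.7, "oct" = 3.7, "nov" = 2.75, "dec" = 2.2)

df <- data.frame(my_dates = seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "days"))

# Assumption: the node point of every month is the 15th.
# Building the dates from month numbers avoids locale-dependent name parsing.
mid_dates <- as.Date(sprintf("2015-%02d-15", seq_along(rain_per_month)))

# match() gives NA for days that are not node points, so the new column is
# NA everywhere except on the twelve mid-month dates.
df$rain_sparse <- unname(rain_per_month)[match(df$my_dates, mid_dates)]
```

The same result could presumably be had with dplyr::left_join on a two-column table of `mid_dates` and values; match() just keeps the sketch dependency-free.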

Brandon Hurr

Jul 23, 2015, 7:21:52 PM
to Michael Cawthon, Hadley Wickham, manipulatr
I haven't done this for a while, but from what I remember, you select a type of model that is appropriate for your dataset and fit it. Then create a data frame with all possible time points and use the predict function to output the data points.
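
A rough sketch of that fit-then-predict workflow, under assumptions not made in the thread: the monthly values sit on the 15th of each month in 2015, and the model is a first-order annual harmonic fitted with lm(), chosen here purely for illustration. The names `nodes`, `daily`, `doy`, and `rain_hat` are hypothetical.

```r
rain_per_month <- c("jan" = 1.5, "feb" = 2.2, "mar" = 3.0, "apr" = 4.3,
                    "may" = 5.75, "jun" = 6.2, "jul" = 6.1, "aug" = 5.8,
                    "sep" = 4.7, "oct" = 3.7, "nov" = 2.75, "dec" = 2.2)

# Node points: day of year of the 15th of each month (assumed mid-month).
nodes <- data.frame(
  doy  = as.numeric(format(as.Date(sprintf("2015-%02d-15", 1:12)), "%j")),
  rain = unname(rain_per_month)
)

# One possible model (an illustrative choice): a single annual sine/cosine pair.
fit <- lm(rain ~ sin(2 * pi * doy / 365) + cos(2 * pi * doy / 365), data = nodes)

# Data frame with every time point we want a value for.
daily <- data.frame(my_dates = seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "days"))
daily$doy <- as.numeric(format(daily$my_dates, "%j"))

# predict() evaluates the fitted model at each daily time point.
daily$rain_hat <- predict(fit, newdata = daily)
```

Note that this smooths rather than interpolates: the fitted curve need not pass exactly through the monthly values. Swapping lm() for loess() or a spline changes the model but not the overall pattern.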

Since you are using time series data, you could consider this SO post as a template:

David Winsemius

Jul 24, 2015, 3:38:44 AM
to ArjunaCap, manipulatr

On Jul 23, 2015, at 3:50 PM, ArjunaCap wrote:

> I'm wondering about my options for interpolation of data among dates.
>
> Suppose I have some data that is averaged/aggregated by month:
>
> rain_per_month <- c("jan" = 1.5, "feb" = 2.2,
>                     "mar" = 3.0, "apr" = 4.3,
>                     "may" = 5.75, "jun" = 6.2,
>                     "jul" = 6.1, "aug" = 5.8,
>                     "sep" = 4.7, "oct" = 3.7,
>                     "nov" = 2.75, "dec" = 2.2)
>
> Suppose, moreover, that I wish to assume the data is smooth and continuous.

One method might be to replicate the data on either "side" of the observed data (since the obvious naive model would be that each year is like the next) and then to use a smoothing function such as loess.

dat <- data.frame(mo=names(rain_per_month), rain= rain_per_month)
expdat <- rbind(dat,dat,dat)

expdat$midmodt <- as.Date(paste0(rep(c("2013-","2014-","2015-"), each=12), dat$mo, "-15"), format="%Y-%b-%d")


> My Questions
>
> Suppose I wished to assume the rain_per_month values fell in the middle of each month.
>
> If I create a daily date vector, like so:




# Note: I was assuming these were 2014 data, since I cannot see into the future.
# The data frame has to exist before df$est can be assigned.
df <- data.frame(my_dates = seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by = "days"))

# Fit loess on the three replicated years, then predict at each daily date.
df$est <- predict(loess(expdat$rain ~ as.numeric(expdat$midmodt)),
                  as.numeric(df$my_dates))

with(df, plot(est ~ my_dates))





> 1. How could I populate my data frame with a new variable holding the sparse, mid-month values from the rain_per_month vector?
>
> 2. What are the options for applying an interpolation technique at a daily granularity, and how could this be done?
>
> Thank you for any help


David Winsemius
Alameda, CA, USA

ArjunaCap

Jul 24, 2015, 7:17:57 PM
to manipulatr, mcaw...@greenstenergy.com
I just saw David's reply. Thanks to everyone for the help.

For completeness, I'll share what I did.

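# Function to generate middle-of-the-month nodes (day of year of the 15th)
mid_month <- function(month_in_text) {
  # ymd() is also from lubridate, so it is namespaced here as well
  lubridate::yday(lubridate::ymd(paste("2015", month_in_text, "15"))) ## Not a leap year
}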


days <- unlist(lapply(names(rain_per_month), mid_month))


Define a function using splinefun; I'm not sure how this compares to loess.

f <- splinefun(days, rain_per_month, method = "periodic")

 
As in the defined function above, we again take advantage of the useful yday function from the lubridate package:

df$rain <- f(lubridate::yday(df$my_dates))
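
One caveat with the periodic method: splinefun() warns when the first and last y values differ and then uses the first value for both, which here would overwrite the December value and make the period run from mid-January to mid-December rather than over a full year. A possible workaround, sketched below in base R (so `days` is recomputed with format() instead of lubridate), is to close the cycle by repeating the January node 365 days later. `f_closed` and `daily_rain` are names invented for this sketch.

```r
rain_per_month <- c("jan" = 1.5, "feb" = 2.2, "mar" = 3.0, "apr" = 4.3,
                    "may" = 5.75, "jun" = 6.2, "jul" = 6.1, "aug" = 5.8,
                    "sep" = 4.7, "oct" = 3.7, "nov" = 2.75, "dec" = 2.2)

# Day of year of the 15th of each month in 2015 (not a leap year).
days <- as.numeric(format(as.Date(sprintf("2015-%02d-15", 1:12)), "%j"))

# Close the cycle: repeat January's value one year (365 days) later, so the
# first and last y values agree and the spline's period is a full year.
f_closed <- splinefun(c(days, days[1] + 365),
                      c(rain_per_month, rain_per_month[1]),
                      method = "periodic")

# Daily values for day of year 1..365; days before Jan 15 wrap around
# periodically instead of being extrapolated.
daily_rain <- f_closed(1:365)
```

With this variant the curve still passes through all twelve monthly values, including December.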

