Downsampling time series


Alan K.

Jan 5, 2016, 5:48:48 PM
to Davis R Users' Group
First post here; thanks in advance for your help. This is potentially both a math question and an R question.

I have time series data that were collected at a certain sampling rate (e.g. 22 Hz). I need to convert these to a new time series with a lower sampling rate (e.g. 12 Hz). I haven't found a specific function that does this, so it seems I'll need to do it in two steps: 1) fit a function such as a spline to the original data to approximate the original time series, and 2) evaluate that spline at a vector of new sampling times.
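The two-step plan above can be sketched in base R with stats::splinefun(), which returns a function that can then be evaluated at any vector of times. This is a minimal sketch under stated assumptions: the 22 Hz signal is simulated, and the names fs_old, fs_new, t_old, y_old and t_new are illustrative placeholders for real sample times (in seconds) and values. Note that interpolation alone does not low-pass filter the data, so any signal content above 6 Hz (half of the new 12 Hz rate) can alias into the resampled series.

```r
# Sketch: resample a (simulated) 22 Hz series onto a 12 Hz time grid with a cubic spline.
fs_old <- 22                               # original sampling rate (Hz)
fs_new <- 12                               # target sampling rate (Hz)

t_old <- seq(0, 10, by = 1 / fs_old)       # 10 s of original sample times, in seconds
y_old <- sin(2 * pi * 1.5 * t_old)         # hypothetical 1.5 Hz test signal

# Step 1: build an interpolating cubic spline through the original samples.
# "fmm" is splinefun()'s default method; "natural" is another common choice.
f <- splinefun(t_old, y_old, method = "fmm")

# Step 2: evaluate the spline at the new sampling times, staying inside the original range.
t_new <- seq(min(t_old), max(t_old), by = 1 / fs_new)
y_new <- f(t_new)

resampled <- data.frame(time = t_new, value = y_new)
head(resampled)
```

Because the spline passes through every original sample, the new values are only as trustworthy as the spline's behaviour between samples, which is why checking the fit on your own data matters.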

Can anyone weigh in on the pros and cons of different spline functions (or point me to a quick discussion that would be intelligible to someone who is not super mathy)? There seem to be a lot of them, and I'm not sure how best to choose.

Thanks!
Alan


Brandon Hurr

Jan 5, 2016, 6:00:13 PM
to davi...@googlegroups.com
This might be daft, but I'm going to throw it out there. 

How about 11 Hz? Then you can take every other observation (I think).

If your observations are rows... 

library(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
filter(df, row_number() %% 2 == 0)  # keep the even-numbered rows, i.e. every other observation
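The same idea generalizes to any integer factor k (24 Hz to 12 Hz is k = 2, 36 Hz to 12 Hz is k = 3), but only when the old rate is an exact multiple of the new one. A minimal base-R sketch; decimate_rows() is a hypothetical helper name, and unlike the dplyr line above it keeps rows 1, 1 + k, 1 + 2k, and so on, starting from the first observation.

```r
# Hypothetical helper: keep every k-th row, starting with the first one.
decimate_rows <- function(df, k) {
  df[seq(1, nrow(df), by = k), , drop = FALSE]
}

df <- data.frame(id = 1:10, var = runif(10))
decimate_rows(df, 2)   # rows 1, 3, 5, 7, 9  (e.g. 22 Hz -> 11 Hz)
decimate_rows(df, 3)   # rows 1, 4, 7, 10    (e.g. 36 Hz -> 12 Hz)
```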

B

--
Check out our R resources at http://www.noamross.net/davis-r-users-group.html
---
You received this message because you are subscribed to the Google Groups "Davis R Users' Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to davis-rug+...@googlegroups.com.
Visit this group at https://groups.google.com/group/davis-rug.
For more options, visit https://groups.google.com/d/optout.

Alan K.

Jan 5, 2016, 8:22:05 PM
to Davis R Users' Group
Nice idea (and I did that with the data sets that were exact multiples). Unfortunately, 12 Hz is non-negotiable. For a little more context, the data are from accelerometers on birds, and the goal is to develop algorithms that classify patterns in the time series as belonging to certain behaviors. Most of our training set is already at 12 Hz, and most or all of our unknown samples will be at 12 Hz as well. In our first year we tried several different sample rates, and we would like to include those data in the same analyses if possible.

Jaime Ashander

Jan 5, 2016, 8:33:35 PM
to davi...@googlegroups.com, Davis R Users' Group
Quickly: this seems like it must be a very common issue with time series. Your approach makes sense, but I'm surprised your searching didn't turn up examples specific to this problem. Did you look at the packages listed in the CRAN Time Series task view (e.g., xts, zoo)?

Jaime

Ryan Peek

Jan 5, 2016, 9:33:30 PM
to davi...@googlegroups.com
We run into this quite a bit with stream logger data and time-lapse photo data. I'm not familiar with working in Hz, but the concept should work.

One approach is to set an interval, round your times to that interval, and then aggregate. The example below rounds to the nearest 15 minutes.

## weird random data set; requires the "lubridate" and "dplyr" packages
library(lubridate)
library(dplyr)
times <- seq(ymd_hms("2015-01-01 01:03:15"), ymd_hms("2015-01-02 23:14:12"),
             by = 33 * 60 + 22)  # one reading every 33 min 22 s (by is in seconds)

## Round to nearest 15 min:
interval <- 15  # 15 minutes; must be defined first, since lubridate also has an interval() function
timesround <- as.POSIXct(round(as.double(times) / (interval * 60)) * (interval * 60),
                         origin = "1970-01-01", tz = "UTC")

## one row per reading, with its original and rounded time
df <- data.frame(time = times, timeround = timesround,
                 data = rnorm(length(times), 2, 1))

## Now use dplyr to aggregate: one mean per rounded 15-minute time
df2 <- df %>%
  group_by(timeround) %>%
  summarize(min15.avg = mean(data)) %>%
  as.data.frame()

plot(df2$timeround, df2$min15.avg, col = "blue")

## I've also seen a modulo approach, i.e.,
## use dplyr and a numeric time component to bin your times by a given interval.
## This labels only the minute within the hour, so group by hour (and date) as well.
mutate(df, min3 = paste0(minute(time) - minute(time) %% 3, "m"))
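Translated to Hz, the same round-then-aggregate idea means assigning each sample to the nearest point of a 1/12 s grid and averaging within each bin. A minimal base-R sketch, assuming sample times are in seconds; the integer-valued signal is simulated. At 22 Hz each 1/12 s bin holds one or two original samples, so the bin means are a crude smoothing rather than a proper filter, and they are not necessarily integers.

```r
# Sketch: round-then-aggregate, with a 12 Hz grid instead of 15-minute bins.
fs_old <- 22
fs_new <- 12
t_old <- seq(0, 5, by = 1 / fs_old)          # original sample times, in seconds
y_old <- round(10 * sin(2 * pi * t_old))     # hypothetical integer-valued signal

bin <- round(t_old * fs_new)                 # index of the nearest 1/12 s grid point
binned <- aggregate(y_old, by = list(bin = bin), FUN = mean)
names(binned)[names(binned) == "x"] <- "value"
binned$time <- binned$bin / fs_new           # convert bin index back to seconds
head(binned)
```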

Hope this helps,
Adios,
Ryan




--

 
"When we try to pick out anything by itself, we find it hitched to everything else in the universe."
John Muir (My First Summer in the Sierra, 1911)
-----------------------------------------
PhD Candidate, Graduate Group in Ecology
Center for Watershed Sciences
University of California, Davis
-----------------------------------------

Brandon Hurr

Jan 5, 2016, 10:23:21 PM
to davi...@googlegroups.com

On Tue, Jan 5, 2016 at 6:33 PM, Ryan Peek <rap...@ucdavis.edu> wrote:
timesround <- as.POSIXct(round(as.double(times)/(interval*60))*(interval*60),origin=(as.POSIXct(tz="GMT",'1970-01-01')))

Señor Peek, 

I'm getting an error here: 
> timesround <- as.POSIXct(round(as.double(times)/(interval*60))*(interval*60),origin=(as.POSIXct(tz="GMT",'1970-01-01')))
Error in interval * 60 : non-numeric argument to binary operator

Alan, 

Do you have some example bird data you could share? I agree with Jaime that fitting a spline and then interpolating is a reasonable approach, but you'd need to check how well it fits the data you have. 22 Hz is one sample about every 45 ms, and I know birds can be very fast, but I would expect a spline to fit accelerometer data from them quite nicely unless they are hummingbirds (those guys are freaky fast).
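One way to "see how well it fits" (and to compare spline types, per the original question) is a hold-out check: fit each interpolator on the odd-numbered samples and measure the error on the even-numbered ones. A minimal sketch on a simulated noisy signal; the signal, the noise level and the names errors and rmse are assumptions for illustration. Holding out every other sample halves the effective rate to 11 Hz, so this test is harsher than the real 22 Hz to 12 Hz task and is best read as a relative comparison between methods.

```r
# Sketch: hold-out comparison of linear and spline interpolation.
set.seed(1)
fs  <- 22
t_s <- seq(0, 10, by = 1 / fs)                                # sample times, in seconds
y   <- sin(2 * pi * 2 * t_s) + rnorm(length(t_s), sd = 0.1)   # hypothetical signal

train <- seq(1, length(t_s), by = 2)     # odd positions: used to fit
test  <- setdiff(seq_along(t_s), train)  # even positions: held out

rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

fits <- list(
  linear  = approxfun(t_s[train], y[train]),
  fmm     = splinefun(t_s[train], y[train], method = "fmm"),
  natural = splinefun(t_s[train], y[train], method = "natural")
)

# Root-mean-square error of each method on the held-out samples
errors <- sapply(fits, function(f) rmse(f(t_s[test]), y[test]))
print(errors)
```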

I also agree that there should be something out there about this already, but it would probably be hard to find: likely buried in a short passage of some paper's Materials and Methods section, or in work that never made it to publication.

B

Eric Holmes

Jan 5, 2016, 10:45:58 PM
to davi...@googlegroups.com

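Brandon, you may need to set the interval variable to a numeric value (e.g. interval <- 15) before running that line. Otherwise R finds lubridate's interval() function instead, and multiplying a function by 60 gives that "non-numeric argument" error.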

Alan, take a look at the approxfun() function in the stats package for a linear interpolation approach.
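A minimal sketch of that suggestion: stats::approxfun() builds a piecewise-linear function through the original samples, which is then evaluated on a 12 Hz grid. The random-walk signal and the variable names are assumptions for illustration; approx(t_old, y_old, xout = t_new)$y gives the same values in one call.

```r
# Sketch: linear interpolation of a (simulated) 22 Hz series onto a 12 Hz grid.
set.seed(42)
fs_old <- 22
fs_new <- 12
t_old <- seq(0, 10, by = 1 / fs_old)           # original sample times, in seconds
y_old <- cumsum(rnorm(length(t_old)))          # hypothetical random-walk signal

lin   <- approxfun(t_old, y_old, method = "linear")
t_new <- seq(min(t_old), max(t_old), by = 1 / fs_new)
y_new <- lin(t_new)                            # values on the 12 Hz grid
```

Linear interpolation never overshoots the neighbouring samples, which makes it a conservative baseline to compare splines against.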

Hope this helps,
Eric


Alan K.

Jan 6, 2016, 10:29:10 AM
to Davis R Users' Group
Thanks, Jaime; I'll check that out.

Alan K.

Jan 7, 2016, 12:58:19 AM
to Davis R Users' Group
I somehow missed the other replies; thanks, everyone, for chiming in. One thing I've realized since posting: because all of the original data are integers, I'll need either to round the interpolated values or to go with the suggestion of assigning each new point the value of the nearest point in the original data. If anyone is interested, I've attached a sample file.
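Both options can be sketched in a few lines of base R, assuming a regular 22 Hz grid that starts at t_old[1] and times in seconds; the integer signal and the variable names are illustrative. Option 1 rounds linearly interpolated values; option 2 takes the original sample closest in time, so every output value is one that was actually recorded. Exact ties (a new time exactly halfway between two old samples, e.g. 0.25 s) are resolved by round(), and either neighbour is equally "nearest".

```r
# Sketch: two ways to keep integer-valued output at 12 Hz.
fs_old <- 22
fs_new <- 12
t_old <- seq(0, 10, by = 1 / fs_old)                   # regular 22 Hz sample times, in seconds
y_old <- as.integer(round(50 * sin(2 * pi * t_old)))   # simulated integer readings
t_new <- seq(min(t_old), max(t_old), by = 1 / fs_new)

# Option 1: interpolate linearly, then round back to integers.
y_round <- as.integer(round(approx(t_old, y_old, xout = t_new)$y))

# Option 2: nearest neighbour. On a regular grid the closest original index
# is round((t - t_old[1]) * fs_old) + 1.
nearest   <- round((t_new - t_old[1]) * fs_old) + 1
y_nearest <- y_old[nearest]
```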
Sample Chicken Timeseries.txt