best way to use dplyr/tidyr functions with time series data

Paolo Cavatore

Feb 18, 2015, 9:09:11 AM
to manip...@googlegroups.com
I'm wondering what is the best way to apply dplyr/tidyr functions to time series data like stock prices.

A similar question has been answered at http://stackoverflow.com/questions/1181060 but the example there starts from a data.frame object, which is rarely the case in practice.

The most widespread format is xts/zoo as per the below example using the tseries::get.hist.quote function:

library(tseries)
library(xts)    # needed for as.xts(); also attaches zoo
library(tidyr)  # needed for gather() below
spy <- get.hist.quote(instrument="spy", start="2015-01-01", quote="AdjClose", compression="d")
agg <- get.hist.quote(instrument="agg", start="2015-01-01", quote="AdjClose", compression="d")
port <- as.xts(merge(spy, agg))
colnames(port) <- c("spy", "agg")

Because dplyr/tidyr functions work on data frames, we cannot directly run something like:

> gather(port, stock, price, 1:2)
Error in UseMethod("gather_") :
  no applicable method for 'gather_' applied to an object of class "c('xts', 'zoo')"

but must convert "port" to a data.frame first:

gather(data.frame(Date=index(port), coredata(port)), stock, price, -Date)
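
For anyone who wants to try the conversion without downloading quotes, here is a minimal sketch. The prices are made up and only stand in for the get.hist.quote() output; the names port_df and port_long are illustrative and do not appear in the original post.

```r
library(xts)    # also attaches zoo, which provides index() and coredata()
library(tidyr)

# Synthetic prices standing in for the downloaded quotes (made-up values)
dates <- as.Date("2015-01-02") + 0:3
port <- xts(cbind(spy = c(205.4, 201.7, 199.8, 202.3),
                  agg = c(110.6, 110.8, 111.1, 111.0)),
            order.by = dates)

# Move the time index into an explicit Date column, then reshape to long form
port_df   <- data.frame(Date = index(port), coredata(port))
port_long <- gather(port_df, stock, price, -Date)
```

The result has one row per date and stock, with the columns Date, stock and price.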


Now my question is whether there is a better way to use dplyr/tidyr functions without going back and forth between xts/zoo and data.frame format.

Ideally a solution converting "port" into a tbl object on the fly would be perfect:

> gather(as.tbl(port), stock, price, -Date)
Error in UseMethod("as.tbl") :
  no applicable method for 'as.tbl' applied to an object of class "c('xts', 'zoo')"

but unfortunately as.tbl() does not support xts/zoo objects...maybe it could be enhanced to support them too.


D Holmes

Feb 18, 2015, 7:55:04 PM
to manip...@googlegroups.com
My solution to this problem has been to write simple functions (e.g. xts2df and df2xts) which perform the operations shown above. That works quite well, but requires a lot of back and forth between data.frames and xts objects. I would gladly second a request for tighter integration between dplyr and xts, especially if row names can be used in place of an explicit date variable. It'd be quite cool to do something like a weekly VaR, e.g.

dta %>% filter('2014-01-01::') %>% group_by(ticker) %>% mutate(valatrisk=2.33*roll_sd(rtn)) %>% summarize(to_weekly)
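
The two helpers are not shown in this message, so the following is only a guess at what they might look like. The function bodies, the date_col argument and the sample values are assumptions for illustration, not D Holmes's actual code.

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# Hypothetical helper: wide data frame with a date column -> xts
df2xts <- function(df, date_col = "date") {
  value_cols <- setdiff(names(df), date_col)
  xts(as.matrix(df[value_cols]), order.by = df[[date_col]])
}

# Hypothetical inverse helper: xts -> data frame with an explicit date column
xts2df <- function(x) {
  data.frame(date = index(x), coredata(x))
}

# Made-up sample data to exercise the round trip
df <- data.frame(date = as.Date("2015-01-02") + 0:2,
                 spy  = c(205.4, 201.7, 199.8),
                 agg  = c(110.6, 110.8, 111.1))
x          <- df2xts(df)
round_trip <- xts2df(x)
```

With helpers like these, the back and forth is at least confined to two function calls around each dplyr/tidyr step.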


Hadley Wickham

Feb 19, 2015, 7:56:11 AM
to D Holmes, manipulatr
It might be possible to implement xts methods for dplyr generics to
eliminate the back-and-forth. I have no idea how hard this would be,
because I don't know anything about the internals of xts.

Hadley



--
http://had.co.nz/
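
To make the idea of xts methods for dplyr generics concrete, here is a rough sketch of one such method. It is not part of dplyr or xts; filter.xts, the time_index column name and the sample prices are all made up for illustration, and the method still converts to a data frame internally, so it hides the back-and-forth rather than removing it.

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# Illustrative S3 method (an assumption, not an existing API): convert to a
# data frame, delegate to dplyr's data-frame method, then rebuild the xts object.
filter.xts <- function(.data, ...) {
  df  <- data.frame(time_index = index(.data), coredata(.data))
  out <- dplyr::filter(df, ...)
  xts(as.matrix(out[setdiff(names(out), "time_index")]),
      order.by = out$time_index)
}

# Made-up prices for demonstration
port <- xts(cbind(spy = c(205.4, 201.7, 199.8),
                  agg = c(110.6, 110.8, 111.1)),
            order.by = as.Date("2015-01-02") + 0:2)

# Dispatches to filter.xts because port has class c("xts", "zoo")
high <- dplyr::filter(port, spy > 200)
```

A real implementation would need to cover the other verbs and preserve xts attributes such as the time zone, which this sketch ignores.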

Michael Cawthon

Feb 19, 2015, 9:25:43 AM
to Hadley Wickham, D Holmes, manipulatr
Is lubridate not an adequate alternative? Or are there specific methods in xts that lubridate can't handle?

Hadley Wickham

Feb 20, 2015, 8:37:29 AM
to Michael Cawthon, D Holmes, manipulatr
I believe xts is a more holistic solution for this sort of data than
lubridate - lubridate just deals with date/time values, xts deals with
time-indexed data.

Hadley

Paolo Cavatore

Mar 6, 2015, 12:17:23 PM
to manip...@googlegroups.com, mcaw...@greenstenergy.com, dfhol...@gmail.com
The only quick solution I came up with is similar to what D Holmes suggested:

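# Convert an xts/zoo object to a data frame with an explicit date column
# (index() and coredata() come from zoo, which xts attaches)
xts2df <- function(x) {
  data.frame(date = index(x), coredata(x))
}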

gather(xts2df(port), stock, price, 2:3)
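
A possible alternative that skips the hand-written helper is zoo's fortify.zoo(), which with melt = TRUE returns a long data frame directly. This is only a sketch under the assumption that the default column names (Index, Series, Value) are acceptable; the prices and the name long are made up for illustration.

```r
library(xts)  # also attaches zoo, which provides fortify.zoo()

# Made-up prices standing in for the "port" object
port <- xts(cbind(spy = c(205.4, 201.7, 199.8),
                  agg = c(110.6, 110.8, 111.1)),
            order.by = as.Date("2015-01-02") + 0:2)

# melt = TRUE stacks the series into one row per date and series
long <- fortify.zoo(port, melt = TRUE)
```

The resulting data frame can be passed straight to dplyr verbs, although the conversion to a data frame still happens; it is just done by zoo.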