self joins for year-over-year analysis

17 views
Skip to first unread message

Andrew Willis

unread,
Mar 14, 2017, 9:13:46 AM3/14/17
to manipulatr
I am commonly working with data that has a value for every day, and I would like to do year-over-year comparisons. What's the best way to do a self join in order to grab "earlier" data from the same tibble?

Here's an example tibble:

a <- tribble(
 
~day, ~value, ~same_day_last_year,
  ymd
("20170313"), 180, ymd("20160314"),
  ymd
("20170312"), 150, ymd("20160313"),
  ymd
("20160314"), 100, ymd("20150316"),
  ymd
("20160313"), 95, ymd("20150315")
)

I would like to be able to join this tibble with itself so I can add a new column that shows the value from the same day last year.

Any pointers?


Thanks!
-Andrew

Brandon Hurr

unread,
Mar 14, 2017, 10:57:16 AM3/14/17
to Andrew Willis, manipulatr
Andrew, 

What you want is only sort of clear to me.

You seem to want to join a table with itself (hence "self_join"), but I'm not sure that makes a lot of sense. 

If you simply want to take a value minus the previous value you should consider a lag: https://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html

If you want to take values from two tables and subtract them then you should have two tables, each with a date column and a value column. 

library(tidyverse)
library(lubridate)

firstyear <- tibble(date = as.Date(as.Date("2011-06-30"):as.Date("2011-07-04"), origin="1970-01-01"), value = runif(5))
secondyear <-tibble(date = as.Date(as.Date("2012-06-30"):as.Date("2012-07-04"), origin="1970-01-01"), value = runif(5))

# convert date to joinable numerics
firstyear <- 
firstyear %>%
mutate(month = month(date), day = day(date))
secondyear <- 
secondyear %>%
mutate(month = month(date), day = day(date))

# look at join
full_join(firstyear, secondyear, by = c("month", "day")) %>%
# subtract firstyear$value from secondyear$value
mutate(subtractedvalue = value.y - value.x)


HTH,
B


--
You received this message because you are subscribed to the Google Groups "manipulatr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+unsubscribe@googlegroups.com.
To post to this group, send email to manip...@googlegroups.com.
Visit this group at https://groups.google.com/group/manipulatr.
For more options, visit https://groups.google.com/d/optout.

Andrew Willis

unread,
Mar 14, 2017, 11:58:17 PM3/14/17
to manipulatr, andrew...@gmail.com
Thank you Brandon for your thoughts. 

I'm not able to use a lag to get the behavior I want because the day that I want to compare to can vary. For example, I want to compare the first Monday in March of 2017 with the first Monday in March of 2016. There are a few more details into how we figure out which days to compare, so we have a database table that has a row for each day, along with the corresponding day in the prior year.

While the data I have right now is in a single table (it's daily sales figures), I could definitely filter it out into two separate tibbles and then join them. It sounds like that extra step of dividing the data into two sets is going to be needed. 


Thanks again,
-Andrew
To unsubscribe from this group and stop receiving emails from it, send an email to manipulatr+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages