Closest match join with dplyr


Stuart Luppescu

Dec 6, 2015, 5:58:35 PM
to manip...@googlegroups.com
Hello all, I have two data frames that I want to join by a time variable. One data frame has data sampled every few seconds, and the other about once or twice an hour. I want to match each record in the first data frame (the one with the frequent samples) with the record in the second data frame (infrequent samples) that is closest in time.

I found a way to do it using data.table, here:


but I would prefer to use dplyr because it's familiar (and I hardly know data.table at all). Does anyone have a suggestion?

TIA.
--
Stuart Luppescu -- pixbuf .at. gmail.com
I made up a new word: plagiarism.
  --jpuller

Greg Snow

Dec 7, 2015, 12:38:20 PM
to Stuart Luppescu, manip...@googlegroups.com
You could find the midpoints of each of the intervals in the sparser
dataset, then use the findInterval function on both datasets to find
which interval (based on the midpoints, not the original data) each
time falls into. Then merge on the interval IDs. These will now be
an exact match, so any form of merging will work.
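
The midpoint approach described above could look roughly like the following in dplyr. This is only a sketch under assumptions: the data frames frequent and sparse, their columns (time, x, y), and the helper names breaks and interval_id are all made up for illustration, and the times in the sparser data are assumed to be distinct.

```r
library(dplyr)

# Hypothetical example data (all names and values are made up):
# 'frequent' stands in for the densely sampled data frame,
# 'sparse' for the one sampled once or twice an hour.
t0 <- as.POSIXct("2015-12-06 12:00:00", tz = "UTC")
frequent <- tibble(time = t0 + c(5, 600, 1000, 2500, 4000), x = 1:5)
sparse   <- tibble(time = t0 + c(0, 1800, 3600), y = c("a", "b", "c"))

# findInterval() needs sorted break points, so sort the sparse data first.
sparse <- arrange(sparse, time)

# Midpoints between consecutive sparse times act as interval boundaries.
t_sparse <- as.numeric(sparse$time)
breaks <- (head(t_sparse, -1) + tail(t_sparse, -1)) / 2

# Apply findInterval() to both data sets so they share an interval ID.
sparse <- sparse %>%
  mutate(interval_id = findInterval(as.numeric(time), breaks))

matched <- frequent %>%
  mutate(interval_id = findInterval(as.numeric(time), breaks)) %>%
  # Rename the sparse time column so both timestamps survive the join.
  left_join(rename(sparse, sparse_time = time), by = "interval_id")
```

Each row of matched then carries the y value and timestamp of the sparse record nearest in time. A time that falls exactly on a midpoint goes to the later sparse record, because findInterval() treats its intervals as closed on the left.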

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com