Error in R when joining data

jbarth1235

May 12, 2016, 8:35:09 AM
to Motus Wildlife Tracking System
> shore.df<-full_join(x=shore, y=rekn.md.df, by="id")%>%filter(ts>=dep.date&ts<=end.date)%>%select(site,ant,ts,fullID,runLen,id,Location)
Error in eval(expr, envir, enclos) : std::bad_alloc
Error: cannot allocate vector of size 153 Kb

I keep receiving the error above when I try to join the metadata to the rekn nanotag data. From what I have found online, it seems my computer lacks sufficient space? I started deleting unnecessary things from my laptop and selected a limited number of columns within the data, which brought the size reported in the error down to 153 Kb, but I can't get the error to go away. Has anyone else had trouble combining these large data sets in R? What can be done to work around this (if anything)?

Thanks.


Josh

john brzustowski

May 12, 2016, 8:57:48 AM
to motu...@googlegroups.com
Hi Josh,
Three suggestions:

1. It's hard to diagnose without knowing which columns your tables
contain, but is "id" a field that is unique in both tables?

The *_join functions return *every* combination of rows from the x and
y tables whose "id" values match. If "id" is not unique, the result can
be much larger than either table. Perhaps you need to join on a
different field (or fields)?

2. Do the filter() and select() before the join(), so that you're
joining smaller tbls. I can't give the code since I don't know which
table has which columns.

3. If your shore and rekn.md.df tbls are in src_sqlite objects
(i.e. tables that reside on disk in .sqlite files) rather than
in-memory data_frames, then the join, filter, and select can be performed
lazily, only returning what you actually ask for afterward:

shore.df<-left_join(x=shore, y=rekn.md.df, by="id") %>%
filter(ts>=dep.date&ts<=end.date) %>%
select(site,ant,ts,fullID,runLen,id,Location) %>%
head(100)

would return the first 100 rows of the result. I've used left_join
since full_join is not available in src_sqlite tbls.
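
As a rough illustration of suggestions 1 and 2, here is a minimal sketch using small made-up tables. The table names (detections, metadata), their contents, and the date window are assumptions for illustration only, not the actual shore or rekn.md.df data. It checks whether "id" is unique, shows how a non-unique key multiplies rows, and applies filter() and select() before the join.

```r
library(dplyr)

# Made-up detections; id 101 is detected twice.
detections <- data.frame(
  id   = c(101, 101, 102),
  site = c("REEDS", "REEDS", "REEDS"),
  ts   = as.POSIXct(c("2014-05-20 08:00:00", "2014-05-23 11:07:47",
                      "2014-05-24 09:15:00"), tz = "UTC")
)

# Made-up metadata; id 101 appears twice, so "id" is not a unique key.
metadata <- data.frame(
  id       = c(101, 101, 102),
  Location = c("A", "B", "C")
)

# Check key uniqueness before joining: a non-zero value means duplicates.
dup_index <- anyDuplicated(metadata$id)

# Each id-101 detection matches both id-101 metadata rows: 2 * 2 + 1 = 5 rows.
# suppressWarnings() hides the many-to-many warning that recent dplyr
# versions emit for this kind of match.
joined <- suppressWarnings(left_join(detections, metadata, by = "id"))

# Filtering and selecting first means fewer rows and columns enter the join.
dep.date <- as.POSIXct("2014-05-22 00:00:00", tz = "UTC")
end.date <- as.POSIXct("2014-05-31 23:59:59", tz = "UTC")
small <- detections %>%
  filter(ts >= dep.date & ts <= end.date) %>%
  select(id, ts)
joined_small <- suppressWarnings(left_join(small, metadata, by = "id"))
```

Comparing nrow(joined) with nrow(detections) is a quick way to see whether a join has multiplied rows. With real tables, a join key that is unique in the metadata (possibly fullID) would avoid the blow-up entirely.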

J.


--
#-----------------------------------
# John Brzustowski
# Wolfville, NS Canada

john brzustowski

May 12, 2016, 10:51:03 AM
to Josh B, Motus Wildlife Tracking System
Hi Josh,

On Thu, May 12, 2016, at 10:27, Josh B wrote:
> From looking at the metadata and the example of tag table columns found on
> the sensorgnome website:
> https://sensorgnome.org/Post-Processing_Telemetry_Data/Tag_Table_Columns
>
> it appears "id" is a unique field for both sets of data (3 digit
> identifying number). I will try your other recommendations, stay tuned.

id is only unique if you don't have more than one tag with
the same Lotek ID #. (SG/motus use many more tags than there
are ids, and distinguish among them by burst interval.) fullID
is usually safer as a unique tag identifier. But this is general
advice, since I don't know what's in your tables, and using id might be fine.

> Lastly. For the code " filter(ts>=dep.date&ts<=end.date) ", should I be
> using actual dates I want to filter by in place of dep.date/end.date?

If you want to use non-column variables "dep.date" and "end.date" or
any other dates that aren't spelled out as numeric constants, you need
to use the filter_() version. Do

vignette("nse")

for details.

You'd use:

... %>% filter_(~ ts >= dep.date & ts <= end.date)

where the tilde ("~") makes everything to the right a formula, which
causes filter_ to treat column names and non-column variable names
as you'd expect.
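
For readers on more recent dplyr releases: the underscore verbs such as filter_() have since been deprecated, and plain filter() looks up names that are not columns in the calling environment. The sketch below shows the same date-window filter with the .env pronoun, which makes that lookup explicit. The table and the date values are made up for illustration.

```r
library(dplyr)

# Hypothetical detections table; only the column names follow the thread.
detections <- data.frame(
  id = c(101, 102, 103),
  ts = as.POSIXct(c("2014-05-20 08:00:00", "2014-05-23 11:07:47",
                    "2014-06-02 17:30:00"), tz = "UTC")
)

# Non-column variables holding the date window (values are made up).
dep.date <- as.POSIXct("2014-05-22 00:00:00", tz = "UTC")
end.date <- as.POSIXct("2014-05-31 23:59:59", tz = "UTC")

# ts is resolved as a column; .env$ forces dep.date and end.date to be
# looked up outside the table, even if a column shared their name.
in_window <- detections %>%
  filter(ts >= .env$dep.date & ts <= .env$end.date)
```

Only the detection that falls inside the window is kept. Note that ts and the two bounds must be of comparable types (here all POSIXct) for the comparison to be meaningful.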

J.

john brzustowski

May 12, 2016, 10:18:19 PM
to Josh B, Motus Wildlife Tracking System
On Thu, May 12, 2016, at 22:49, Josh B wrote:
> Example of another piece of R code giving me errors; the code is provided below.

I'm replying to this on the motus list; if people feel there's too much
of this kind of traffic here, we could start another list.

You didn't convert REEDS$ts to POSIXct, as you did for tides$ts,
and read.csv() has left it as a factor.
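
A minimal sketch of that fix, using made-up timestamps in place of REEDS$ts. The UTC time zone is an assumption; use whatever zone the detections were recorded in. The rounding here is done in base R on seconds since the epoch; once the column is POSIXct, round_any() from the quoted script should accept it as well.

```r
# Made-up stand-in for the ts column as an older read.csv() would return it:
# a factor of timestamp strings.
ts_factor <- factor(c("2014-05-23 11:07:47", "2014-05-23 11:10:30"))

# Convert via character to POSIXct; the UTC time zone is an assumption.
ts <- as.POSIXct(as.character(ts_factor), tz = "UTC")

# Round to the nearest 360 s (6 min) by working on seconds since the epoch.
ts_6min <- as.POSIXct(round(as.numeric(ts) / 360) * 360,
                      origin = "1970-01-01", tz = "UTC")

print(ts_6min)
```

With these two values the rounded times are 11:06:00 and 11:12:00 on 2014-05-23, which can then be matched against tide readings on the same 6-minute grid.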

J.
P.S. Not sure how you're pasting code into messages, but it's getting
marked up with "*" (indicating bold?) which confuses things a bit.

> require(lubridate)
> require(plyr)
> require(ggplot2)
> library(sensorgnome)
> install.packages("XML")
> require(ggmap)
> require(data.table)
> require(gridExtra)
> require(dplyr)
>
> *##Download Reeds Beach Tag Data*
> REEDS<-read.csv("2014_alltags.csv",header=TRUE)
>
> *##Download Tidal Data*
> tides<-read.csv("Tidal Data.csv")
>
> *##Subset Tide/rename*
> tides<-subset(tides,select=c("Date.Time","Water.Level","Sigma"))
> tides<-rename(tides,ts=Date.Time, w.level=Water.Level)
>
> tides$ts<-as.POSIXct(tides$ts)
> del<-subset(REEDS,site%in%c("REEDS"))
>
> del<-subset(del, select=c(fullID, site, lat, lon, ant, ts))
>
> *##Make Map of Site*
> del.map<-get_map(location=c(lon= -75, lat=39),
> maptype="terrain",source="google",zoom=9)
> ggmap(del.map)+geom_point(data=del, aes(lon, lat, size=3, group=site,
> col=site))
>
> *##Rounding Tide Data to nearest 6min interval to join with
> detection data*
> *del.tide<-del*
> del.tide$ts<-round_any(del.tide$ts, 360)
>
> > del.tide$ts<-round_any(del.tide$ts, 360)
> Error in UseMethod("round_any") :
> no applicable method for 'round_any' applied to an object of class
> "factor"
>
> I'm lost as to why it won't let me round the ts column. The
> only thing I
> can think of is the date is also in the ts column (see below for
> example).
> Could this be throwing it all off? I want to join the tide data to the
> detection data, but before I can do that the ts need to be in the
> same time
> intervals (6min). Any ideas? Your help is much appreciated. Thank you.
>
> *example of ts column:*
> 2014-05-23 11:07:47
>
> *Joshua N. Barth*
> M.S. Student
> Environmental Science
> Graduate Assistant
> Wesley College
> 120 North State Street
> Dover, DE 19901
>
> c: (910) 320-4735
> jbart...@gmail.com
>
>
> On Thu, May 12, 2016 at 10:51 AM, john brzustowski
> <jbrz...@fastmail.fm>