Excellent suggestions by Yolanda. A couple additional ones that would help:
You can limit the list of fields to reduce the memory footprint. Just like applying a filter by tag ID or date range, you would want to do this prior to converting your data to a data frame.
Also make sure to use the newer version of the package. Early version of alltags was pulling data from the gps table, which was excruciatingly slow for large datasets. That view has been renamed and the default alltags does include those fields. Even without gps, it can still be problematic.
The optimal approach is often to create a custom query of the tables rather than use alltags. That can be done using dplyr or other similar packages. The biggest gain would likely be to avoid including the hits table. The runs table would likely be enough in most cases since it provides the start and end time of a run, plus the number of hits. I will look into creating a new view that does this for the package.
Alltags was created for convenience, but can break down with large datasets. We should try to provide better examples in the book about how to work around that problem.
Some people may already be able to share some code to help with this.
As a start, I pasted below a simplified version of the view that removes the hits table. Until we add this to the R package, you can create this view manually in your motus database(s) using a tool such as DB Browser for SQLite, for instance: https://sqlitebrowser.org/
In the case of the sample project 176, this brings down the number of rows from about 108,000 to about 4,800. Other approaches to summarize the data by time intervals (e.g. 5 or 10 minutes) as suggested by Yolanda would also work if you need better details on the fine scale activity.
The tsEnd and tsBegin fields are new and provide the outer bounds of the run for a given tag on a specific antenna. You can also remove any fields that you don’t plan to use from the view below, which will help with memory, or even entire tables if they are not needed, but you’ll need to dig a bit more into the sql syntax to do that.
CREATE VIEW allruns
AS
SELECT
t2.runID as runID,
t3.batchID as batchID,
t2.done as done,
CASE WHEN t12.motusTagID is null then t2.motusTagID else t12.motusTagID end as motusTagID,
t12.ambigID as ambigID,
t2.ant as port,
t2.len as runLen,
t2.tsBegin as tsBegin,
t2.tsEnd as tsEnd,
t3.monoBN as bootnum,
t4.projectID as tagProjID,
t4.mfgID as mfgID,
t4.type as tagType,
t4.codeSet as codeSet,
t4.manufacturer as mfg,
t4.model as tagModel,
t4.lifeSpan as tagLifespan,
t4.nomFreq as nomFreq,
t4.bi as tagBI,
t4.pulseLen as pulseLen,
t5.deployID as tagDeployID,
t5.speciesID as speciesID,
t5.markerNumber as markerNumber,
t5.markerType as markerType,
t5.tsStart as tagDeployStart,
t5.tsEnd as tagDeployEnd,
t5.latitude as tagDeployLat,
t5.longitude as tagDeployLon,
t5.elevation as tagDeployAlt,
t6a.deviceID as deviceID,
t6.deployID as recvDeployID,
t6.latitude as recvDeployLat,
t6.longitude as recvDeployLon,
t6.elevation as recvDeployAlt,
t6a.serno as recv,
t6.name as recvDeployName,
t6.siteName as recvSiteName,
t6.isMobile as isRecvMobile,
t6.projectID as recvProjID,
t7.antennaType as antType,
t7.bearing as antBearing,
t7.heightMeters as antHeight,
t8.english as speciesEN,
t8.french as speciesFR,
t8.scientific as speciesSci,
t8.`group` as speciesGroup,
t9.label as tagProjName,
t10.label as recvProjName,
t11.lat as gpsLat,
t11.lon as gpsLon,
t11.alt as gpsAlt
FROM
runs AS t2
left join allambigs t12 on t2.motusTagID = t12.ambigID
LEFT JOIN
batchRuns AS t3a ON t2.runID = t3a.runID
LEFT JOIN
batches AS t3 ON t3.batchID = t3a.batchID
LEFT JOIN
tags AS t4 ON t4.tagID = CASE WHEN t12.motusTagID is null then t2.motusTagID else t12.motusTagID end
LEFT JOIN
tagDeps AS t5 ON t5.tagID = CASE WHEN t12.motusTagID is null then t2.motusTagID else t12.motusTagID end
AND t5.tsStart =
(SELECT
max(t5b.tsStart)
FROM
tagDeps AS t5b
WHERE
t5b.tagID = CASE WHEN t12.motusTagID is null then t2.motusTagID else t12.motusTagID end
AND t5b.tsStart <= t2.tsBegin
AND (t5b.tsEnd IS NULL OR t5b.tsEnd >= t2.tsBegin)
)
LEFT JOIN
recvs as t6a on t6a.deviceID =t3.motusDeviceID
LEFT JOIN
recvDeps AS t6 ON t6.deviceID = t3.motusDeviceID AND
t6.tsStart =
(SELECT
max(t6b.tsStart)
FROM
recvDeps AS t6b
WHERE
t6b.deviceID=t3.motusDeviceID
AND t6b.tsStart <= t2.tsBegin
AND (t6b.tsEnd IS NULL OR t6b.tsEnd >= t2.tsBegin)
)
LEFT JOIN
antDeps AS t7 ON t7.deployID = t6.deployID AND t7.port = t2.ant
LEFT JOIN
species AS t8 ON t8.id = t5.speciesID
LEFT JOIN
projs AS t9 ON t9.ID = t5.projectID
LEFT JOIN
projs AS t10 ON t10.ID = t6.projectID
LEFT JOIN
gps AS t11 ON t11.batchID = t3.batchID
AND t11.ts =
(SELECT
max(t11b.ts)
FROM
gps AS t11b
WHERE
t11b.batchID=t3.batchID
AND t11b.ts <= t2.tsBegin
)
Denis Lepage
-------- Original message --------
From: Yolanda Morbey <ymo...@uwo.ca>
Date: 1/13/21 13:10 (GMT-05:00)
To: Motus Wildlife Tracking System <motu...@googlegroups.com>, pdl <p...@clevelandmetroparks.com>
Subject: Re: [motus-wts] Re: Converting R Data to Flat File (Section 3.5.8 in Motus R Book)
Sorry, I just realized that I actually started from an older version of the query which still relied on the gps table. In the case of the sample project, removing GPS doesn’t make any noticeable difference. It may with larger datasets, perhaps, so here is the revised version without GPS.
FROM
--
The Motus Wildlife Tracking System (Motus) is an international collaborative research network that uses coordinated automated radio telemetry to facilitate research and education on the ecology and conservation of migratory animals. Motus is a program of Birds
Canada in partnership with collaborating researchers and organizations. Learn more at
https://motus.org
---
You received this message because you are subscribed to the Google Groups "Motus Wildlife Tracking System" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
motus-wts+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/motus-wts/ace314f941d54d0bb7e136e3bd3cb9f3%40exchange.bsc-eoc.org.
Hi again,
Attached is a script (create allruns.R) that you can run on any existing motus file to add a new view called allruns, as described earlier. We will work that into the package soon. If people have any suggestions for it, please share them with me or the group.
The runs table provides a pretty good summary in replacement of the hits, especially at a larger spatial or temporal scale (e.g movement among receivers). Most runs are typically relatively short in duration, so not much details would be lost. It would generally be safe to assume that an animal is present at all time between the start and the end of a run (tsBegin and tsEnd). You really only need a relatively short period during which the tag is out of the range of the receiver to break a run (something like 10 minutes, as an order of magnitude).
The runs table also gives you just about as good details on departure and arrival times, for instance. You are mainly missing details of activity between the start of a run and the end of one, but runs are antenna specific, so there are some useful details too.
You can still use the runLen field to filter out the very short runs the same way (e.g. runLen > 3), since this is included in the runs table, and also use the motus filters. We’d have to look more closely to provide some examples of that.
Of course, if your goal is to identify time intervals when a tag is present on a finer scale, you may need to find alternative ways to get to the same point. With hits, you could do an aggregate query by dividing the ts by 300 (5 * 60 seconds) for instance, so you have a single record of all the 5 minute periods where a tag was detected. To get the same data from runs, you could pregenerate your intervals in a table, and use a query to identify whether those periods overlaps with a given run, without having to rely on the hits records, which should be much faster for large datasets. There’s no guarantee that you’d have an actual hit over each specific period, but you may decide that the assumption that the bird was probably nearby would work, unless you really need say minute by minute precision or smaller.
Attached are 2 short scripts (merge example and merge example daily) that do exactly what I describe above (assumes that you have already created the new view called allruns below). The first generates (for a single tag and where runLen > 3), the number of overlapping runs to any of the 5 minute periods I defined (note that some run may overlap multiple periods). The daily example uses a period of a full day instead (for all tags) and has a couple added parts. Instead of the number of runs, you should also be able to get the min(tsBegin) and max(tsEnd) for things like arrival and departure times. The daily example also has a short example of how to generate similar summaries from the alltags view instead. For a small dataset, both approaches are very fast, so it doesn’t really matter.
Runs also don’t give you relative signal strengths on different antenna, so if you are trying to measure directional movements (where are birds coming from or leaving to), you really need the hits table. But, as suggested, I would then recommend building routines that only pull data for a limited number of birds and/or a specific time period.
It’s hard to generalize much beyond that, as it will really depend on what application people try to do. If people want to share a few examples of code that relies on the hits table, we could try to explore ways to make them more efficient to produce the same results. We can try to add some of those examples within the R book as well.
Cheers
Denis
p.s. I’d certainly encourage folks to dive into the data.table package if you are data-inclined. I don’t have a lot of experience with this particular package, but it appears quite good for large data manipulations. https://www.rdocumentation.org/packages/data.table/versions/1.13.6/
--
The Motus Wildlife Tracking System (Motus) is an international collaborative research network that uses coordinated automated radio telemetry to facilitate research and education on the ecology and conservation of migratory animals. Motus is a program of Birds
Canada in partnership with collaborating researchers and organizations. Learn more at
https://motus.org
---
You received this message because you are subscribed to the Google Groups "Motus Wildlife Tracking System" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
motus-wts+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/motus-wts/10e01d96-a4d0-4f4f-a3ec-eafd31eca5bdn%40googlegroups.com.