Data format for gdistsamp

Katie

Mar 20, 2012, 4:57:23 PM

to unmarked

Hi all-

I am attempting to use gdistsamp to get density estimates for Desert
tortoises.

# sites = 50
# of surveys per site = 4

My data are formatted as 3 columns: SiteID, Distance (from the line),
and Survey#. Each row represents an individual tortoise. Is there code
to convert this into a matrix with one row per SiteID and
(#distance bins x #surveys) columns, so that I do not have to do this
manually?
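One way to build that matrix in base R is to bin the distances with cut(), cross-tabulate site x distance bin x survey with table(), and flatten the result. This is a minimal sketch, not an unmarked function: the toy data frame is made up, and the column names (SiteID, Distance, Survey) and the breaks 0, 5, 10, 15, 20 are assumptions. Declaring SiteID and Survey as factors with all 50 sites and 4 surveys as levels makes sites or surveys without tortoises appear as zeros instead of being dropped. As far as I understand the ?unmarkedFrameGDS help page, gdistsamp expects the distance classes nested within surveys (all bins for survey 1, then all bins for survey 2, ...), which is the column order produced here; check the help page for your unmarked version.

```r
# Hypothetical detections: one row per tortoise (made-up values).
tort <- data.frame(
  SiteID   = c(1, 1, 2, 3, 3, 3),
  Distance = c(2.5, 12, 7, 1, 19, 4),
  Survey   = c(1, 2, 1, 4, 4, 2)
)

n_sites   <- 50                    # from the question
n_surveys <- 4
breaks    <- c(0, 5, 10, 15, 20)   # assumed distance-bin edges

# Factors with every level, so undetected sites/surveys give zero counts.
site   <- factor(tort$SiteID, levels = 1:n_sites)
survey <- factor(tort$Survey, levels = 1:n_surveys)
bin    <- cut(tort$Distance, breaks = breaks, include.lowest = TRUE)

# 3-D array of counts: site x distance bin x survey.
counts <- table(site, bin, survey)

# Flatten to sites x (bins * surveys). R arrays are column-major, so the
# bins for survey 1 fill the first columns, then the bins for survey 2, etc.
n_bins <- length(breaks) - 1
y <- matrix(counts, nrow = n_sites, ncol = n_bins * n_surveys)
rownames(y) <- levels(site)
colnames(y) <- paste0("s", rep(1:n_surveys, each = n_bins),
                      "_d", rep(1:n_bins, times = n_surveys))
```

Using table() on three factors does the binning and the zero-filling in one step, so no loop over sites or surveys is needed.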

Thanks!

Katie

Andy Royle

Mar 20, 2012, 5:11:27 PM

to unma...@googlegroups.com

hi Katie,

Coincidentally, we are giving some lectures on this stuff in an upcoming workshop and I have that material open on my desktop right now.
The summary is below.
If you get stuck please send me your data file and I'll help out.

regards,
andy

There are two easily manageable data formats for use in unmarked:

1. Observation-level distance data:
a matrix with one row per individual (n rows) and 2 columns: [distance, sample unit ID]

2. Multinomial frequencies: an nsites x (number of distance classes) matrix, where element [i,j] is the number of individuals counted at site i in distance class j.
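Going from format 1 to format 2 amounts to assigning each distance to a class and counting per site and class. A minimal base R sketch of that idea (not the actual source of formatDistData), using a small made-up data set laid out like unmarked's distdata.csv and assumed class edges of 0, 5, 10, 15, 20:

```r
# Format 1: observation-level data, one row per detected individual
# (made-up values in the layout of distdata.csv).
obs <- data.frame(
  distance = c(1, 18, 7, 2, 13, 3),
  transect = c("a", "a", "a", "a", "b", "b")
)
breaks <- c(0, 5, 10, 15, 20)   # assumed distance-class edges

# Assign each distance to a class: [0,5], (5,10], (10,15], (15,20].
dclass <- cut(obs$distance, breaks = breaks, include.lowest = TRUE)

# Format 2: sites x distance classes; element [i, j] counts individuals
# at site i in distance class j.
y <- unclass(table(obs$transect, dclass))
```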

It sounds like you have type 1. I think the following example is in the help file or perhaps the vignette PDF that comes with unmarked:

> library(unmarked)
> dists <- read.csv(system.file("csv", "distdata.csv", package="unmarked"))
> head(dists, 10)
distance transect
1 1 a
2 18 a
3 7 a
4 2 a
5 13 b
...
...

### Does this data file resemble yours?
### This sample data set has transects labeled "a" through "g", but "g" did not have any observed individuals, so we had to fill that in as follows:

> # Fill-in the 0 observations:
> levels(dists$transect) <- c(levels(dists$transect), "g")
> levels(dists$transect)
[1] "a" "b" "c" "d" "e" "f" "g"
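The reason for adding the level is that counting functions only report sites they know about: a transect that never appears in the data vanishes from the counts unless it is a declared factor level. Note that since R 4.0.0, read.csv() returns text columns as character rather than factor by default, so a levels() assignment like the one above may not behave as expected in current R. Building the factor explicitly with factor(..., levels = ...) works either way. A small sketch with made-up transect labels:

```r
# Transect labels as they might come from a CSV; "g" was surveyed
# but had no detections, so it never appears in the data.
labels <- c("a", "a", "b", "c", "d", "e", "f")

without_g <- table(labels)   # only a-f are counted; "g" is silently missing

# Declare every surveyed transect as a level, so "g" gets a count of 0.
transect <- factor(labels, levels = letters[1:7])
with_g <- table(transect)
```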

####
#### Now convert this to multinomial format:

> yDat <- formatDistData(dists, distCol="distance",
+ transectNameCol="transect", dist.breaks=c(0, 5, 10, 15, 20))

> yDat # could read-in multinomial observations directly
y.1 y.2 y.3 y.4
a 2 1 0 1
b 2 1 1 0
c 2 2 0 0
d 2 1 1 0
e 1 0 1 2
f 2 1 1 0
g 0 0 0 0
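If the counts are already tabulated, the multinomial matrix can be read in directly instead of being built from individual observations. A sketch using the yDat values printed above, stored as CSV text; the file layout (transect name in the first column, one column per distance class) is an assumption:

```r
# Multinomial counts: one row per transect, one column per distance class
# (values from the yDat output above).
csv_text <- "transect,y.1,y.2,y.3,y.4
a,2,1,0,1
b,2,1,1,0
c,2,2,0,0
d,2,1,1,0
e,1,0,1,2
f,2,1,1,0
g,0,0,0,0"

# With a real file this would be read.csv("your_file.csv", row.names = 1).
# row.names = 1 turns the transect column into row names.
yDat <- as.matrix(read.csv(text = csv_text, row.names = 1))
```

Transects with no detections, like "g", simply get a row of zeros, so no factor-level fix is needed in this format.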

J. Andy Royle
Research Statistician
USGS Patuxent Wildlife Research Center
12100 Beech Forest Rd.
Laurel, MD 20708
http://profile.usgs.gov/professional/mypage.php?name=aroyle
andy_...@usgs.gov
phone: 301-497-5846
fax: 301-497-5545

Book: "Hierarchical Modeling and Inference in Ecology: The Analysis of Data from Populations, Metapopulations and Communities" by J. A. Royle and R.M. Dorazio.

unmarked: A very useful R package for fitting certain hierarchical models using likelihood methods. Available from: http://cran.case.edu/web/packages/unmarked/index.html

A 5-hour "introduction to unmarked" webinar can be found here: http://www.pwrc.usgs.gov/Royalvideo.cfm

From:	Katie <grayk...@gmail.com>
To:	unmarked <unma...@googlegroups.com>
Date:	03/20/2012 05:05 PM
Subject:	[unmarked] Data format for gdistsamp
Sent by:	unma...@googlegroups.com

Katherine Gray

Mar 20, 2012, 5:31:09 PM

to unma...@googlegroups.com

Hi Andy,

Thank you for your reply!

### Does this data file resemble yours?

It does, except that my data also have 4 surveys per site. There is a column that identifies the survey # (1, 2, 3, or 4) for each detection. I want to create distance bins for each survey at each site. Any suggestions?

Thanks again,

Katie

Andy Royle

Mar 20, 2012, 5:35:57 PM

to unma...@googlegroups.com, unma...@googlegroups.com

hi Katie,
Ah, OK. I don't think formatDistData currently does that. But if you send me some of your data, off the group, I will modify formatDistData to handle multiple surveys and send you the new function to use until we update unmarked.
In the very short term, you could run the function for _each_ survey # and then combine the resulting data sets.
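The run-once-per-survey-then-combine approach can be sketched in base R. The helper bin_counts() is hypothetical and only stands in for formatDistData(); the data, the column names (SiteID, Distance, Survey), and the breaks are made up. The detail that matters is that every per-survey matrix has the same rows in the same order, including sites with no detections in that survey (or a survey with no detections at all), before the blocks are bound together with cbind().

```r
# Hypothetical detections: one row per tortoise (made-up values).
tort <- data.frame(
  SiteID   = c(1, 1, 2, 3, 3, 3),
  Distance = c(2.5, 12, 7, 1, 19, 4),
  Survey   = c(1, 2, 1, 4, 4, 2)
)
breaks    <- c(0, 5, 10, 15, 20)   # assumed distance-bin edges
all_sites <- 1:50

# Hypothetical stand-in for formatDistData(): sites x distance bins for
# one survey. Using every site as a factor level keeps all 50 rows, in the
# same order, even where that survey detected nothing.
bin_counts <- function(d, breaks, sites) {
  site <- factor(d$SiteID, levels = sites)
  bin  <- cut(d$Distance, breaks = breaks, include.lowest = TRUE)
  unclass(table(site, bin))
}

# One call per survey, then bind the blocks side by side (survey 1 first).
per_survey <- lapply(1:4, function(s)
  bin_counts(tort[tort$Survey == s, ], breaks, all_sites))
y <- do.call(cbind, per_survey)
```

Survey 3 has no detections in this toy data, yet still contributes a 50 x 4 block of zeros, so the survey blocks stay aligned.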
regards
andy
