using obsCovs in unmarkedFrame

Alex Anderson

Mar 14, 2011, 4:35:17 AM

to unma...@googlegroups.com

Hi All
I am in the process of adapting an analysis of some bird survey data, which I have been running in Distance, to the package "unmarked". I have been able to get a simple test run to output some estimates of density, but I have a few queries I hope someone might be able to answer.

I have used "formatDistData" to create a data frame of my observations for one species, binned into distance classes. I have also created the siteCovs data frame which has gone into the arguments to "unmarkedFrameDS" fine.

I also have some observation covariates (cluster size, height) that I wish to include as "obsCovs" in the arguments to "unmarkedFrameDS". However, when I use formatDistData (which requires the argument dist.breaks) to produce a matrix of observations, the result has dimensions M x J, where J is the number of distance intervals (one fewer than the number of values in dist.breaks). According to the documentation, unmarkedFrame wants a y matrix that is M x J, where J is the number of observations. How can I produce either a y matrix with J = number of observations, or an obsCovs matrix that unmarkedFrame will be able to use?
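To see why formatDistData returns one column per distance interval, here is a base-R sketch of the same binning for a single occasion. It mimics the layout only, not formatDistData's actual code, and the transects, distances and breaks are made up. J comes out as length(breaks) - 1, and a transect with no detections still gets a row of zeros because it is a level of the factor:

```r
# Hypothetical detections: transect IDs and perpendicular distances (m).
# Transect "C" had no detections and is recorded with an NA distance.
obs <- data.frame(
  transect = factor(c("A", "A", "B", "B", "B", "C"), levels = c("A", "B", "C")),
  dist     = c(3, 14, 7, 22, 38, NA)
)
breaks <- c(0, 10, 20, 30, 40)   # 5 break points -> 4 distance intervals

# Assign each distance to an interval; NA distances stay NA
interval <- cut(obs$dist, breaks = breaks, include.lowest = TRUE)

# Cross-tabulate: one row per transect (M), one column per interval (J).
# table() drops NA intervals, but transect "C" keeps a row of zeros.
y <- unclass(table(obs$transect, interval))
```

Because every column is a distance interval rather than an individual observation, an observation-level covariate such as cluster size has no single cell to attach to.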

I also want to calculate Effective Strip Widths for my surveys, sites and species, as in Distance. Any ideas on how to do this in unmarked?

I also see that clustered data are not explicitly supported in the unmarked analysis framework, but as far as I know, density estimates in Distance for clustered data are simply multiplied by mean cluster size, so this should be straightforward, as long as the mean cluster size is appropriate for the stratum being estimated?

Any help would be much appreciated
Cheers
Alex

Richard Chandler

Mar 14, 2011, 9:09:43 AM

to unma...@googlegroups.com

Hi Alex,

On the distsamp help page, under "Note", it says "you cannot use
obsCovs." In the distance sampling context, obsCovs are
distance-interval-specific covariates, and I'm not sure how the
model would perform with that type of covariate. Maybe Andy can weigh
in about this. If the model does work in this case, perhaps I should
make it possible to include obsCovs.

As for cluster size, you are right that you could simply estimate the
density of clusters and then multiply by the mean cluster size.
However, you would probably want to determine if detection probability
is affected by cluster size. The only way I can think of to do that
with this model would require that a maximum of 1 cluster was detected
in each "cell" of your y matrix. Then you could use cluster size as a
distance-interval-specific covariate (assuming this is possible).
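The simple approach at the start of that paragraph (estimate the density of clusters, then multiply by mean cluster size, stratum by stratum) is plain arithmetic. A base-R sketch with made-up numbers; cluster_density, cluster_sizes and the stratum names are hypothetical. It uses the mean size of detected clusters, which is only appropriate if detection probability does not depend on cluster size, as noted above:

```r
# Hypothetical per-stratum estimates: density of clusters (clusters/ha)
# and the sizes of the clusters detected in each stratum
cluster_density <- c(north = 0.8, south = 1.5)
cluster_sizes <- list(north = c(2, 3, 1, 4, 2),
                      south = c(1, 1, 2, 1, 1, 2))

# Mean detected cluster size per stratum (assumes no size bias in detection)
mean_size <- sapply(cluster_sizes, mean)

# Density of individuals = density of clusters x mean cluster size,
# matched by stratum name
indiv_density <- cluster_density * mean_size[names(cluster_density)]
```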

I should add functions to return effective strip half-width and
effective area. Until then you can do it manually using the formulas
found in Buckland et al. 2001. The first example on the help page is
for line transects and yields an estimate of 10.9 for the half-normal
shape parameter. The maximum distance is 20, so effective strip
half-width is

library(unmarked) # provides gxhn(), the half-normal detection function
integrate(gxhn, 0, 20, sigma=10.9)$value # = 12.75 m
12.75/20 # = 0.64 is the detection probability
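For the half-normal key, that integral also has a closed form, ESW = sigma * sqrt(2*pi) * (pnorm(w/sigma) - 0.5), which makes a handy cross-check. A base-R sketch; g_hn is a hypothetical local stand-in for gxhn so the snippet runs without unmarked loaded, and sigma and w are the help-page values quoted above:

```r
sigma <- 10.9   # half-normal scale estimate from the line-transect example
w     <- 20     # truncation distance (m)

# Half-normal detection function g(x) = exp(-x^2 / (2 sigma^2))
g_hn <- function(x, sigma) exp(-x^2 / (2 * sigma^2))

# Effective strip half-width by numerical integration...
esw_numeric <- integrate(g_hn, 0, w, sigma = sigma)$value
# ...and in closed form, via the normal CDF
esw_closed  <- sigma * sqrt(2 * pi) * (pnorm(w / sigma) - 0.5)

# Average detection probability within the strip
p_detect <- esw_closed / w
```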

For the point-transect example, the shape parameter estimate is 9.8
and effective area is

grhn <- function(r, sigma) exp(-r^2/(2*sigma^2)) * r
2*pi * integrate(grhn, 0, 25, sigma=9.8)$value # = 580 m^2
580/(pi*25^2) # = 0.30 is the detection probability
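Likewise, for point transects the half-normal effective area integrates in closed form to 2*pi*sigma^2*(1 - exp(-w^2/(2*sigma^2))). A base-R cross-check using the values above (variable names are illustrative):

```r
sigma <- 9.8   # half-normal scale estimate from the point-transect example
w     <- 25    # truncation radius (m)

# Effective area: 2*pi times the integral of r * g(r) from 0 to w
ea_numeric <- 2 * pi * integrate(function(r) r * exp(-r^2 / (2 * sigma^2)),
                                 0, w)$value
# Closed form for the half-normal key
ea_closed  <- 2 * pi * sigma^2 * (1 - exp(-w^2 / (2 * sigma^2)))

# Average detection probability within the surveyed circle
p_detect <- ea_closed / (pi * w^2)
```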

Thanks for your questions. Hope this helps.

Richard

__________________________________
Richard Chandler
USGS Patuxent Wildlife Research Center

Richard Chandler

Jan 11, 2013, 9:31:37 AM

to Unmarked package

Hi Erik,

If you made 3 visits to each site, then you probably want to use gdistsamp. distsamp is meant for single visit distance sampling.

If you use gdistsamp, the variables: weather, observer, elevation, depth would be considered yearlySiteCovs. This is confusing, I know, and I should really change it to primarySiteCovs or something. Anyhow, you can model sigma as a function of these covariates.

For distance sampling models in unmarked, you cannot use obsCovs because these would be covariates measured at the level of the distance interval.

Richard

On Thu, Jan 10, 2013 at 4:31 PM, <erikw...@gmail.com> wrote:

(e.g. weather, observer, elevation, depth)

erikw...@gmail.com

Feb 8, 2013, 4:44:19 PM

to unma...@googlegroups.com

Hi all,

Format and structure issues are all covered in the R help pages and vignettes, but because I said I'd check back in regarding the use of covariates in gdistsamp(), here goes.

I've successfully transitioned to gdistsamp() and incorporated 'yearlySiteCovs' to model the detection process. Understanding data format and structure is always a keystone step for me. Most of what you'll find here is beginner's work, but beginning is, arguably, the most important step. So here's how I approached gdistsamp() and covariates, in the hope it can help someone else out.

I've tried to be as broad as possible to make this applicable to other projects. Please take the interpretations of the process (and my clunky code) as my own and with a grain of salt. If I've erred in some way please call me out. There are far better explanations and examples out there (see anything by Chandler, Fisk, Kery, MacKenzie, Royle et al.) but here's how I did it:

- All data were maintained in an Excel spreadsheet (yup, Excel) and converted to a .txt (tab delimited) file for import to R.

- I imported three separate files: (1) distance data (unbinned and continuous), (2) siteCovs (i.e., "site-level" covariates that influenced the abundance process and did not change over the course of the study), and (3) yearlySiteCovs (i.e., "sample-level" covariates that influenced the detection process, varied from survey to survey, and were measured during each survey at each sample station).

Excel Structure:

(1) Distance Data
site_id = sample station identification #; dist = distance to the detected individual of the study species; occasion = sample round

*Duplicate site_id values occur when more than one individual was recorded at a station during the same occasion. For occasions when no individuals were observed, enter the site_id and the corresponding occasion with the distance left blank; R will read the blank distance as NA, which is fine. These rows are important placeholders when you bring everything together in your unmarkedFrameGDS. More importantly, excluding sample sites with no detections will bias your abundance estimates high. With 4 sample stations and 3 occasions you will therefore have at least 12 rows (one per station per occasion), plus one extra row for every additional detection.

site_id	dist	occasion
1	84	1
1	14	1
2		1
3	13	1
3	56	1
4		1
1 ...	12 ...	2 ...
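Before formatting, it can help to confirm that every station appears at least once in every occasion, i.e. that the placeholder rows are really there. A base-R sketch on the first six rows of the example table; dist_data, rows_per_cell and detections are hypothetical names:

```r
# First rows of the example distance data; blank distances become NA
dist_data <- data.frame(
  site_id  = factor(c(1, 1, 2, 3, 3, 4)),
  dist     = c(84, 14, NA, 13, 56, NA),
  occasion = factor(c(1, 1, 1, 1, 1, 1))
)

# Rows per station and occasion; a zero flags a station-occasion with no
# row at all, i.e. a missing placeholder
rows_per_cell <- table(dist_data$site_id, dist_data$occasion)
missing_cells <- which(rows_per_cell == 0, arr.ind = TRUE)

# Detections (non-NA distances) per station and occasion
detections <- tapply(!is.na(dist_data$dist),
                     list(dist_data$site_id, dist_data$occasion), sum)
```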

(2) siteCovs
site_id = must exactly match the IDs used in your distance data, with one row per site; cov_x = covariate data measured at each sample station (in the field, by remote sensing, from historical data, etc.), e.g., average dbh, % landcover, patch size. They can be related or unrelated. The example below could be considered % cov_x within y distance of a station, where the covariates sum to 100% at each station; for example, cov_1 = % coral reef, cov_2 = % sandy bottom, cov_3 = % rock.

site_id	cov_1	cov_2	cov_3....
1	16.5	32.9	50.6
2	65.9	12.3	21.8
3	0	14.9	85.1
4	82.2	14	3.8
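If the cov_x columns really are percentages of the same area that sum to 100% at each site, an intercept plus all three is perfectly collinear, so a formula such as ~cov_1+cov_2+cov_3 cannot identify all four coefficients, and standardizing does not remove the dependency. One common remedy is to drop one category as the reference. A base-R check on the example values (covs_std, X_all and X_drop are illustrative names):

```r
covs <- data.frame(
  cov_1 = c(16.5, 65.9, 0, 82.2),
  cov_2 = c(32.9, 12.3, 14.9, 14),
  cov_3 = c(50.6, 21.8, 85.1, 3.8)
)

# Each site's three percentages sum to 100
row_totals <- rowSums(covs)

# Standardize (mean 0, sd 1); scale() returns a matrix, so convert back
covs_std <- as.data.frame(scale(covs))

# Design matrices: intercept + all three vs. one category dropped
X_all  <- model.matrix(~ cov_1 + cov_2 + cov_3, data = covs_std)
X_drop <- model.matrix(~ cov_1 + cov_2, data = covs_std)

rank_all  <- qr(X_all)$rank    # 3, although X_all has 4 columns
rank_drop <- qr(X_drop)$rank   # equals ncol(X_drop): full column rank
```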

(3) yearlySiteCovs
site_id = must exactly match the IDs used in your distance data, with one row per site; x_1...x_3 and y_1...y_3 are "sample-level" covariates measured at each site on each sample occasion (the suffix is the occasion). In this case there are 4 sites and 3 occasions where, say, x = pH and y = temperature (clearly I'm not an aquatic biologist).

site_id	x_1	x_2	x_3	y_1	y_2	y_3	.....
1	2	3	2.5	56.9	54.9	55.7
2	1.5	2	2	45.5	46.5	49.1
3	5	5	6	65.8	64.3	63.2
4	4	4	3	46.5	45.6	44.9
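As I read the unmarkedFrameGDS documentation, yearlySiteCovs should be either a named list of M x T matrices or a "site-major" data frame with M*T rows (all periods of site 1, then all periods of site 2, and so on) and one column per covariate; a wide table with x_1, x_2, x_3 as separate columns fits neither. A base-R sketch converting the example table to site-major form (yr_wide, to_site_major and yr_long are hypothetical names):

```r
# yearlySiteCovs from the example table, one row per site (wide layout)
yr_wide <- data.frame(
  site_id = factor(1:4),
  x_1 = c(2, 1.5, 5, 4), x_2 = c(3, 2, 5, 4), x_3 = c(2.5, 2, 6, 3),
  y_1 = c(56.9, 45.5, 65.8, 46.5), y_2 = c(54.9, 46.5, 64.3, 45.6),
  y_3 = c(55.7, 49.1, 63.2, 44.9)
)
n_primary <- 3

# t() makes each site a column, so as.vector() reads site by site
to_site_major <- function(wide, cols) as.vector(t(as.matrix(wide[, cols])))

yr_long <- data.frame(
  site_id = rep(yr_wide$site_id, each = n_primary),
  period  = rep(seq_len(n_primary), times = nrow(yr_wide)),
  X = to_site_major(yr_wide, c("x_1", "x_2", "x_3")),
  Y = to_site_major(yr_wide, c("y_1", "y_2", "y_3"))
)
```

Only the covariate columns (e.g. yr_long[, c("X", "Y")]) would then be passed as yearlySiteCovs, so that the formulas can refer to X and Y.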

Convert each file to .txt/.csv/other compatible file format and import to R.

#Example R Code:
#site_id = in all files, ensure classification as "factor", same for occasion

## Import distance data
DIST = read.delim(file="file.name.txt", header=TRUE, sep="\t", colClasses=c("factor","numeric","factor"))
DIST1 = data.frame(DIST)

## Import site-level covariate ('siteCovs') data
COVS = read.delim("file.name.txt", header=TRUE, sep="\t", colClasses=c("factor", rep("numeric",3)))

## Import sample-level covariate ('yearlySiteCovs') data
YRCOVS = read.delim("file.name.txt", header=TRUE, sep="\t", colClasses=c("factor",rep("numeric",6)))

## Be sure to standardize your covariates. This is covered at length in other parts of the group.

## Specify distance breaks - but first explore data via histogram (see: Buckland et al. 2001. Introduction to Distance Sampling)
db <- seq(0,100,by=20)

## Format distance data into multinomial format
y.dat1 = formatDistData(distData=DIST1, distCol="dist", transectNameCol="site_id", dist.breaks=db, occasionCol="occasion")

Using "occasionCol", your distance data are partitioned by the 3 sample periods specified in DIST1 [(1) above] and by the distance intervals defined by the breaks (db). Staying with this example, you have 5 distance intervals (0-20, 20-40, ..., 80-100) and 3 sample periods, for 5 x 3 = 15 columns.
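The 15-column layout can be reproduced in base R, which helps when checking what formatDistData hands to unmarkedFrameGDS. As I understand it, the columns are ordered with distance intervals nested within occasions (the first J columns are occasion 1). This is an illustration of the layout, not formatDistData's code, and the detections are made up:

```r
# Hypothetical detections at 4 stations over 3 occasions; NA = no detection
d <- data.frame(
  site_id  = factor(c(1, 1, 2, 3, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)),
  dist     = c(84, 14, NA, 13, 56, NA, 12, NA, NA, 45, NA, 99, NA, NA),
  occasion = factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
)
db <- seq(0, 100, by = 20)          # 6 break points -> J = 5 intervals

interval <- cut(d$dist, breaks = db, include.lowest = TRUE)

# 3-way table: site x interval x occasion. NA distances are not counted,
# but every site level still gets zeros.
counts <- table(d$site_id, interval, d$occasion)

# Flatten to M x (J*T); the array is stored site-fastest, then interval,
# then occasion, so columns 1:J are occasion 1, (J+1):(2J) occasion 2, ...
M     <- nlevels(d$site_id)
J     <- length(db) - 1
n_occ <- nlevels(d$occasion)
y <- matrix(counts, nrow = M, ncol = J * n_occ)
```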

## Combine multinomial data in an unmarkedFrameGDS
umf <- unmarkedFrameGDS(y=as.matrix(y.dat1), siteCovs=COVS, numPrimary=3,
                yearlySiteCovs=YRCOVS, dist.breaks=db, survey="point", unitsIn="m")

Here's where all the data is mashed together into an unmarkedFrameGDS, suitable for gdistsamp( ) modeling. Note the 'numPrimary' = 3 which corresponds to the number of "occasions" within the study period. I've confused this with secondary samples and received an error which was reported elsewhere in this unmarked group. Also, specify the units you are working in.

From here you can pass your umf and covariates to the gdistsamp function, assess model fit under Poisson (P) and negative binomial (NB) mixtures, and build your models. Check model performance and dispersion.

The three formulas are given in this order: abundance (lambda), availability (phi), and detection (p).

Example Null Model
m1n.NB    = gdistsamp(~1,~1,~1, umf, keyfun = "halfnorm", mixture = "NB", output = "density", unitsOut = "ha")

Example Detection Model (you could also add interaction terms, e.g. y_1*y_2)
m2d.NB    = gdistsamp(~1,~1,~y_1+y_2+y_3, umf, keyfun = "halfnorm", mixture = "NB", output = "density", unitsOut = "ha")

Example Abundance Model
m3a.NB    = gdistsamp(~cov_1+cov_2+cov_3,~1,~1, umf, keyfun = "halfnorm", mixture = "NB", output = "density", unitsOut = "ha")

..and combinations thereof.

Package AICcmodavg by M. Mazerolle (e.g., its function 'aictab') allows for easy (Q)AICc model selection, model averaging, and predictions. A new guidance document came out today and package version 1.27 should hit CRAN shortly.

####

A couple asides:

For me, RStudio has made it easier to visualize data, code structure and format than the standard R console. It changed my caustic relationship with R to a slightly more bearable one. Try it out, you may like it.

Here's a great resource for information, example code and the 2-day webinar by Royle and Chandler. Awesome and really helpful, but your brain may explode.

Also, a recent publication that used gdistsamp:

Sillett, S., R. B. Chandler, J. A. Royle, M. Kery, and S. A. Morrison. 2012. Hierarchical distance sampling models to estimate population size and habitat-specific abundance of an island endemic. Ecological Applications 22:1997-2006.

Sillett et al. R code is available through the ESA archives and in R via data(issj), also extremely helpful.

Good luck.

On Friday, January 11, 2013 10:46:07 AM UTC-6, erikw...@gmail.com wrote:

Hi Richard,

Thanks for the timely reply! I'll proceed with gdistsamp and incorporate the covariates as you suggested. It will take a little time to redirect my course, but I'll check back in with results so we can finalize this thread.

Thanks again for your great work. Erik

Matthew J Butler

Feb 10, 2013, 1:38:31 PM

to unma...@googlegroups.com

Erik,

If the site_id values in your text files are in numeric order (i.e., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, …) rather than the order a factor uses (i.e., 1, 10, 2, 3, 4, 5, 6, 7, 8, 9, …), and you import site_id as a factor, the ‘formatDistData’ function will sort the rows in factor order. Thus, when you use the ‘unmarkedFrameGDS’ function, the covariates for site_id 2 will be associated with site_id 10. Use ‘umf[1:20]’ to verify this for yourself.

I have found that if I import site_id as numeric, the ‘formatDistData’ function will convert site_id to a factor for me but leave it sorted the same way as it was imported.
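The sorting behaviour is easy to reproduce in base R: factor() on character IDs sorts the levels as strings, so "10" comes before "2", whereas factor() on numbers sorts numerically. A sketch with made-up IDs (ids and the f_* names are illustrative):

```r
ids <- c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10")  # file order

# Default factor levels are sorted as character strings: 1, 10, 2, 3, ...
f_default <- factor(ids)

# Keeping the file order explicitly avoids the mismatch
f_file_order <- factor(ids, levels = unique(ids))

# Importing the IDs as numbers gives numerically sorted levels
f_numeric <- factor(as.integer(ids))
```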

Good Luck,

Matt Butler

Matthew J Butler

Feb 10, 2013, 2:04:23 PM

to unma...@googlegroups.com

Erik,

I believe there is a problem with how you import the yearlySiteCovs as well. Check out:

umf <- unmarkedFrameGDS(y=as.matrix(y.dat1), siteCovs=COVS, numPrimary=3,
                yearlySiteCovs=list(X=YRCOVS[,c("x_1","x_2","x_3")], Y=YRCOVS[,c("y_1","y_2","y_3")]),
                dist.breaks=db, survey="point", unitsIn="m")

which means you can now specify the following in your model statements:

m2d.NB = gdistsamp(~1,~1,~Y, umf, keyfun = "halfnorm", mixture = "NB", output = "density", unitsOut = "ha", K=200)
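In the named-list form, each element is one M x T matrix, and its name (X, Y) becomes the covariate name in the formulas. If you also standardize these covariates, as recommended earlier in the thread, one option is to use all M x T values together; scale() applied to the matrix would instead center each period's column separately and remove differences among periods. A base-R sketch on the example values (yr_wide, std_all and ysc are hypothetical names):

```r
# Example yearlySiteCovs values, sites in rows, occasions as suffixes
yr_wide <- data.frame(
  x_1 = c(2, 1.5, 5, 4), x_2 = c(3, 2, 5, 4), x_3 = c(2.5, 2, 6, 3),
  y_1 = c(56.9, 45.5, 65.8, 46.5), y_2 = c(54.9, 46.5, 64.3, 45.6),
  y_3 = c(55.7, 49.1, 63.2, 44.9)
)

# Standardize using all M x T values at once, so differences between
# periods are preserved
std_all <- function(m) (m - mean(m)) / sd(as.vector(m))

# One standardized M x T matrix per covariate; list names are used in formulas
ysc <- list(
  X = std_all(as.matrix(yr_wide[, c("x_1", "x_2", "x_3")])),
  Y = std_all(as.matrix(yr_wide[, c("y_1", "y_2", "y_3")]))
)
```

ysc could then be supplied as yearlySiteCovs = ysc in place of the unstandardized list.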

Good Luck,

Matt Butler

From: unma...@googlegroups.com [mailto:unma...@googlegroups.com] On Behalf Of erikw...@gmail.com
Sent: Friday, February 08, 2013 2:44 PM
To: unma...@googlegroups.com
Subject: Re: [unmarked] using obsCovs in unmarkedFrame