unmarkedMultFrame - problem with yearlySiteCovs

1,061 views
Skip to first unread message

Shane Pruett

unread,
Aug 6, 2015, 3:05:45 PM8/6/15
to unmarked

I am trying to run a multiple season, single species occupancy model with covariates using colext in unmarked. I have worked on this for days now and I need to make some progress, so I hope someone here can help. Everything seems to work fine except for the format of yearlySiteCovs. I’ve looked at the help material, the “Dynamic Occupancy” article by Kery and Chandler, and have tried multiple logical workarounds. I also found THIS post which I thought answered it, but I still don’t seem to be able to get it right.

Using standard notation of Kery and Chandler, M = 330, J = 3, and T = 9. I have two siteCovs, dummy vars to indicate membership in one of 3 populations; the siteCov matrix is 330 rows. I have 3 obsCovs applied to detection probability. When formatted, this generates amatrix with MTJ (8910) rows and 3 columns.

 

According to the unmarked documentation, the yearlySiteCovs is a matrix of MT rows in a site-major, year-minor order. Thus I started with a 330 * 9 matrix (sample below) with columns representing each year’s value (months since an event) measured at the beginning of the season for each site (represented by their index# here):

     yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010 yr2011

7       25      2     14      2      1     13     25     12      3

27      15      2     14      2      1     13      3     15      3

44      34      2     14      0     12     24     36     12      5

56      34     46      1      1     13      3     15     27     12

59      13      2     15     26     38     50     11     23     35

68       2     14      1     12     24     36     11     23     35

185     NA     NA     NA     60     10     22     11     23     35

212     10      0     11     23     35     10     22     34     11

238     10     22     10     22     34      9      9     21      1

327      1     14     25      1     12     25     36      0     12

 

I tried various formats to get this to read in and get various reasons why it doesn’t. I used ‘melt’ in reshape2 to convert the data to a 2970 row matrix ordered by site then year and then removed the extraneous information leaving a vector of the above time since event values. If I use

 

umf <- unmarkedMultFrame(y=y,

    siteCovs = siteCovs,

    obsCovs = obsCovs,

    yearlySiteCovs = yearlySiteCovs,

    numPrimary=T)


 I get the error: “Error in checkAtAssignment("unmarkedMultFrame", "yearlySiteCovs", "integer") :

  assignment of an object of class “integer” is not valid for @‘yearlySiteCovs’ in an object of class “unmarkedMultFrame”; is(value, "optionalDataFrame") is not TRUE”

 

If I change the data type to anything else, I get the same message with whatever other class I use inserted. 

 

If I change the input to: yearlySiteCovs = list(yearlySiteCovs)   I get the error: “Error in unmarkedMultFrame(y = y, siteCovs = siteCovs, obsCovs = obsCovs,  :

  At least one element of yearlySiteCovs is not a matrix or data frame.”

 

I am seriously out of time and overdue on this. Can anyone please help me to understand how to format this so I can run the model? The real frustration is that I’ve done the analysis in Program PRESENCE, but I can’t easily generate predictive graphics I need from that output. I need to duplicate my selected best model in R so I can use the power of R to generate predictives for the report. Thanks in advance, and if you need more info I can provide it.


Cheers,

Shane

Jeffrey Royle

unread,
Aug 6, 2015, 3:42:31 PM8/6/15
to unma...@googlegroups.com
hi Shane, I don't know about all you said there but the help page for colext has this example (see below) where it builds an RxT matrix of the yearly site cov.   How come you can't use exactly this format?
regards
andy
 
 
 
# Fake data
R <- 4 # number of sites
J <- 3 # number of secondary sampling occasions
T <- 2 # number of primary periods

y <- matrix(c(
   1,1,0,  0,0,0,
   0,0,0,  0,0,0,
   1,1,1,  1,1,0,
   1,0,1,  0,0,1), nrow=R, ncol=J*T, byrow=TRUE)
y

site.covs <- data.frame(x1=1:4, x2=factor(c('A','B','A','B')))
site.covs

yearly.site.covs <- list(
   year = matrix(c(
      'year1', 'year2',
      'year1', 'year2',
      'year1', 'year2',
      'year1', 'year2'), nrow=R, ncol=T, byrow=TRUE)
      )
yearly.site.covs


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Daniel Hocking

unread,
Aug 6, 2015, 11:15:47 PM8/6/15
to unmarked
Hi Shane,

It seems like it might be a structure formatting issue of yearlySiteCovs. Can you send the code you use for setting up the object and then the results of str(yearlySiteCovs)? Maybe name the object yearly.site.covs just to avoid scoping issues with the name of the argument in the function.

-Dan

Shane Pruett

unread,
Aug 7, 2015, 4:56:36 PM8/7/15
to unmarked
Andy/Dan,

Thanks for the responses. I had a former intern in our lab help me out with some code she had that made a little more sense to me. I certainly wasn't structuring the list for yearlySiteCovs correctly, and though the colext example is technically correct, it was simplified in such a way that I was having a hard time applying to my data. I've been able to now reformat the data to what appears to be correct, but and run models not containing the yearlySiteCovs, but when I attempt to run such a model, I am now getting a "Time since event not found" message. For some reason, even though I have it in the umf object, unmarked isn't seeing it. I saw a post somewhere that there is a bug in yearlySiteCovs that doesn't work with "NA" values in the covariate dataframe. I was working under the assumptions that those were dropped, but maybe not, and that's causing my current issue. I'm going to purge the data of those records and rerun to see what I get. If it runs okay, then I'll have to try and figure out how best to deal with the issue in the overall analysis. I don't want to lose a bunch of data, but obviously I have to think carefully about how imputing mean values, or some other proxy value would influence the outcome. 

I appreciate your thoughts and input and may yet return with more questions. If you have thoughts on the above, I'd love to hear them.

Thanks,
Shane


On Thursday, August 6, 2015 at 3:05:45 PM UTC-4, Shane Pruett wrote:

Daniel Hocking

unread,
Aug 10, 2015, 12:08:46 PM8/10/15
to unmarked
Shane,

Without being able to see and manipulate the data, I'm not sure I can diagnose the current problem. I believe that in the current version of unmarked NA are handled properly (i.e. dropped) in yearlySiteCovs. I tested it with some example data and it seemed to work, so I don't think that's you problem. It seems likely that it's still in the umf formatting or relation between the umf and the model specification. As you note, if you don't want to through out the data with missing covariates you will have to do some sort of imputing which comes with assumptions. You can always test the effect of those assumptions by doing it iteratively with different imputed values as a type of sensitivity analysis.

-Dan Hocking

------------

Shane Pruett

unread,
Aug 17, 2015, 10:05:28 AM8/17/15
to unmarked
I tried to respond to you directly instead of to the broader list, but I can find no evidence on my end that a private email went out to you, so I'm responding to the list instead. See the attached sample data file and annotated script written in Notepad++.

Thanks again to anyone who can suggest something new to try that will resolve my issue.

Shane
SampData.csv
unmarked_SPruett.r

Daniel Hocking

unread,
Aug 18, 2015, 12:04:00 PM8/18/15
to unmarked
Hi Shane,

I replied in detail off this listserv, but I wanted to add it here so others know the question has been answered.

To add the mrf03 into the psi parameter in unmarked, it just has to be added to the site covariates, not just the yearlysitecovs. Basically the way unmarked is coded, each model level (psi, gamma, epsilon, p) can only pull parameters from the covariates in a specific place in the unmarked data frame. So even though "mrf03" was in the yearlySiteCovs part of the unmarkedMultFrame, it is not structured in a way that it can be used in psi or p. Below is how I would write the code. HOWEVER, what you really want is mrf02 or mrf_historical to use in psi as an indicator of the historic fire history that established the current occupancy. Then the mrf03, 04, 05, etc., should represent the change in fire history between closed occurrence periods so that it indicates how fire would affect the colonization or extinction (change the previous occupancy status).


fgsp.umf <- unmarkedMultFrame(y=datin[,2:28],
                              siteCovs=datin[,c("kpsp","tlwma", "mrf03")],        # Note that may need to add column for apafr
                              yearlySiteCovs = list(mrFire=datin[,c("mrf03", "mrf04", "mrf05", "mrf06", "mrf07", "mrf08", "mrf09", "mrf10", "mrf11")]),
                              obsCovs = list(    doy=datin[,41:67],
                                              tod=datin[,68:94],
                                              tod2=datin[,95:121]),
                              numPrimary=9)

m3.1 <- colext(psiformula = ~kpsp + tlwma + mrf03,
             gammaformula = ~kpsp + tlwma + mrFire,
             epsilonformula = ~kpsp + tlwma + mrFire,
             pformula = ~kpsp + tlwma + tod + tod2 + doy, data = fgsp.umf)
summary(m3.1)


-Dan



On Thursday, August 6, 2015 at 3:05:45 PM UTC-4, Shane Pruett wrote:

Shane Pruett

unread,
Aug 18, 2015, 2:14:15 PM8/18/15
to unmarked
Thanks again Dan. Just to close the loop on this from this end...

The mrf03, 04, ...,11 covariates were actually derived in such a way that mrf03 is the most recent fire (number of months) prior to the '03 count, and thus psi ~ mrf03 is correctly specified. I developed them so that the label explicitly tells me what season they are influencing.

Thanks for the help, and hopefully this will assist someone else in the future. Or maybe no one else will be quite as thick as I have been... ;-) 

Shane

Shane Pruett

unread,
Aug 19, 2015, 9:53:56 AM8/19/15
to unmarked

I have one last question regarding how colext() uses the yearlySiteCovs. In my data there are 9 primary periods (2003-2011), thus 8 transitional periods for colonization and extinction. However, the yearlySiteCovs list must contain a variable for each of the nine years, so I end up with the following call to unmarkedMultFrame. As noted above, mrf03 applies to fires prior to the initial year of the study, and thus is included in the siteCovs data. mrf03 really has no place in the yearlySiteCovs data, but appears to be required to give the correct dimensions to the data. 


fgsp.umf <- unmarkedMultFrame(y=datin[,2:28],

                              siteCovs=datin[,c("kpsp","tlwma", "mrf03")],      

                              yearlySiteCovs = list(mrFire=datin[,c("mrf03", "mrf04", "mrf05", "mrf06", "mrf07", "mrf08", "mrf09", "mrf10", "mrf11")]),

                              obsCovs = list(    doy=datin[,41:67],

                                              tod=datin[,68:94],

                                              tod2=datin[,95:121]),

                              numPrimary=9)

 

When I set up a model such that gammaformula = ~kpsp + tlwma + mrFire, which mrFire columns are being supplied to which colonization probability? For gamma1 between year03 and year04, I want to apply mrf04 which contains the data relevant to settling patterns in 2004. If unmarked is applying the mrf03 data, the associations are nonsensical, but if I try to exclude the mrf03 column from the list, I get an error about incorrect dimensions in my list. My concern arises because when I compare results from PRESENCE to my unmarked results, they are essentially identical for all other models I’ve run, but in this case I can’t get them to match, suggesting I’m not getting the proper parameterization. Can anyone enlighten me on this? I did some digging around and I just can't find a good explanation of this.


Thanks,

Shane


Shane Pruett

unread,
Aug 25, 2015, 3:15:36 PM8/25/15
to unmarked

Sorry if my persistence is offputting, but I’m going to ping this thread one more time in hopes someone can help me to understand the situation in my last post above. I’ll constrain the example a bit more than above for brevity, but assume 3 years of data 2003-2005, thus initial Psi in 03, colonization/extinction events 03-04 and 04-05, and detection probabilities for each point count during that time, 3/season.


As Dan described above, I broke out mrf03 (most recent fire prior to the 03 season) for use as a predictor of initial colonization because we believe there is a short window of time where the habitat is suitable after fire. That leaves me with mrf04 and mrf05, which are the times since the most recent fire preceding the 04 and 05 seasons as predictors of colonization and extinction for each of those seasons.


when I set up my unmarkedMultFrame I intuitively wanted to use:     yearlySiteCovs = list(mrFire=data[,c("mrf04", "mrf05")])


which doesn’t work because there must be a frame for each year, thus unmarked requires:      yearlySiteCovs = list(mrFire=data[,c("mrf03", "mrf04", "mrf05")])


So my question is, what is being applied as a predictor of col/ext? Is it one of those columns, or is it the difference between mrf03 and mrf04 for the 04 season? Dan noted, "Then the mrf03, 04, 05, etc., should represent the change in fire history between closed occurrence periods so that it indicates how fire would affect..." This also seemed to be implied by another post on this forum, though I can't relocate it at the moment. It isn't the change in fire history from one year to the next that is biologically relevant, but the absolute time since the last burn, which is what I've tried to supply in the mrfXX variables for each year.



Thanks again for your time and consideration.

Shane


Dan Linden

unread,
Aug 25, 2015, 10:59:12 PM8/25/15
to unmarked
Hi Shane,

The covariates are used as supplied, they are not modified.  With T years and T-1 transition periods, the transition probabilities only require the first T-1 columns of the covariate matrix, but for bookkeeping purposes the covariate matrix needs T columns.  Given your data, you could do something like this:

yearlySiteCovs = list(mrFire=data[,c("mrf04", "mrf05", "mrf06", "mrf07", "mrf08", "mrf09", "mrf10", "mrf11", "mrf11")])

This way the T-1 transition probabilities are using "mrf04" through "mrf11" and the last duplicated column is being ignored.  Technically that last column could be anything.

Shane Pruett

unread,
Aug 26, 2015, 9:42:20 AM8/26/15
to unmarked
Perfect Dan! Thanks for the assist. I felt like it should be a straightforward 1-to-1 relationship, as with pretty much all other likelihood based models I've used, but since I couldn't figure out how to pull the guts out of unmarked and have a look at it, I just couldn't get past this. I ran the full set and no surprise, the models now appear to be acting more in accord with our expectations, I'm having no convergence issues, and the estimates and errors are wholly believable. This is one of those components of unmarked that (to me) doesn't seem intuitive or really well-explained anywhere, so hopefully someone finding this in the future will benefit from your explanation. 

Cheers,
Shane

Shandhya Sharma

unread,
Oct 6, 2017, 5:35:09 AM10/6/17
to unmarked
Thanks I learned many more things form here but I have one problem,
If I have a detection/non detection data of single species, siteCovs data and obsCovs data but i have no data on the yearsiteCovs eventhough I want to calculate multiple season occupany between the year 2012 to 2017. What should be mentioned in yearsiteCovs?

Sandhya Sharma

Jim Baldwin

unread,
Oct 8, 2017, 11:38:23 PM10/8/17
to unma...@googlegroups.com
The answer might be "Don't mention it."  The model won't mind.  As the years become farther away from the initial year the predicted occupancy probability will approach a constant no matter what psi1 happens to be.  For the case where the colonization and extinction parameters are constant (gamma and epsilon, respectively), the occupancy probability will approach gamma/(gamma+epsilon).  All of the years will have slightly different occupancy probabilities as they approach this limit.  So you'll still get different estimates for each year.

However, if there really needs to be yearly site covariates, then that's a different problem.

Jim


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+unsubscribe@googlegroups.com.

sandhya sharma

unread,
Oct 8, 2017, 11:41:32 PM10/8/17
to unma...@googlegroups.com
Thank you so much sir

--
You received this message because you are subscribed to a topic in the Google Groups "unmarked" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/unmarked/Tk5QRBFTgyY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to unmarked+unsubscribe@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages