Problem with the dynamic occupancy models script

511 views
Skip to first unread message

Tomaz N. Melo

unread,
Feb 10, 2021, 9:26:38 PM2/10/21
to unmarked
Dear unmarked users,

I'm starting to use unmarked now, and I'm still learning. I am trying to make some dynamic occupancy models. I have a data frame with 89 days of sampling, 19 sites, and three covariates. The presence and absence data were collected in 4 samples, and when I try to create an object with use as Yearly site covariates with the command:

seasons = matrix (c ("1", "2", "3", "4"), 19, 4, byrow = TRUE)

When I will create the object with the command:

umf = unmarkedMultFrame (y = y, siteCovs = bird.sitecov, yearlySiteCovs = list (year = seasons), numPrimary = 4)

The following message appears:
Warning message:
yearlySiteCovs contains characters. Converting them to factors.

Does anyone know how I can help to solve this problem?

I apologize if this subject has already been discussed on another topic. I hope to learn a lot from you all.

Jeffrey Royle

unread,
Feb 10, 2021, 9:40:14 PM2/10/21
to unma...@googlegroups.com
hi Tomaz,
 That warning you have might not lead to any error in the fitting of models. You should try to fit a model. 
 In any case, I ran the commands in your email and did not get the warning message (I had to make up a y matrix and delete the siteCovs...).  What version of unmarked are you using?
regards
andy


--
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unmarked/4f12c571-8142-496f-80fb-c6516f3495e4n%40googlegroups.com.

Alistair Stewart

unread,
Feb 11, 2021, 1:22:13 AM2/11/21
to unma...@googlegroups.com
Hi Tomaz,

When you fill the matrix with the list of numbers you use quotes which coerces the values into characters and unmarked reads characters as factors. Remove the quotes and the years will be kept as numeric values:
seasons =matrix(c(1,2,3,4),19,4,byrow=TRUE)

As Jeffrey notes this might not lead to any error or difference fitting the models. 

Regards,
Alistair

Ken Kellner

unread,
Feb 11, 2021, 7:37:54 AM2/11/21
to unmarked
To add a bit more context:

In older versions of R (<4), variables that were character strings (like "seasons") were automatically converted to factors when added to a data frame. In R version 4 or greater, the R team changed this default so now characters were no longer converted to factors automatically. Unmarked still needs them to be factors, so unmarked now does the conversion itself and warns you just in case you didn't mean for it to happen.

In your analysis, is season supposed to be a categorical variable (I suspect it is)? If so leave the quotes and ignore the warning. Is season supposed to be a numeric/continuous variable? Then remove the quotes as Alistair suggests and you should no longer get a warning. Either way the model should run, but you will not get the same results both ways.

Ken

Tomaz N. Melo

unread,
Feb 11, 2021, 9:36:14 AM2/11/21
to unmarked
Thank you all for your help.

Jeffrey, the version of the unmarked I'm using is 1.0.1 and my version of R is 4.0.3.

Alistair and Ken the season variable is categorical. I tried fitting a model and I couldn't. Here are all the commands I used:

library (unmarked)
bird = read.table ("mazaria_propinqua.csv", na.strings = "NA", stringsAsFactors = FALSE, strip.white = TRUE, header = TRUE)
y <-bird [, 2: 90]
bird.site <-bird [, 91: 93]
bird.sitecov = data.frame (bird.site)

seasons = matrix (c ("1", "2", "3", "4"), 19, 4, byrow = TRUE)
umf = unmarkedMultFrame (y = y, siteCovs = bird.sitecov, yearlySiteCovs = list (year = seasons), numPrimary = 4)

When I tried fitting a model I received the error message:
fm0 = colext (~ 1, ~ 1, ~ 1, ~ 1, umf)
Error in data.frame (..., check.names = FALSE):
   arguments imply differing number of rows: 1691, 1672

Thanks again for your help

Ken Kellner

unread,
Feb 11, 2021, 10:36:56 AM2/11/21
to unmarked
It's hard to test without the actual data in hand, but I think the problem is with your "y". You define it as 

y <- bird[,2:90]

which will result in a matrix with 89 columns ( = 89 observations per site total). However, you also say you have numPrimary = 4. There should be the same number of observations in each primary period, but right now there isn't (89/4=22.25).
unmarked is getting confused by this uneven number of obs and giving a not-very-informative error message, which should be fixed so it is more helpful.

Perhaps this is a typo in your code and it is supposed to be, say,  y = bird[,2:89] so that you have 88 total observations (88/4 = even 22 per primary period).

Or maybe you actually have a different number of observations in one of your primary periods (e.g. one year had 23 but all the others had 22). This is fine, but you have to pad out your y matrix with appropriately placed columns of NAs so that all primary periods appear to have the same number of observations.

Ken

Tomaz N. Melo

unread,
Feb 11, 2021, 9:55:59 PM2/11/21
to unmarked
Hi Ken,

You were right. The problem was really was that I had a different number of days between seasons. I deleted a column from one of the periods that was over and I was able to run all my models.

I thank you all so much for your help. I will follow the posts here in the group. I want to learn everything about the unmarked.

Tomaz

Kery Marc

unread,
Feb 12, 2021, 3:57:56 AM2/12/21
to unma...@googlegroups.com

".... I want to learn everything about the unmarked."

Haha, excellent ! Go for it !!! And afterwards you should learn everything about Bayesian model fitting engines ;)

Best regards  --- Marc


From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Tomaz N. Melo [tomazram...@gmail.com]
Sent: Friday, February 12, 2021 03:55
To: unmarked
Subject: Re: [unmarked] Problem with the dynamic occupancy models script

Tomaz N. Melo

unread,
Feb 12, 2021, 8:37:10 AM2/12/21
to unma...@googlegroups.com
For sure Marc. Do not be surprised if in the near future I write with doubts about this.

All the best guys,

Tomaz Nascimento de Melo
PhD candidate - UFAM
MSc. Ecology and Natural Resources Management - UFAC



You received this message because you are subscribed to a topic in the Google Groups "unmarked" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/unmarked/39AbZFipSPY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to unmarked+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unmarked/F801018864E940468323F405DF95D2DD7634646A%40MailNT.vogelwarte.ch.

Tomaz N. Melo

unread,
Feb 12, 2021, 1:16:42 PM2/12/21
to unmarked
Hello guys

I need one more help.

I want to illustrate on a graph the relationship between occupancy and one of my covariates.
I used the command below:

na <- data.frame (Tessaria = seq (-0.3724, 2.9979, length = 100)) #minimun and maximun values
E.psi <- predict (fm9, type = "psi", newdata = nd, appendData = TRUE)
E.psi
with (E.psi, {
   plot (Tessaria, Predicted, ylim = c (0.1), type = "l",
        xlab = "Tessaria",
        ylab = expression (hat (psi)), cex.lab = 0.8, cex.axis = 0.8)
   lines (Tessaria, Predicted + 1.96 * SE, col = gray (0.7))
   lines (Tessaria, Predicted-1.96 * SE, col = gray (0.7))})

But the graph is getting weird, as I normalized my covariates to make occupancy models unmarked.
Figure1.jpg

Does anyone know how I can improve the presentation of this information?

Kery Marc

unread,
Feb 12, 2021, 2:21:56 PM2/12/21
to unma...@googlegroups.com
Dear Tomaz,

these predictions are on a link-transformed scale, so for the CIs, you have to do plogis(Predicted +/- 1.96 * SE).

And then, in your code, in the predict function you should have newdata = na, I would say.

Does this help ?

Best regards  -- Marc

Ken Kellner

unread,
Feb 12, 2021, 3:01:14 PM2/12/21
to unmarked
I think there potentially a few issues to deal with.

I agree with Marc that it should be newdata = na, not newdata=nd, based on your code.

unmarked's predict() returns estimates on the response scale (in this case, occupancy probability) by default. You shouldn't use the SE you get in the predict() output to make the confidence interval as you've done here, since that can result in values <0 or >1. Instead use the 'lower' and 'upper' columns that you'll also find in the output data frame. These represent the lower and upper CI bounds (95% CI by default). You can change this to some other interval by specifying the "level" argument, e.g.

predict(fm9, type='psi', newdata=na, level=0.9)

and then use the resulting lower and and upper columns as a 90% CI.

Also, if you've normalized your covariates before putting them into an unmarkedFrame, you'll need to provide newdata values to predict() that have been normalized in the same way (i.e., subtract your sequence of values by the original mean of the variable and divide by original sd). Based on your figure I am guessing you haven't done that since predicted occupancy spikes so quickly to 1.

Ken

Tomaz N. Melo

unread,
Feb 12, 2021, 4:37:41 PM2/12/21
to unmarked
Hi Marc and Ken

I normalized the covariates before putting them into an unmarkedFrame with Z score.
These are the original values:
Tessaria    
0

0

0

0

0

0

0

0

0

0

1

0

0

0

0

2

64

256

292

And these after normalizing:
Tessaria
-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3608359

-0.3723778

-0.3723778

-0.3723778

-0.3723778

-0.3723778

0.3663031

2.5823461

2.9978541

"Also, if you've normalized your covariates before putting them into an unmarkedFrame, you'll need to provide newdata values to predict () that have been normalized in the same way (ie, subtract your sequence of values by the original mean of the variable and divide by original sd)."
Do I have to do this procedure with each of the normalized values?

Ken Kellner

unread,
Feb 12, 2021, 4:54:42 PM2/12/21
to unmarked
I was wrong, you did use the normalized values for your newdata sequence which was correct. It seems the covariate is just very skewed, with those very high values at the end.

Tomaz N. Melo

unread,
Feb 12, 2021, 6:35:47 PM2/12/21
to unmarked
Yes, this particular covariate is important for the occurrence of some species, but it is abundant in a few of my sampling sites. The other two covariates are better distributed on the sites.

Lain E. Pardo

unread,
Feb 13, 2021, 4:03:32 AM2/13/21
to unma...@googlegroups.com
Dear Ken,

sorry to jump in, just got a bit confused with a part of your answer here and want to confirm. I understand that we "... need to provide newdata values to predict() that have been normalized in the same way ...". However, usually, this scale is not very interesting to see the effect or making decisions. So, I thought it was correct to build a graph in the original scale just by providing a sequence of values in the original scale and plot it using the predicted values, is this wrong? It's probably more a visualization thing, but I´m now wondering. Should I predict() again with original values in order to do this?

Please see a commented example below in case it is useful for your feedback

Thank you very much.

Lain
----------

newData1  <-  data.frame(Tree.Cover.Cam = seq(-0.7123054, 4.6376593, by = 0.1))   #normalized using scale()                                                                                                                                    
pred1 <- predict(psi_just_tree,  type  =  "state",  newdata  =  newData1,  appendData  =  TRUE)
pred1

#plot normalized data
f <- ggplot(pred1,aes(Tree.Cover.Cam, Predicted, ymin=lower,ymax=upper))
f + geom_line(colour = "blue") +
  geom_ribbon(alpha=0.2,colour=NA) +
  labs(x = "Density of trees(std)", y = "prob. occ") # graph showing that relationship with x being in the scaled covariate 

# on the original scale

range(site_covs1$Tree.Cover.Cam) # 0.0-38.88 (original values of covariate)
tree.raw <- seq(0, 38.8, length= 54) # for the graph

#plot

fa <- ggplot(pred1,aes(tree.raw, Predicted, ymin=lower,ymax=upper))
fa + geom_line(colour = "blue") +
  geom_ribbon(aes(x =tree.raw, ymin = lower, ymax = upper), alpha = 0.2)+
  labs(x = "Density_tree(original)", y = " pred occ (original)") + 
 theme(axis.text=element_text(size=15),axis.line=element_line(colour="black"))



--

Postdoctoral Research Fellow

Wildlife Ecology Lab,

School of Natural Resource Management

Nelson Mandela University

George Campus, South Africa

Mobile: +27-0762599325

Lain....@Mandela.ac.za

Kery Marc

unread,
Feb 13, 2021, 5:58:56 AM2/13/21
to unma...@googlegroups.com
Hi Lain,

I am not sure your solution is correct. You want to take 'tree.raw' as a new covariate that is used for prediction and so you choose it to cover the entire range of the observed covariate in an even way.

What you must do then is first to scale 'tree.raw' with the mean and the standard deviation that was used by scale() when you scaled the original covariate before using it in the fitted model. So, you do something like:

tree.pred <- (tree.raw - mean.original.tree.data) / sd.original.tree.data

And then you supply 'tree.pred' in the predict function. Only now can you plot the prediction against 'tree.raw'.

Best regards  --- Marc





From: unma...@googlegroups.com [unma...@googlegroups.com] on behalf of Lain E. Pardo [lepa...@gmail.com]
Sent: Saturday, February 13, 2021 10:03
To: unma...@googlegroups.com

Ken Kellner

unread,
Feb 13, 2021, 9:08:39 AM2/13/21
to unmarked
First, to your main question, I agree that in most cases you would want to plot the covariate on its original scale.

Lain, I share Marc's concern about your approach here. I think it *could* be correct, it's hard to tell just from the code you provided. The key point is: do the min and max of the scaled sequence (in your newdata) correspond to transformed versions of the min and max of the unscaled data, and are the two sequences the same length and evenly spaced? If so you are probably OK, but it's not clear from your code. For example, is length(newdata$Tree.Cover.Cam) 54 to match the length of tree.raw? This is kind of hard to sort out in words so I wrote up an example of possible ways to do it, and end with an example of the way I think you should do it.

library(unmarked)

# Simulate data
set.seed(123)
x <- runif(100,0,10)
psi <- plogis(-0.5 + 0.2 * x)
z <- rbinom(100, 1, psi)

y <- matrix(NA, nrow=100, ncol=5)
for (i in 1:100){
  y[i,] <- rbinom(5, 1, 0.5*z[i])
}

# normalize x first
mn_x <- mean(x); sd_x <- sd(x)
x_scale <- (x - mn_x)/sd_x
# Fit model
umf1 <- unmarkedFrameOccu(y=y, siteCovs=data.frame(x=x_scale))
fm1 <- occu(~1~x, umf1)

Now to predict. I'll show  two correct approaches and one incorrect approach to predicting with a scaled covariate.

# create x sequence on original scale for plot
x_seq <- seq(min(x), max(x), length.out=100)

# Approach 1: 
# normalize the x sequence by scaling by original mean/sd
# this is what Marc suggested
x_scale_seq1 <- (x_seq - mn_x)/sd_x
pr1 <- predict(fm1, "state", newdata=data.frame(x=x_scale_seq1))

# Approach 2: 
# get a sequence between min and max of *scaled* values
# remember we generated x_scale above when making the umf
# this is what Lain did I think
x_scale_seq2 <- seq(min(x_scale), max(x_scale), length.out=100)
pr2 <- predict(fm1, "state", newdata=data.frame(x=x_scale_seq2))

# Approach 3 (INCORRECT): 
# Predict directly with original, untransformed x sequence
pr_bad <- predict(fm1, "state", newdata=data.frame(x=x_seq))

Finally, I'll generate plots for each of the 3 approaches, plus a plot where we put the original X on the x-axis.
 
# Make plot
png("fig1.png")
par(mfrow=c(2,2))

# Plot with x sequence 1
plot(x_scale_seq1, pr1$Predicted, type="line", main="predict with\nx_scale_seq1", ylab="psi")
# Plot with x sequence 2
plot(x_scale_seq2, pr2$Predicted, type="line", main="predict with\nx_scale_seq2", ylab="psi")
# Incorrect plot
plot(x_seq, pr_bad$Predicted, type="line", ylab="psi", main="incorrect: predict\nusing original x")
# Plot with x sequence 1 but put real x on x axis
plot(x_seq, pr1$Predicted, type='line', main="predict with\nx_scale_seq1\noriginal on x-axis", ylab="psi")
dev.off()
fig1.png
As you can see as long as you do it correctly the two approaches are equivalent (and the incorrect approach gives a different plot).

However, there is (I think) and easier way to do this: don't scale x yourself, tell unmarked to do it by putting a call to scale() in your formula.

# Put original x in unmarked frame
umf2 <- unmarkedFrameOccu(y=y, siteCovs=data.frame(x=x))

# Scale x in the formula and fit model again
fm2 <- occu(~1~scale(x), umf2)

The estimates from fm1 and fm2 will be identical. Now you can just use the original x sequence directly to predict, unmarked scales x automatically based on original mean and sd.

# Use original x_seq
pr3 <- predict(fm2, "state", newdata=data.frame(x=x_seq))

# Plot and compare with previous approach to show they are the same
png("fig2.png", width=600, height=480)
par(mfrow=c(1,2))

# our new plot
plot(x_seq, pr3$Predicted, type='line', main="using scale() in formula", ylab="psi")

# our original plot
plot(x_seq, pr2$Predicted, type="line", main="predict with\nx_scale_seq1\noriginal on x-axis",
     ylab="psi")
dev.off()

fig2.png

Lain E. Pardo

unread,
Feb 13, 2021, 9:29:10 AM2/13/21
to unma...@googlegroups.com
Hi Marc,

Thanks much for your quick response (on Saturday!). 

Ok, but that is basically going back to the variable that I used in the fitted model, isn´t it? which is my case is pred1 and plot f. Maybe, I wasn´t clear, in my example Tree.Cover.Cam (newData1) is the scaled covariate of tree.raw (or original) which range is between -0.7123054, 4.6376593  (the original values or tree.raw here range from 0 to 38.8). In some cases, having a scaled value in the x axis is not very informative/intuitive, so I just thought, what if we put the original values (before scale them) and use them as the range of values in the figure. I am not really predicting over new values in the end, as I still use the same estimates of pred1 in the ggplot. Since these scaled values represent the original variable range, I didn´t think it would be a problem, but that´s my doubt now. 

Sorry if I wasn´t clear.
Thanks
L

---
this is not a very good example as the x scale reads very similar in both cases, but maybe helpful to ilustrate my point. left = normalized values, right "on original scale" (this basically removes the negatives numbers)

image.png

Lain E. Pardo

unread,
Feb 13, 2021, 9:57:21 AM2/13/21
to unma...@googlegroups.com
Dear Ken,

thanks much for this comprehensive response. I was responding to Marc as this message arrived. Ok, I understand completely  (my answer is yes to your questions)!

I am so grateful that just an aside question turned into a great explanation.

I (and the entire list for sure) really appreciate your time and feedback.

Have a great weekend.
L

Matt Weldy

unread,
Feb 15, 2021, 12:12:03 PM2/15/21
to unma...@googlegroups.com
I think you just have a typo there. Your storing your new data in na, then your predicting using nd.

Tomaz N. Melo

unread,
Feb 15, 2021, 7:51:47 PM2/15/21
to unmarked
Thank you so much guys. I will try to run the scripts following your suggestions. I write again to say if it worked.
Reply all
Reply to author
Forward
0 new messages