
Sep 1, 2022, 5:32:54 PM

to unmarked

Hi folks,

I’m running gmultmix() on fish removal data collected
under the robust design (10 primaries, each with 3 secondaries) across two
sites. I have 16 species, and for some of them, a null model will run just
fine, but for others I get this error:

Error in optim(starts, nll, method = method, hessian = se, ...) :
  initial value in 'vmmin' is not finite

I dug through the posts in this group and found a few ideas (use starting values, try both mixture types, change the method for the optim function, set se=FALSE), but none of these resolved the problem.

When I supplied starting values, I tried a wide range, from reasonable to totally absurd, transformed like so: starts = c(log(abundanceGuess), qlogis(availGuess), qlogis(detGuess)). Sometimes I still got the previous error, and sometimes I got this error, which I assume means that all my starting values were really bad:
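For concreteness, here is a sketch of what I mean (the guess values below are placeholders, not from my actual data):

```r
# Placeholder guesses on the natural scale (illustrative only)
abundanceGuess <- 50   # expected abundance per site
availGuess <- 0.5      # availability probability
detGuess <- 0.3        # detection probability

# Starting values go in on the link scale:
# log link for abundance, logit link for availability and detection
starts <- c(log(abundanceGuess), qlogis(availGuess), qlogis(detGuess))

# then passed to the null model, e.g.
# m0 <- gmultmix(~1, ~1, ~1, data = umf, K = 1000, starts = starts, se = FALSE)
```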

Error in optim(starts, nll, method = method, hessian = se, ...) :
  function cannot be evaluated at initial parameters

The really interesting thing is that the species-specific datasets that produce the error are those with the highest observed counts and the widest range of counts. For example, in the yellow perch dataset (which throws the error), nearly 250 individuals were removed on one secondary occasion, the mean number removed across all occasions was 34, and there were multiple occasions when no perch were removed at all. For species with counts between 0 and about 10 and a mean of 1 to 2, the model runs just fine. Given this pattern, I'm wondering whether there is something fundamentally challenging for the model about a dataset whose counts vary dramatically from one occasion to the next, and whether it's something that can't be resolved with starting values.

Any help, suggestions, etc. would be much appreciated. I haven't included any data or code but am happy to provide that if it would be helpful.

Jillian


Oct 6, 2022, 1:48:38 PM

to unmarked

Since my last post I've gone back to the example script I started with (found here: https://rdrr.io/cran/unmarked/man/gmultmix.html) and experimented with running gmultmix() models on simulated datasets. Basically, the model throws this error:

Error in optim(starts, nll, method = method, hessian = se, ...) :
  initial value in 'vmmin' is not finite

anytime the true abundance (in the data-simulation code) is set to >50. This can be resolved by increasing K, until I set abundance above 196, at which point increasing K to even absurdly large values (100k, 1 million, 5 million, 50 million, 100 million) doesn't seem to have any impact.

Incidentally, when I tried a K value of 100 million, I got this error, so I didn't try anything larger:

Error: cannot allocate vector of size 14.9 Gb

I'm using a machine with 64 Gb of RAM, so I'm not sure where the size limit is coming from.

So, given these patterns, I'm hoping someone out there who better understands the inner workings of unmarked can help me figure out whether there is something that can be done to resolve the problem.

Below is the code for simulating data and running a simple null model that I found at the link above:

library(unmarked)

# Simulate data using the multinomial-Poisson model with a repeated constant-interval removal design.

n <- 100 # number of sites

T <- 4 # number of primary periods

J <- 3 # number of secondary periods

lam <- 100 #### this is what I've been changing (the true abundance for the simulated population)

phi <- 0.5

p <- 0.3

y <- array(NA, c(n, T, J))

M <- rpois(n, lam) # Local population size

N <- matrix(NA, n, T) # Individuals available for detection

for(i in 1:n) {

N[i,] <- rbinom(T, M[i], phi)

y[i,,1] <- rbinom(T, N[i,], p) # Observe some

Nleft1 <- N[i,] - y[i,,1] # Remove them

y[i,,2] <- rbinom(T, Nleft1, p)

Nleft2 <- Nleft1 - y[i,,2]

y[i,,3] <- rbinom(T, Nleft2, p)

}

y.ijt <- cbind(y[,1,], y[,2,], y[,3,], y[,4,])


umf1 <- unmarkedFrameGMM(y=y.ijt, numPrimary=T, type="removal")

(m1 <- gmultmix(~1, ~1, ~1, data=umf1, K=1000)) #### this is where I've been changing K


Oct 6, 2022, 5:19:55 PM

to unmarked

Hi Jillian,

Your example code runs fine for me (I tried a few different simulation runs). I'm not sure whether you meant that example to work or not.

K needs to be at least equal to the "known" minimum population size, which in a removal dataset is the sum of all the removal counts at a site (i.e., you know at least that many animals must have existed). At the moment, however, the default is to use the maximum single count + 100. If your removal counts are relatively small, this shouldn't be a problem, but if they are occasionally very large, the auto-generated K might not be big enough, leading to optimization issues like you saw. The solution is to manually set K to at least the maximum total number removed from a site plus a big buffer, which sounds like what you figured out. I think the way K is generated by default should be fixed for gmultmix.
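As a sketch of the arithmetic (toy removal matrix, not real data): the known minimum population at a site is the total removed there, so K should exceed the largest site total, while the default only looks at the largest single count.

```r
# Toy removal counts: 2 sites x 6 occasions (2 primaries x 3 secondaries)
y <- matrix(c(90, 80, 85, 88, 70, 75,
               3,  1,  0,  2,  4,  1),
            nrow = 2, byrow = TRUE)

site_totals <- rowSums(y)        # known minimum population per site
K_default   <- max(y) + 100      # roughly how the default K is generated
K_needed    <- max(site_totals)  # K must be at least this large

K_default                        # 190
K_needed                         # 488
# Here the default falls well below the known minimum population at site 1,
# because the removals are spread across many occasions.
```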

You don't need to keep making K bigger and bigger; as you noticed, as long as it is "big enough" you get the same results. So find a value that works and try a few larger values until you are sure the estimates have stabilized. For lam = 100, for example, K = 1000 should be plenty high.

As far as your memory issue goes, I'm not surprised. If you make K = 100 million, unmarked generates some really large vectors and matrices to feed into the optimization, and these take up tons of memory. Also, while I don't know the exact reasons, in my experience it is pretty typical to get R crashes well before you reach your theoretical maximum RAM. But you shouldn't ever have to set K that high (unless, I guess, you were working with bacteria or something, in which case unmarked is probably not the right tool).

Ken

Oct 6, 2022, 5:34:04 PM

to unmarked

Hi Ken,

Thanks for taking the time to reply.

To get the model to fail with that simulated data, I just set lam to 197 or higher, and then it didn't matter what K I chose, from 200 to 2,000 to 2 million; it still gave me that error. If you aren't seeing that kind of result, then... I don't even know; it would have to be some problem with my computer, wouldn't it?

If you have time, would you let me know whether you are able to get the error using that value for lam?

Thanks again,

Jillian

Oct 6, 2022, 6:22:54 PM

to unmarked

OK, I misunderstood. I see what you mean, I'm getting the same error when lam is larger.

I think I have an idea of what is happening. In the function used to calculate the log-likelihood, there is one particular calculation where, when counts and K are large, an intermediate result can be so large (basically exp(some large number)) that it is simply treated as infinite. Since anything plus infinity is infinity, the output of the whole function becomes infinite, and you get the 'vmmin is not finite' error.
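To illustrate the general failure mode (this is a generic numerical sketch, not the actual unmarked internals): exp() of a large value overflows to Inf in double precision, and the standard workaround is to keep the accumulation on the log scale with the log-sum-exp trick.

```r
# Naive: exp() of a large log-scale value overflows to Inf
x <- c(800, 801, 802)          # e.g. large log-likelihood components
sum(exp(x))                    # Inf -- and anything + Inf stays Inf

# log-sum-exp: subtract the max before exponentiating, add it back after
logSumExp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
logSumExp(x)                   # finite: ~802.41
```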

I *think* if the algorithm used here was structured differently, this might be avoided. I will look into that, but it might take a while.

If Richard sees this maybe he has more comments on using relatively large counts with this function - I think it has mainly been used with e.g. bird point counts where the numbers involved typically stay smaller.

Ken

Oct 6, 2022, 6:26:49 PM

to unmarked

For future reference this issue occurs for both the R and C++ engines, so it isn't specific to one or the other.

Oct 6, 2022, 7:05:01 PM

to unmarked

No worries at all, I'm not the best at explaining.

Thanks for looking into it; it sounds like I may need to use some other analysis for now. I actually reached out to Richard directly when I first started getting the errors, but he was pretty swamped with other stuff and didn't have time to take it on. It would make sense, though, that if the models were designed and tested with point-count data, this kind of problem wouldn't have been apparent. My removal study included occasions where more than 100 individuals were collected in a single sample, so the total removed for several species is over 1,000.

Oct 28, 2022, 9:14:19 AM

to unmarked

Hi Jillian,

Perhaps you've moved on from this, but in case you are still interested, I re-wrote the gmultmix likelihood in a way that seems to have fewer issues with infinite values when counts are large. At least, the example works for me when I set lam to something like 200. You can experiment with this version by installing it from GitHub:

remotes::install_github("kenkellner/unmarked", ref="gmultmix_refactor")

Ken

Nov 1, 2022, 4:07:17 PM

to unmarked

I have not moved on; I've been crossing my fingers that something magical would happen and you or someone else would have time to work on this problem, so I am very pleased to hear from you!

I will get that version installed and update you on how it's working both with my data and with some more experiments off that base script.

Thank you so much for taking this on!!!

Nov 1, 2022, 5:00:58 PM

to unmarked

Hi Ken,

I played around with both the base vignette script I sent you before and my own dataset. Your update definitely improves the situation: I can now run the vignette script with lam set as high as 285 before I get the error. If I increase it to 286 or above, the error is thrown again.

For my datasets, the data from the two hyper-abundant species still causes the same error as originally.

Total removals for my most abundant species were 2,024 individuals. If you are willing to keep working on this, perhaps it would make sense to try to get the model working with lam set to something like 2,500 or even 3,000?

Thanks,

Jillian

Nov 4, 2022, 8:54:17 AM

to unmarked

Hi Jillian,

Unfortunately, I am out of ideas at this point for handling even larger counts. I'll keep thinking about it, but there is unlikely to be progress on this anytime soon, sorry!

Ken

Nov 4, 2022, 4:41:23 PM

to unmarked

Gotcha, I know sometimes this stuff can only be pushed so far.

I wonder if it would be appropriate to artificially reduce the removal numbers, say by dividing each value by 2, run the model, and then expand the resulting abundance estimates by the same factor the original data were reduced by. I'd probably leave counts of 1 as they are, since half a fish rounds up to one fish anyway.
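In code, I'm picturing something like this (just a sketch of the idea, with a toy matrix standing in for my real y.ijt and an arbitrary factor of 2):

```r
# Toy removal matrix standing in for y.ijt (not my real data)
y.ijt <- matrix(c(250, 34, 0, 1, 7, 3), nrow = 1)

fac <- 2
y.scaled <- ceiling(y.ijt / fac)  # halve counts; ceiling keeps counts of 1 as 1
y.scaled                          # 125 17 0 1 4 2
# fit gmultmix() to y.scaled instead of y.ijt,
# then multiply the resulting abundance estimates back up by `fac`
```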

Does that seem totally crazy?
