Differences between Maxent 3.3.3k ("black box") and 3.4.0 (open source)

Adam Smith

unread,

Apr 3, 2017, 7:33:05 PM4/3/17

to Maxent

I'm excited about the new open-source version of Maxent. However, I find the output to the older Maxent incomparable. I would expect them to be very slightly different because numerically the inhomogeneous Poisson process solution sets the weight of background sites equal to 100x the weight of presence sites (versus infinite weight as they should be in theory), but the results are too different for this to be the reason for the difference.

Here, comparing the two in R:

# setup
library(maxnet)
library(dismo)

data(bradypus)

data <- bradypus[ , -1]
p <- bradypus$presence

# train models... forcing maxnet() to use all features as these are default in maxent()
netModel <- maxnet(p=as.vector(p), data=data, f=maxnet.formula(p=p, data=data, classes='lpqht'))
mxModel <- maxent(p=as.vector(p), x=data)

# predict using raw output
netPredRaw <- c(predict(netModel, data[1:20, ], type='exponential'))
mxPredRaw <- predict(mxModel, data[1:20, ], args='outputformat=raw')

# they're different!
netPredRaw
mxPredRaw
plot(mxPredRaw, netPredRaw)

# predict using logistic output
netPredLog <- c(predict(netModel, data[1:20, ], type='logistic'))
mxPredLog <- predict(mxModel, data[1:20, ], args='outputformat=logistic')

# they're still different!
netPredLog
mxPredLog
plot(mxPredLog, netPredLog)

# compare "manual" logistic output of maxnet() (see p. 7 of new tutorial for maxnet()) to logistic output from predict.maxnet()... very similar but not exactly (rounding error?)
netPredRawManual <- (exp(netModel$entropy) * netPredRaw) / (1 + exp(netModel$entropy) + netPredRaw)
netPredRawManual
netPredRaw

So... is maxnet() comparable to maxent()?

Adam

Adam B. Smith
Assistant Scientist in Global Change
Missouri Botanical Garden

Adam Smith

unread,

Apr 4, 2017, 10:30:48 AM4/4/17

to Maxent

Just realized the models would likely be different if using hinge/threshold features because of the way the new version handles them differently. However, if we use just linear features there are still large differences. Here is the same code but with only linear features enabled for both models.

# setup
library(maxnet)
library(dismo)

data(bradypus)

data <- bradypus[ , 2:3] # just using these two predictors to make it simple

p <- bradypus$presence

# train models... forcing maxnet() to use all features as these are default in maxent()

netModel <- maxnet(p=as.vector(p), data=data, f=maxnet.formula(p=p, data=data, classes='l'))
mxModel <- maxent(p=as.vector(p), x=data, args=c('linear=true', 'quadratic=false', 'product=false', 'threshold=false', 'hinge=false'))

# predict using raw output
netPredRaw <- c(predict(netModel, data[1:20, ], type='exponential'))
mxPredRaw <- predict(mxModel, data[1:20, ], args='outputformat=raw')

# they're different!
netPredRaw
mxPredRaw
plot(mxPredRaw, netPredRaw)

# entopies are different
netModel$entropy
mxModel@lambdas[length(mxModel@lambdas)]

# predict using logistic output
netPredLog <- c(predict(netModel, data[1:20, ], type='logistic'))
mxPredLog <- predict(mxModel, data[1:20, ], args='outputformat=logistic')

# they're different!

Jamie M. Kass

unread,

Apr 4, 2017, 10:16:36 PM4/4/17

to Maxent

Thanks for the thorough analysis, Adam. Yes, the outputs are quite different. Some other things I noticed after running the same analysis and exploring a few more things:

(run Adam's code before running mine)

1) The "exponential" output seems to be the values before they are divided by the sum of all point values. If you sum them, you don't get 1, but a very high number.

sum(c(predict(netModel, data, type='exponential'))) # this gives you 473709920

sum(predict(mxModel, data, args='outputformat=raw')) # this gives you 1

2) There is a wide discrepancy in the number and types of parameters used in each model. The maxnet model has more parameters, but many are very close to zero (closer than most in the maxent.jar model).

netModel$betas
length(netModel$betas) # 78 parameters with non-zero betas for maxnet

# a little function I wrote that makes the lambdas vector in dismo into a data frame for easier querying
lambdasDF <- function(mx) {
lambdas <- mx@lambdas[1:(length(mx@lambdas)-4)]
data.frame(var=sapply(lambdas, FUN=function(x) strsplit(x, ',')[[1]][1]),
             coef=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][2])),
             min=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][3])),
             max=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][4])),
             row.names=1:length(lambdas))
}

library(dplyr)
lambdasDF(mxModel) %>% filter(coef != 0)
nrow(lambdasDF(mxModel) %>% filter(coef != 0)) # 63 parameters with non-zero betas for maxent.jar

3) The response curves show more or less the same directionality, but differ considerably sometimes. This is not that surprising, since the parameters are different. Zoom out on these plots to see them better -- they get scrunched up in RStudio.

plot(netModel, vars=names(data), type='exponential') # response curves for maxnet

response(mxModel) # response curves for maxent.jar

What does all this mean? There are certainly differences, not only in the parameters kept in each model (i.e. the parameters that did not drop out via regularization), but also in the predicted quantities. If just the parameters were different and both models came to a similar predicted quantity, that would be understandable. But that's not what's going on, unless we're doing something wrong. It would be great to understand why.

Jamie Kass
PhD Candidate
City College of NYC

Pascal Title

unread,

Apr 5, 2017, 12:14:18 AM4/5/17

to Maxent

It looks like the maxnet function calculates the raw output internally in order to calculate various parameters, such as entropy and alpha. This raw output does sum to 1, and is also calculated across the pseudo-absences only (does not include presences).

library(maxnet)

data(bradypus) 
data <- bradypus[ , -1] 
p <- bradypus$presence

# train models... forcing maxnet() to use all features as these are default in maxent()


netModel <- maxnet(p=as.vector(p), data=data, f=maxnet.formula(p=p, data=data, classes='lpqht')) 
raw.output <- predict(netModel, data, type='exponential') 


# from maxnet() code: 
# define the input arguments 
p <- as.vector(p) 
data <- data 
f=maxnet.formula(p=p, data=data, classes='lpqht')  
regmult=1.0 
regfun=maxnet.default.regularization 

# some of the internal code of the maxnet function
mm <- model.matrix(f, data) 
reg <- regfun(p,mm) * regmult 
weights <- p+(1-p)*100 
glmnet::glmnet.control(pmin=1.0e-8, fdev=0)   
model <- glmnet::glmnet(x=mm, y=as.factor(p), family="binomial", standardize=F, penalty.factor=reg, lambda=10^(seq(4,0,length.out=200))*sum(reg)/length(reg)*sum(p)/sum(weights), weights=weights) 
class(model) <- c("maxnet", class(model))
if (length(model$beta) < 200) stop("Error: glmnet failed to complete regularization path") 
bb <- model$beta[,200] 
model$betas <- bb[bb!=0] 
model$alpha <- 0 
rr <- maxnet:::predict.maxnet(model, data[p==0, , drop = FALSE], type="exponent", clamp=F) 
raw <- rr / sum(rr) 

# raw format produced internally by maxnet() for calculation of parameters 
# this version of raw format is calculated from pseudoabence data only (p==0)
 # and is normalized.
head(raw) 

# raw/exponential format produced by default behavior of predict.maxent() 
head(raw.output)

So these two outputs are different for a number of reasons: the internal one is normalized, and it is based on only pseudo-absences.

Not sure if this helps, but it at least makes it more clear why some things don't match...

Adam Smith

unread,

Apr 5, 2017, 12:24:02 PM4/5/17

to Maxent

With the help of both your great minds I think I (partly) figured it out--though there are still some important differences between the versions.

maxnet() automatically removes presences from the data before calculating raw output (thanks, Pascal!), whereas the default in maxent() is to keep them in. So if we re-write the code setting addsamplestobackground to "false" for maxent() then the results are *very* similar. I would put this down to rounding error or maybe the approximation used in maxnet() to solve the inhomogenous Poisson process where the value of 100 is used to approximate infinity (the next time someone owes me a dollar I'm going to ask for infinity cents return since they're approximately the same). So here's revised code. You can see the results vary between the two models by some very small amount (but see thereafter for remaining differences).

# setup
library(maxnet)
library(dismo)

data(bradypus)

data <- bradypus[ , 2:3] # just using these two predictors to make it simple

p <- bradypus$presence

# train models

# using just linear, product, and quadratic features as hinge and threshold features are calcualted differently between the two versions
# note: using addsamplestobackground=false in maxent() because this is default in maxnet()
netModel <- maxnet(p=as.vector(p), data=data, f=maxnet.formula(p=p, data=data, classes='lpq'))
mxModel <- maxent(p=as.vector(p), x=data, args=c('linear=true', 'quadratic=true', 'product=true', 'threshold=false', 'hinge=false', 'addsamplestobackground=false'))

# predict using raw output

netPredRaw <- c(predict(netModel, data, type='exponential'))
mxPredRaw <- predict(mxModel, data, args='outputformat=raw')

# they're now slightly different!
head(netPredRaw, 20)
head(mxPredRaw, 20)
par(mfrow=c(1, 2))
plot(mxPredRaw, netPredRaw, xlim=c(0, max(netPredRaw, mxPredRaw)), ylim=c(0, max(netPredRaw, mxPredRaw)), , main='ver 3.4.0 vs ver 3.3.3k')
abline(a=0, b=1)
plot(netPredRaw - mxPredRaw, main='ver 3.4.0 minus ver 3.3.3k')

# does sum of "raw" output equal 1? (calculating across background sites only)
sum(netPredRaw[p == 0])
sum(mxPredRaw[p == 0])

# entopies are slightly different
netModel$entropy
mxModel@lambdas[length(mxModel@lambdas)]

# predict using logistic output

netPredLog <- c(predict(netModel, data, type='logistic'))
mxPredLog <- predict(mxModel, data, args='outputformat=logistic')

# they're slightly different!
head(netPredLog, 20)
head(mxPredLog, 20)
par(mfrow=c(1, 2))
plot(netPredLog, mxPredLog, xlim=c(0, max(netPredLog, mxPredLog)), ylim=c(0, max(netPredLog, mxPredLog)), , main='ver 3.4.0 vs ver 3.3.3k')
abline(a=0, b=1)
plot(netPredLog - mxPredLog, main='ver 3.4.0 minus ver 3.3.3k')

# compare "manual" logistic output of maxnet() (see p. 7 of new tutorial for maxnet()) to logistic output from predict.maxnet()... very similar but not exactly (rounding error?)
netPredRawManual <- (exp(netModel$entropy) * netPredRaw) / (1 + exp(netModel$entropy) + netPredRaw)

windows()
plot(netPredRawManual - netPredRaw)

## Note, though that the coefficients are still different between the models (using Jamie's code--thanks!):

# Jamie: a little function I wrote that makes the lambdas vector in dismo into a data frame for easier querying

lambdasDF <- function(mx) {
lambdas <- mx@lambdas[1:(length(mx@lambdas)-4)]
data.frame(var=sapply(lambdas, FUN=function(x) strsplit(x, ',')[[1]][1]),
             coef=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][2])),
             min=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][3])),
             max=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][4])),
             row.names=1:length(lambdas))
}

library(dplyr)
lambdasDF(mxModel) %>% filter(coef != 0)

netModel$betas
length(netModel$betas)

## I'm guessing the difference is because the maxnet() coefficients are on the scale of the predictors (like using predict() on a glm object but not specifying predict(~~~~, type='response')). So maybe this wouldn't be bothersome, except that the response functions are more than slightly different:

plot(netModel, vars=names(data), type='logistic')
windows()
response(mxModel)

Which is the "true" Maxent?
Adam

Pascal Title

unread,

Apr 5, 2017, 12:49:10 PM4/5/17

to Maxent

One part that confuses me is whether or not the raw output from maxnet sums to one and whether or not we should be expected to normalize the exponential output. Adam, you showed that if you calculate across background points only, sum(netPredRaw[p == 0]) == 1. I get that too when I run through your demonstration code.

But how can this be true? In the internal code of maxnet(), on lines 17-18:

rr <- predict.maxnet(model, data[p==0, , drop = FALSE], type="exponent", clamp=F)


raw <- rr / sum(rr)

But in your demonstration, you don't explicitly normalize the exponential output, yet you also get values that sum to 1.

Adam Smith

unread,

Apr 6, 2017, 12:04:07 PM4/6/17

to Maxent

Hi Pascal,
I agree--it seems incongruous. I can't see why the two operations give you the same result. I'll keep the thinker thinking... <smoke>
Adam

Jamie M. Kass

unread,

Apr 6, 2017, 5:09:53 PM4/6/17

to Maxent

Adam,

I checked out your latest code, and I then tried it again with more variables (all of them). I no longer get congruent results -- they are wildly different. It looks like it's because some exponential predictions are extremely high (and standardizing by dividing by the sum doesn't change the relationship). I don't understand why the maxnet exponential prediction does this sometimes. And it looks like, at least for your simple example, "addsamplestobackground=true" still results in relatively similar predictions. Code below:

# same as Adam's code
data(bradypus)
p <- bradypus[,1]
data <- bradypus[,2:3]

netLQP <- maxnet(p=p, data=data, f=maxnet.formula(p=p, data=data, classes='lqp'))
mxLQP <- maxent(p=p, x=data, args=c('nohinge', 'nothreshold'))

netLQP.p.exp <- predict(netLQP, data, type='exponential')
mxLQP.p.raw <- predict(mxLQP, data, args='raw')

plot(netLQP.p.exp, mxLQP.p.raw)  # mostly similar

# all predictors
data <- bradypus[,-1]
data$ecoreg <- factor(data$ecoreg)

netLQP.all <- maxnet(p=p, data=data, f=maxnet.formula(p=p, data=data, classes='lqp'))
mxLQP.all <- maxent(p=p, x=data, args=c('nohinge', 'nothreshold'), factors='ecoreg')

netLQP.all.p.exp <- predict(netLQP.all, data, type='exponential')
mxLQP.all.p.raw <- predict(mxLQP.all, data, args='raw')

plot(netLQP.all.p.exp, mxLQP.all.p.raw)  # pretty different

# something else bothersome is that the cloglog predictions are quite different
netLQP.all.p.cll <- predict(netLQP.all, data, type='cloglog')
mxLQP.all.p.cll <- predict(mxLQP.all, data, args='cloglog')
plot(netLQP.all.p.cll, mxLQP.all.p.cll)

I have more code to show that raster predictions, when 10,000 background points are used (instead of the paltry 1000 used in the bradypus example) and WorldClim variables, maxnet and dismo maxent look pretty similar, and their D values are above 0.9. I'll put it up later.

Jamie

Jamie M. Kass

unread,

Apr 7, 2017, 3:42:03 AM4/7/17

to Maxent

Here's the code I mentioned that plots raster predictions side-by-side so we can see visual differences (for logistic and cloglog predictions). It also calculates Schoener's D so we can see how they differ cell-by-cell. And if you were interested, I have a function in there that makes a raster prediction from a maxnet object, in a similar way you'd do it in dismo.

The bottom line is that when the background is sampled (more or less) adequately and there is more signal than noise, the maxnet and maxent models seem to mostly converge. I can't verify this completely, but here's some evidence. Still not entirely sure why the bradypus dataset predictions are so different.

# load packages
library(maxnet)
library(dismo)
library(raster)
library(dplyr)

# load in WorldClim bioclim variables at 10 arcmin
setwd('/Users/musasabi/Documents/research/maxnet_stuff')
envs <- getData('worldclim', var='bio', res=10)
# crop to Central / South America (approx.)
envs <- crop(envs, c(-95,-25,-55,20))

# load in Bradypus variegatus occurrence data from 
# http://biodiversityinformatics.amnh.org/open_source/maxent/
occs <- read.csv('samples/alltrain.csv')
occs <- occs %>% filter(species=='bradypus_variegatus') %>% select(lon=dd.long, lat=dd.lat)

# draw 10,000 background points from this extent
bg <- randomPoints(envs, n=10000)
colnames(bg) <- c('lon', 'lat')
# bind occurrences and background together
allpts <- rbind(occs, bg)
# get bioclim values for all points
data <- as.data.frame(extract(envs, allpts))
# generate vector of 1's for presence and 0's for background
p <- c(rep(1, nrow(occs)), rep(0, nrow(bg)))

# linear + quadratic models
netLQ <- maxnet(p=p, data=data, f=maxnet.formula(p=p, data=data, classes='lq'))
mxLQ <- maxent(p=p, x=data, args=c('nohinge', 'nothreshold', 'noproduct'))

# function for making a data frame from lambdas vector for Maxent object in dismo


lambdasDF <- function(mx) {
  lambdas <- mx@lambdas[1:(length(mx@lambdas)-4)]
  data.frame(var=sapply(lambdas, FUN=function(x) strsplit(x, ',')[[1]][1]),
             coef=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][2])),
             min=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][3])),
             max=sapply(lambdas, FUN=function(x) as.numeric(strsplit(x, ',')[[1]][4])),
             row.names=1:length(lambdas))
}

# number of non-zero parameters
length(netLQ$betas)
nrow(lambdasDF(mxLQ) %>% filter(coef != 0))

# response curves
plot(netLQ, vars=names(data), type='logistic')
response(mxLQ)

# function to make a raster prediction from a maxnet object
predict.mxnet <- function(mod, envs, type) {
  envs.pts <- rasterToPoints(envs)
  mxnet.p <- predict(mod, envs.pts, type=type)
  envs.pts <- cbind(envs.pts, as.numeric(mxnet.p))
  mxnet.p <- rasterFromXYZ(envs.pts[,c(1,2,22)], res=res(envs))
  return(mxnet.p)
}

# make logistic raster predictions
netLQ.p.log <- predict.mxnet(netLQ, envs, 'logistic')
mxLQ.p.log <- predict(mxLQ, envs)
# calculate Schoener's D overlap statistic to see how similar predictions are
nicheOv <- nicheOverlap(netLQ.p.log, mxLQ.p.log, stat = 'D')

# plot maxnet and dismo maxent side-by-side with D statistic
par(mfrow=c(1,2))
plot(netLQ.p.log, main='maxnet LQ logistic')
plot(mxLQ.p.log, main='dismo maxent LQ logistic')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

# plot predictions values against each other
plot(values(netLQ.p), values(mxLQ.p))

# make a cloglog prediction in maxnet and recalculate D with logistic 
# prediction for dismo maxent
netLQ.p.cll <- predict.mxnet(netLQ, envs, 'cloglog')
nicheOv <- nicheOverlap(netLQ.p.cll, mxLQ.p.log, stat = 'D')

# plot maxnet cloglog against dismo maxent logistic -- like the paper says,
# it emphasizes areas of moderate suitability
par(mfrow=c(1,2))
plot(netLQ.p.cll, main='maxnet LQ cloglog')
plot(mxLQ.p.log, main='dismo maxent LQ logistic')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

# now compare cloglog for maxnet and dismo maxent
mxLQ.p.cll <- predict(mxLQ, envs, args='outputformat=cloglog')
nicheOv <- nicheOverlap(netLQ.p.cll, mxLQ.p.cll, stat = 'D')

plot(netLQ.p.cll, main='maxnet LQ cloglog')
plot(mxLQ.p.cll, main='dismo maxent LQ cloglog')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

# now let's add hinges and do the same thing
# this demonstrates that models with hinges are also similar
netLQH <- maxnet(p=p, data=data, f=maxnet.formula(p=p, data=data, classes='lqh'))
mxLQH <- maxent(p=p, x=data, args=c('nothreshold', 'noproduct'))

length(netLQH$betas)
nrow(lambdasDF(mxLQH) %>% filter(coef != 0))

plot(netLQH, vars=names(data), type='logistic')
response(mxLQH)

netLQH.p.log <- predict.mxnet(netLQH, envs, 'logistic')
mxLQH.p.log <- predict(mxLQH, envs)
nicheOv <- nicheOverlap(netLQH.p.log, mxLQH.p.log, stat = 'D')

plot(netLQH.p.log, main='maxnet LQH logistic')
plot(mxLQH.p.log, main='dismo maxent LQH logistic')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

netLQH.p.cll <- predict.mxnet(netLQH, envs, 'cloglog')
nicheOv <- nicheOverlap(netLQH.p.cll, mxLQH.p.log, stat = 'D')

plot(netLQH.p.cll, main='maxnet LQH cloglog')
plot(mxLQH.p.log, main='dismo maxent LQH logistic')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

mxLQH.p.cll <- predict(mxLQH, envs, args='outputformat=cloglog')
nicheOv <- nicheOverlap(netLQH.p.cll, mxLQH.p.cll, stat = 'D')

plot(netLQH.p.cll, main='maxnet LQH cloglog')
plot(mxLQH.p.cll, main='dismo maxent LQH cloglog')
text(-50, 30, paste("D =", round(nicheOv, dig=5)))

Simon Tarr

unread,

Jan 21, 2019, 10:48:11 AM1/21/19

to Maxent

Hi everyone,

Sorry to drege up an old post but I was wondering how you've all got on with your investigations between dismo::MaxEnt() and maxnet, since these original exchanges?

Specifically, I was wondering if anyone has code that will generate nearly the same outputs, regardless to the algorithm that's being used, so that I can tweak settings and see how the results change. I ask because I've returned to an analysis from some months back which used ENMevaluate and dismo::maxent() to generate predictions (logistic output). Since re-running the analysis with ENMeval 0.3.0 and comparing the two algorithms (Java version of maxent v3.4.1 and maxnet 0.1.2), I've noticed quite substantial differences between the two outputs, even when plotting both as cloglog.

It makes me more than a little nervous that there are such large discrepancies between the two algorithms and leaves me wondering which one I should use.