Calculate AIC in R

Hackett

unread,

Mar 18, 2013, 10:22:06 PM3/18/13

to max...@googlegroups.com

library(raster)

calAIC <- function(csvfile, grdfile, lambdasfile) {
nparams = 0
probsum = 0
loglikelihood = 0
AICcscore = 0
AICscore = 0
BICscore = 0

lambdases <- read.csv(lambdasfile, header=FALSE)
nparams <- nrow(lambdases[lambdases$V2 != 0, ])
nparams = nparams - 4

layerRaw <- raster(grdfile)
probsum <- cellStats(layerRaw, sum)

points <- read.csv(csvfile)
npoints <- nrow(points)
layerValues <- extract(layerRaw, points[, c("LONGITUDE", "LATITUDE")])
loglikelihood <- sum(log(layerValues / probsum))

if (nparams >= npoints - 1) {
AICcscore <- "x"
AICscore <- "x"
BICscore <- "x"
} else {
AICcscore = (2 * nparams - 2 * loglikelihood) + (2 * (nparams) * (nparams + 1) / (npoints - nparams - 1))
AICscore = 2 * nparams - 2 * loglikelihood
BICscore = nparams * log(npoints) - 2 * loglikelihood
}

ICs <- c(csvfile, grdfile, loglikelihood, nparams, npoints, AICscore, AICcscore, BICscore)

return(ICs)
}

getAICs <- function(modelfile) {
models <- read.csv(modelfile, header=FALSE, as.is=TRUE, col.names=c("csvfile", "grdfile", "lambdasfile"))
AICs <- mapply(calAIC, models$csvfile, models$grdfile, models$lambdasfile, USE.NAMES=FALSE)
AICs <- t(AICs)
colnames(AICs) <- c("Points", "ASCII file", "Log Likelihood", "Parameters", "Sample Size", "AIC score", "AICc score", "BIC score")
outfile <- gsub(".csv", "_model_select.csv", modelfile)
write.csv(AICs, outfile, row.names=FALSE)
}

The R code above were a R implementation of AIC, the algorithm used are as that in ENMTools, please have a try.

1. Make a model file as you use ENMTools
2. Run the code above
3. Run getAICs(NAME OF YOUR MODEL FILE)
After a while, the results will be shown in a file with suffix "_model_select.csv"

Silvia Monteiro

unread,

Jul 3, 2013, 11:17:40 AM7/3/13

to max...@googlegroups.com

Hi,

thank you very much for the R script! Could you please just tell me how to build the model file? Is it like the script you do to model selection for ENMtools? Something like this:

C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_1.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\gme1.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\gme1.lambdas
C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_1_5.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1_5\MODEL3\gme1_5.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1_5\MODEL3\gme1_5.lambdas
C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_2.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE2\MODEL3\gme2.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE2\MODEL3\gme2.lambdas

and is it R able to recognize the files like this? (Do I have to put this file in the same folder of R?

Thanks in advance for your help!

Silvia

Roman Luštrik

unread,

Jun 1, 2014, 1:29:31 PM6/1/14

to max...@googlegroups.com

Sorry for digging up graves, but I have a question about your code. As I understand it right now, it works for one layer (grd) file. What would be the generalized version for multiple layer? So far I'm thinking along the lines of calculating the same step for each layer, with summing at the end.

probsum <- cellStats(environment, stat = mean, na.rm = TRUE) # environment is RasterStack and probsum is a named vector of means per layer

layer.values <- slot(model, "presence") # value of layer cells at presence locations

sum(log(layer.values/probsum), na.rm = TRUE) # divide each column by the corresponding probsum, log transform and sum

There is a catch, though. What if some values in a raster are 0? When we take log, they become -Inf and hell breaks loose. Please advise.

Cheers,

Roman

romunov

unread,

Jun 2, 2014, 3:12:49 PM6/2/14

to max...@googlegroups.com

Please disregard my previous mail. Dan Warren pointed out that this 'grdfile' is actually "suitability scores" a.k.a. prediction raster. It now makes sense.

Cheers,

Roman

--
You received this message because you are subscribed to the Google Groups "Maxent" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maxent+un...@googlegroups.com.
To post to this group, send email to max...@googlegroups.com.
Visit this group at http://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.

--
In God we trust, all others bring data.

John Baumgartner

unread,

Jun 2, 2014, 6:35:21 PM6/2/14

to max...@googlegroups.com

Here's some code that I wrote a while back for calculating the information criteria as they are calculated by ENMTools. I haven't looked at Hackett's code in detail, so not sure how different it is, but Roman's question re a "generalized version for multiple layers" is addressed in the last example below (*IC are calculated for each layer of a RasterStack [i.e., each of a set of raw maxent predictions]). No guarantees about the code - up to you to decide if/when it's appropriate to use it ;)

maxentIC <- function(pred.raw, occ, lambdas) {

# pred.raw: A raster object or a file path to a raw maxent prediction grid.

# Providing a RasterStack will result in *IC being calculated for

# each layer.

# occ: A matrix/data.frame with 2 columns (lon, lat).

# lambdas: A maxent object or a file path to a lambdas file.

require(raster)

if(is(lambdas, 'MaxEnt')) lambdas <- textConnection(lambdas@lambdas)

if(is.character(pred.raw)) pred.raw <- raster(pred.raw)

lambdas <- read.csv(lambdas, header=FALSE)

k <- sum(lambdas[, 2] != 0) - 4

out <- t(sapply(seq_len(nlayers(pred.raw)), function(i) {

x <- pred.raw[[i]]

x.std <- x/sum(values(x), na.rm=TRUE)

occ.nodupes <- occ[!duplicated(cellFromXY(x, occ)), ]

n <- nrow(occ.nodupes)

ll <- log(prod(extract(x.std, occ.nodupes), na.rm=TRUE))

AIC <- 2*k - 2*ll

AICc <- AIC + ((2*k*(k+1))/(n - k - 1))

BIC <- k*log(n) - 2*ll

c(n=n, k=k, ll=ll, AIC=AIC, AICc=AICc, BIC=BIC)

}))

row.names(out) <- names(pred.raw)

out

}

An example...

Fit a model:

fnames <- list.files(path=file.path(system.file(package='dismo'), 'ex'),

patt='grd', full=TRUE )

predictors <- stack(fnames)

occ <- read.csv(file.path(system.file(package="dismo"), 'ex/bradypus.csv'))[, -1]

me <- maxent(predictors, occ, factors='biome', path=tempdir())

r <- predict(me, predictors, args=c("outputformat=raw"),

filename={f <- tempfile()})

Passing the raster object to pred.raw and the maxent object to lambdas:

maxentIC(r, occ, me)

##        n  k       ll      AIC     AICc      BIC
## layer 94 47 -710.884 1515.768 1613.855 1635.303

Passing the raster file path to pred.raw, and the lambdas path to lambdas:

maxentIC(f, occ, file.path(tempdir(), 'species.lambdas'))

Passing a RasterStack to pred.raw:

r2 <- predict(me, stack(predictors[[1:8]] * 1.05, predictors[[9]]),

              args=c("outputformat=raw"), filename={f2 <- tempfile()})

# the above prediction is nonsense

s <- stack(r, r2) 

maxentIC(s, occ, me)

##          n  k        ll      AIC     AICc      BIC
## layer.1 94 47 -710.8840 1515.768 1613.855 1635.303
## layer.2 94 47 -739.7127 1573.425 1671.512 1692.960

romunov

unread,

Jun 3, 2014, 7:47:26 AM6/3/14

to maxent

Thank you John for your contribution. For completeness, I'm adding my function. I've compared it to your calculation and it differs a bit. I suspect limiting calculations to one point per raster cell gives rise to this discrepancy.

> johnAIC(pred.raw = prd.full, occ = coordinates(volk), lambdas = mdl.maxent.full)

n k ll AIC AICc BIC

layer 52 78 -574.1112 1304.222 847.778 1456.419

> AICmaxent(model = mdl.maxent.full, p = volk, sc = prd.full)

AIC AICc BIC LL npar npoints

1420.3058 655.6904 1564.5368 -640.1529 70.0000 58.0000

Warning message:

In AICmaxent(model = mdl.maxent.full, p = volk, sc = prd.full) :

Many parameters compared to number of presence points.

Cheers,

Roman

#' Calculate AIC for MaxEnt model.

#'

#' Function will calculate AIC, AICc and BIC for MaxEnt model. It is using the procedure

#' as outlined in Warren and Siefert 2011 with ideas of implementation from MaxEnt google

#' mailing list. Function works on presence data, one layer of prediction scores and a

#' MaxEnt model object where lambda files is extracted from. Implementation hints from

#' https://groups.google.com/forum/#!topic/maxent/EsXZKlpvdTI and reference cited at the end.

#'

#' @param model MaxEnt model.

#' @param p SpatialPoints or SpatialPointsDataFrame. Presence data.

#' @param sc RasterStack. Prediction (suitability) scores.

#' @references Dan L. Warren and Stephanie N. Seifert 2011. Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335–342. http://dx.doi.org/10.1890/10-1171.1

#' @author Roman Luštrik

AICmaxent <- function(model, p, sc) {

# Based on

# Get number of parameters from lambdas file as per Warren and Seifert 2011:

# For interpretation of lambda values see pdf at https://groups.google.com/forum/#!topic/maxent/gdN9D5hPZOg

model.lambdas <- slot(model, "lambdas")

exclude.lambdas <- grepl("linearPredictorNormalizer|densityNormalizer|numBackgroundPoints|entropy", model.lambdas)

model.lambdas <- model.lambdas[!exclude.lambdas]

model.lambdas <- gsub("`|'", "", model.lambdas) # remove ticks

model.lambdas <- do.call("rbind", strsplit(model.lambdas, ","))

model.lambdas <- data.frame(variable = model.lambdas[, 1], apply(model.lambdas[, 2:4], MARGIN = 2, as.numeric))

npar <- nrow(model.lambdas[model.lambdas[, "X1"] != 0, ])

# Continued from Warren and Seifert 2011:

# These metrics

# are assessed by standardizing raw scores for each ENM

# so that all scores within the geographic space sum to 1

# and then calculating the likelihood of the data given

# each ENM by taking the product of the suitability scores

# for each grid cell containing a presence.

sc.sum <- cellStats(sc, stat = sum, na.rm = TRUE)

stand.sc <- sc/sc.sum

pres.sc <- extract(stand.sc, p)

ll <- sum(log(pres.sc), na.rm = TRUE)

npts <- nrow(p)

# From Warren and Seifert 2011 again:

# We exclude models with zero parameters (which

# occur in some cases with small sample sizes and

# extremely high values for b) and models with more

# parameters than data points (which violate the assump-

# tions of AICc).

# I do not remove the calculations, but a warning is issued.

if (npar >= npts - 1) warning(paste("Many parameters compared to number of presence points."))

AICcscore <- (2 * npar - 2 * ll) + (2 * (npar) * (npar + 1) / (npts - npar - 1))

AICscore <- 2 * npar - 2 * ll

BICscore <- npar * log(npts) - 2 * ll

out <- c(AIC = AICscore, AICc = AICcscore, BIC = BICscore, LL = ll,

npar = npar, npoints = npts)

out

}

John Baumgartner

unread,

Jun 7, 2014, 8:17:58 PM6/7/14

to maxent

Roman alerted me to the fact that k was not being calculated correctly in models with zero-weight features.

The following updated code should now resolve this issue.

Apologies.

maxentIC <- function(pred.raw, occ, lambdas) {

# pred.raw: A raster object or a file path to a raw maxent prediction grid.

# Providing a RasterStack will result in *IC being calculated for

# each layer.

# occ: A matrix/data.frame with 2 columns (lon, lat).

# lambdas: A maxent object or a file path to a lambdas file.

require(raster)

if(is(lambdas, 'MaxEnt')) lambdas <- textConnection(lambdas@lambdas)

if(is.character(pred.raw)) pred.raw <- raster(pred.raw)

lambdas <- read.csv(lambdas, header=FALSE, stringsAsFactors=FALSE) # corrected

k <- sum(as.numeric(lambdas[, 2]) != 0) - 4 # corrected

out <- t(sapply(seq_len(nlayers(pred.raw)), function(i) {

x <- pred.raw[[i]]

x.std <- x/sum(values(x), na.rm=TRUE)

occ.nodupes <- occ[!duplicated(cellFromXY(x, occ)), ]

n <- nrow(occ.nodupes)

ll <- log(prod(extract(x.std, occ.nodupes), na.rm=TRUE))

AIC <- 2*k - 2*ll

AICc <- AIC + ((2*k*(k+1))/(n - k - 1))

BIC <- k*log(n) - 2*ll

c(n=n, k=k, ll=ll, AIC=AIC, AICc=AICc, BIC=BIC)

}))

row.names(out) <- names(pred.raw)

out

}

Reply all

Reply to author

Forward