Calculate AIC in R

1,857 views
Skip to first unread message

Hackett

unread,
Mar 18, 2013, 10:22:06 PM3/18/13
to max...@googlegroups.com
library(raster)

calAIC <- function(csvfile, grdfile, lambdasfile) {
    nparams = 0
    probsum = 0
    loglikelihood = 0
    AICcscore = 0
    AICscore = 0
    BICscore = 0

    lambdases <- read.csv(lambdasfile, header=FALSE)
    nparams <- nrow(lambdases[lambdases$V2 != 0, ])
    nparams = nparams - 4

    layerRaw <- raster(grdfile)
    probsum <- cellStats(layerRaw, sum)

    points <- read.csv(csvfile)
    npoints <- nrow(points)
    layerValues <- extract(layerRaw, points[, c("LONGITUDE", "LATITUDE")])
    loglikelihood <- sum(log(layerValues / probsum))

    if (nparams >= npoints - 1) {
        AICcscore <- "x"
        AICscore <- "x"
        BICscore <- "x"
    } else {
        AICcscore = (2 * nparams - 2 * loglikelihood) + (2 * (nparams) * (nparams + 1) / (npoints - nparams - 1))
        AICscore = 2 * nparams - 2 * loglikelihood
        BICscore = nparams * log(npoints) - 2 * loglikelihood
    }

    ICs <- c(csvfile, grdfile, loglikelihood, nparams, npoints, AICscore, AICcscore, BICscore)

    return(ICs)
}

getAICs <- function(modelfile) {
    models <- read.csv(modelfile, header=FALSE, as.is=TRUE, col.names=c("csvfile", "grdfile", "lambdasfile"))
    AICs <- mapply(calAIC, models$csvfile, models$grdfile, models$lambdasfile, USE.NAMES=FALSE)
    AICs <- t(AICs)
    colnames(AICs) <- c("Points", "ASCII file", "Log Likelihood", "Parameters", "Sample Size", "AIC score", "AICc score", "BIC score")
    outfile <- gsub(".csv", "_model_select.csv", modelfile)
    write.csv(AICs, outfile, row.names=FALSE)
}

The R code above were a R implementation of AIC, the algorithm used are as that in ENMTools, please have a try.

1. Make a model file as you use ENMTools
2. Run the code above
3. Run getAICs(NAME OF YOUR MODEL FILE)
After a while, the results will be shown in a file with suffix "_model_select.csv"

Silvia Monteiro

unread,
Jul 3, 2013, 11:17:40 AM7/3/13
to max...@googlegroups.com
Hi,
 
thank you very much for the R script! Could you please just tell me how to build the model file? Is it like the script you do to model selection for ENMtools? Something like this:
 
C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_1.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\gme1.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\gme1.lambdas
C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_1_5.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1_5\MODEL3\gme1_5.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1_5\MODEL3\gme1_5.lambdas
C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE1\MODEL3\Data8_2.csv,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE2\MODEL3\gme2.asc,C:\Users\Silvia\Documents\Biologia\Pos_graduacoes\PhD\dist_habitat\MAXENT\GME\DUPL_REMOV\TESTE\raw\TESTE2\MODEL3\gme2.lambdas
 
and is it R able to  recognize the files like this? (Do I have to put this file in the same folder of R?
 
Thanks in advance for your help!
 
Silvia

Roman Luštrik

unread,
Jun 1, 2014, 1:29:31 PM6/1/14
to max...@googlegroups.com
Sorry for digging up graves, but I have a question about your code. As I understand it right now, it works for one layer (grd) file. What would be the generalized version for multiple layer? So far I'm thinking along the lines of calculating the same step for each layer, with summing at the end.

   probsum <- cellStats(environment, stat = mean, na.rm = TRUE) # environment is RasterStack and probsum is a named vector of means per layer
   
   layer.values <- slot(model, "presence") # value of layer cells at presence locations
   sum(log(layer.values/probsum), na.rm = TRUE) # divide each column by the corresponding probsum, log transform and sum

There is a catch, though. What if some values in a raster are 0? When we take log, they become -Inf and hell breaks loose. Please advise.

Cheers,
Roman

romunov

unread,
Jun 2, 2014, 3:12:49 PM6/2/14
to max...@googlegroups.com
Please disregard my previous mail. Dan Warren pointed out that this 'grdfile' is actually "suitability scores" a.k.a. prediction raster. It now makes sense.

Cheers,
Roman


--
You received this message because you are subscribed to the Google Groups "Maxent" group.
To unsubscribe from this group and stop receiving emails from it, send an email to maxent+un...@googlegroups.com.
To post to this group, send email to max...@googlegroups.com.
Visit this group at http://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.



--
In God we trust, all others bring data.

John Baumgartner

unread,
Jun 2, 2014, 6:35:21 PM6/2/14
to max...@googlegroups.com
Here's some code that I wrote a while back for calculating the information criteria as they are calculated by ENMTools. I haven't looked at Hackett's code in detail, so not sure how different it is, but Roman's question re a "generalized version for multiple layers" is addressed in the last example below (*IC are calculated for each layer of a RasterStack [i.e., each of a set of raw maxent predictions]). No guarantees about the code - up to you to decide if/when it's appropriate to use it ;)

maxentIC <- function(pred.raw, occ, lambdas) {
  # pred.raw: A raster object or a file path to a raw maxent prediction grid.
  #           Providing a RasterStack will result in *IC being calculated for 
  #           each layer. 
  # occ: A matrix/data.frame with 2 columns (lon, lat). 
  # lambdas: A maxent object or a file path to a lambdas file.
  require(raster)
  if(is(lambdas, 'MaxEnt')) lambdas <- textConnection(lambdas@lambdas)
  if(is.character(pred.raw)) pred.raw <- raster(pred.raw)
  lambdas <- read.csv(lambdas, header=FALSE)
  k <- sum(lambdas[, 2] != 0) - 4
  out <- t(sapply(seq_len(nlayers(pred.raw)), function(i) {
    x <- pred.raw[[i]]
    x.std <- x/sum(values(x), na.rm=TRUE)
    occ.nodupes <- occ[!duplicated(cellFromXY(x, occ)), ]
    n <- nrow(occ.nodupes)
    ll <- log(prod(extract(x.std, occ.nodupes), na.rm=TRUE))
    AIC <- 2*k - 2*ll
    AICc <- AIC + ((2*k*(k+1))/(n - k - 1))
    BIC <- k*log(n) - 2*ll
    c(n=n, k=k, ll=ll, AIC=AIC, AICc=AICc, BIC=BIC)
  }))
  row.names(out) <- names(pred.raw)
  out
}

An example...

Fit a model:

fnames <- list.files(path=file.path(system.file(package='dismo'), 'ex'), 
                     patt='grd', full=TRUE )
predictors <- stack(fnames)
occ <- read.csv(file.path(system.file(package="dismo"), 'ex/bradypus.csv'))[, -1]
me <- maxent(predictors, occ, factors='biome', path=tempdir())
r <- predict(me, predictors, args=c("outputformat=raw"), 
             filename={f <- tempfile()})


Passing the raster object to pred.raw and the maxent object to lambdas:
maxentIC(r, occ, me)
##        n  k       ll      AIC     AICc      BIC
## layer 94 47 -710.884 1515.768 1613.855 1635.303

Passing the raster file path to pred.raw, and the lambdas path to lambdas:
maxentIC(f, occ, file.path(tempdir(), 'species.lambdas'))

Passing a RasterStack to pred.raw:
r2 <- predict(me, stack(predictors[[1:8]] * 1.05, predictors[[9]]),
              args=c("outputformat=raw"), filename={f2 <- tempfile()})
# the above prediction is nonsense
s <- stack(r, r2) 
maxentIC(s, occ, me)
##          n  k        ll      AIC     AICc      BIC
## layer.1 94 47 -710.8840 1515.768 1613.855 1635.303
## layer.2 94 47 -739.7127 1573.425 1671.512 1692.960

romunov

unread,
Jun 3, 2014, 7:47:26 AM6/3/14
to maxent
Thank you John for your contribution. For completeness, I'm adding my function. I've compared it to your calculation and it differs a bit. I suspect limiting calculations to one point per raster cell gives rise to this discrepancy.

> johnAIC(pred.raw = prd.full, occ = coordinates(volk), lambdas = mdl.maxent.full)
       n  k        ll      AIC    AICc      BIC
layer 52 78 -574.1112 1304.222 847.778 1456.419

> AICmaxent(model = mdl.maxent.full, p = volk, sc = prd.full)
      AIC      AICc       BIC        LL      npar   npoints 
1420.3058  655.6904 1564.5368 -640.1529   70.0000   58.0000 
Warning message:
In AICmaxent(model = mdl.maxent.full, p = volk, sc = prd.full) :
  Many parameters compared to number of presence points.

Cheers,
Roman




#' Calculate AIC for MaxEnt model.
#' 
#' Function will calculate AIC, AICc and BIC for MaxEnt model. It is using the procedure
#' as outlined in Warren and Siefert 2011 with ideas of implementation from MaxEnt google 
#' mailing list. Function works on presence data, one layer of prediction scores and a
#' MaxEnt model object where lambda files is extracted from. Implementation hints from
#' 
#' @param model MaxEnt model.
#' @param p SpatialPoints or SpatialPointsDataFrame. Presence data.
#' @param sc RasterStack. Prediction (suitability) scores.
#' @references Dan L. Warren and Stephanie N. Seifert 2011. Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications 21:335–342. http://dx.doi.org/10.1890/10-1171.1
#' @author Roman Luštrik
AICmaxent <- function(model, p, sc) {
   # Based on 
   
#  Get number of parameters from lambdas file as per Warren and Seifert 2011:
#  For interpretation of lambda values see pdf at https://groups.google.com/forum/#!topic/maxent/gdN9D5hPZOg
   model.lambdas <- slot(model, "lambdas")
   exclude.lambdas <- grepl("linearPredictorNormalizer|densityNormalizer|numBackgroundPoints|entropy", model.lambdas)
   model.lambdas <- model.lambdas[!exclude.lambdas]
   model.lambdas <- gsub("`|'", "", model.lambdas) # remove ticks
   model.lambdas <- do.call("rbind", strsplit(model.lambdas, ","))   
   model.lambdas <- data.frame(variable = model.lambdas[, 1], apply(model.lambdas[, 2:4], MARGIN = 2, as.numeric))
   npar <- nrow(model.lambdas[model.lambdas[, "X1"] != 0, ])
   
   # Continued from Warren and Seifert 2011:
   #   These metrics
   #   are assessed by standardizing raw scores for each ENM
   #   so that all scores within the geographic space sum to 1
   #   and then calculating the likelihood of the data given
   #   each ENM by taking the product of the suitability scores
   #   for each grid cell containing a presence.
   sc.sum <- cellStats(sc, stat = sum, na.rm = TRUE)
   stand.sc <- sc/sc.sum
   pres.sc <- extract(stand.sc, p)
   ll <- sum(log(pres.sc), na.rm = TRUE)
   
   npts <- nrow(p)
#   From Warren and Seifert 2011 again:
#   We exclude models with zero parameters (which
#   occur in some cases with small sample sizes and
#   extremely high values for b) and models with more
#   parameters than data points (which violate the assump-
#   tions of AICc).
   # I do not remove the calculations, but a warning is issued.
   if (npar >= npts - 1) warning(paste("Many parameters compared to number of presence points."))
   
   AICcscore <- (2 * npar - 2 * ll) + (2 * (npar) * (npar + 1) / (npts - npar - 1))
   AICscore <- 2 * npar - 2 * ll
   BICscore <- npar * log(npts) - 2 * ll
   
   out <- c(AIC = AICscore, AICc = AICcscore, BIC = BICscore, LL = ll, 
      npar = npar, npoints = npts)
   out
}

John Baumgartner

unread,
Jun 7, 2014, 8:17:58 PM6/7/14
to maxent
Roman alerted me to the fact that k was not being calculated correctly in models with zero-weight features.

The following updated code should now resolve this issue.

Apologies.

maxentIC <- function(pred.raw, occ, lambdas) {
  # pred.raw: A raster object or a file path to a raw maxent prediction grid.
  #           Providing a RasterStack will result in *IC being calculated for 
  #           each layer. 
  # occ: A matrix/data.frame with 2 columns (lon, lat). 
  # lambdas: A maxent object or a file path to a lambdas file.
  require(raster)
  if(is(lambdas, 'MaxEnt')) lambdas <- textConnection(lambdas@lambdas)
  if(is.character(pred.raw)) pred.raw <- raster(pred.raw)
  lambdas <- read.csv(lambdas, header=FALSE, stringsAsFactors=FALSE) # corrected
  k <- sum(as.numeric(lambdas[, 2]) != 0) - 4 # corrected
  out <- t(sapply(seq_len(nlayers(pred.raw)), function(i) {
    x <- pred.raw[[i]]
    x.std <- x/sum(values(x), na.rm=TRUE)
    occ.nodupes <- occ[!duplicated(cellFromXY(x, occ)), ]
    n <- nrow(occ.nodupes)
    ll <- log(prod(extract(x.std, occ.nodupes), na.rm=TRUE))
    AIC <- 2*k - 2*ll
    AICc <- AIC + ((2*k*(k+1))/(n - k - 1))
    BIC <- k*log(n) - 2*ll
    c(n=n, k=k, ll=ll, AIC=AIC, AICc=AICc, BIC=BIC)
  }))
  row.names(out) <- names(pred.raw)
  out
}
Reply all
Reply to author
Forward
0 new messages