calculating area of suitable habitat in R

1,490 views
Skip to first unread message

Rebecca Stubbs

unread,
Nov 29, 2017, 11:05:21 AM11/29/17
to Maxent
I have run Maxent for multiple species under present conditions and also under future climate change scenarios. I was quantifying changes between present and future suitable habitat using the nicheOverlap function and Schoener's D statistic. Quite a few of the organisms in my study are just moving farther up mountains so there is a lot of overlap as the future distribution is inside the present distribution (just at higher elevations). But by looking at the ascii files in QGIS I can see that there is less suitable habitat in the future, so I wanted to quantify this. I have scoured the internet for a good way to calculate area for rasters and never found anything that perfectly suited my fancy. I therefore wrote up something that is an amalgamation of bits and pieces of various scripts. It is pasted below. 

Two questions:
1) do you all agree this is doing what I think it is doing (calculating area in square kilometers)?
2) is there a way to simplify this? Specifically you'll see I go from a raster to a dataframe back to raster? Maybe I could stay in rasters?

Thanks for any input!

Rebecca


####
library(raster)

#load rasters
m <- raster("SpeciesA_avg.asc")
mf <- raster("SpeciesA_future_layers_avg.asc")

#change to dataframe
m.df <- as.data.frame(m, xy=TRUE)

#get rid of NAs
m.df1 <- na.omit(m.df)

#keep only cells that that have a suitability score above 0.5 (scores range from 0 to 1)
m.df2 <- m.df1[m.df1$SpeciesA_avg> 0.5,]

#re-rasterize just the suitable area
m.raster <- rasterFromXYZ(m.df2)

##same as above but for future projection
mf.df <- as.data.frame(mf, xy=TRUE)
mf.df1 <- na.omit(mf.df)
mf.df2 <- mf.df1[mf.df1$SpeciesA_future_layers_avg>0.5,]
mf.raster <-rasterFromXYZ(mf.df2)

#get sizes of all cells in current distribution raster
#note my original layers were 30 seconds or 1 km2. 
#I thought about just multiplying by the res (0.008333333) I could also get area but as I am working far from the equator I know every square is not going to be exactly 1 km2. So, I'm using the area function instead
cell_size<-area(m.raster, na.rm=TRUE, weights=FALSE)

#delete NAs from all raster cells. It looks like these come back when switching from dataframe to raster
cell_size1<-cell_size[!is.na(cell_size)]

#compute area [km2] of all cells in raster
raster_area_present<-length(cell_size1)*median(cell_size1)
raster_area_present

#get sizes of all cells in future raster [km2]
cell_size<-area(mf.raster, na.rm=TRUE, weights=FALSE)

#delete NAs from vector of all raster cells
cell_size1<-cell_size[!is.na(cell_size)]

#compute area [km2] of all cells in geo_raster
raster_area_future<-length(cell_size1)*median(cell_size1)
raster_area_future

##calculate change in area
dif_area <- raster_area_present - raster_area_future
dif_area

Jamie M. Kass

unread,
Dec 20, 2017, 9:52:52 AM12/20/17
to Maxent
I think I can help you make this a little more efficient. First off though, I worry that na.omit() will remove entire rows from your data frame if a single value is NA. Have you checked this? I've modified your code below to avoid converting to data frames, and cleaned up a little. Please test this to make sure it produces results that make sense to you -- I wrote this pretty quickly. And please see my comments about thresholding.

####
library(raster)

# your species occurrence coordinates (x, y)
occs <- read.csv("Species_Occurrences.csv")


#load rasters
m <- raster("SpeciesA_avg.asc")
mf <- raster("SpeciesA_future_layers_avg.asc")

# for thresholding, I would use a value based on the data, not something
# arbitrary like 0.5, which is suspect because the "suitability" values
# Maxent outputs don't always peak at 1.0, and 1.0 here does not mean "presence".
# thus, I'll go here with the minimum training presence value, but you can
# use a different one if you like -- Maxent suggests some in the results table.
vals <- extract(m, occs)
mtp <- min(vals)
# now, just threshold your rasters with a conditional
# the output will be 0's and 1's
m.thresh <- m >= mtp
mf.thresh <- mf >= mtp
# now, we can set all the cells with 0 to NA
m.thresh[m.thresh == 0] <- NA
mf.thresh[mf.thresh == 0] <- NA

# get sizes of all cells in current distribution raster
m.cell_size <- area(m.thresh, na.rm=TRUE, weights=FALSE)
mf.cell_size <- area(mf.thresh, na.rm=TRUE, weights=FALSE)

# I think this is the correct method for getting the total area
m.area_present <- cellStats(m.cell_size, sum)
mf.area_present <- cellStats(mf.cell_size, sum)

##calculate change in area
dif_area <- m.area_present - mf.area_present

Jamie Kass
PhD Candidate
City College of NY

Rebecca Stubbs

unread,
Dec 20, 2017, 3:09:27 PM12/20/17
to Maxent
Hi Jamie! Thanks for this response! I also posted this question on Stack Overflow and I got a different answer though the results are only slightly different. Well, first the guy on Stack Overflow insulted my scripting abilities/use of Stack Overflow (though somewhat fairly) but then presented an elegant solution (lol). I linked it here if you are interested. The script you supplied worked smoothly as well

Though the methods are slightly different I think the important difference is in the selected threshold values. Thank you for making that point. I have spent the morning reading about it. Luckily there are lots of links to follow about this. Here's a good one for anyone interested in choosing thresholds. There's also the Liu et al. papers (2005 and 2013).

It seems like the choice is between Minimum Training Presence (MTP) and Maximizing the sum of sensitivity and specificity (Max SSS) for selecting the value the determines if an area is suitable. I'm running 15 replicates for each of my species so the Maxent maxentResults.csv has 15 values for both "Minimum training presence logistic threshold" and "Maximum training sensitivity plus specificity logistic threshold". So I'm going to take the average for both categories and try both values as the cutoff threshold for determining the total area of suitable habitat to see how that changes my results. I will also examine how well using these cutoffs fits the occurrence points I have. I guess I could also re-run Maxent and apply the threshold rule of MTP and also MaxSSS and see which better reflects the known distributions. 

I realize the above paragraph is a bit off the original topic but I really appreciate you bringing my attention to this. Any further thoughts on this are appreciated!

Best,
Rebecca

Jamie M. Kass

unread,
Dec 22, 2017, 11:08:23 AM12/22/17
to max...@googlegroups.com
Rebecca,

Looks like Rob Hijmans himself answered your question. What an honor! Unfortunately he did not comment on the choice of threshold. I think I'll add a small comment myself.

Taking the average of the threshold across replicates and applying this average to a single model prediction is not statistically valid, in my opinion. This would be valid if you applied the average threshold to a prediction generated from model average across all the replicates (an ensemble of these models).

The following is true if you are not doing random k-fold validation. Doing replicates at all operates under the assumption that each random background point sample you make provides you with a different environmental profile. This should not be the case if you chose enough background points to appropriately sample your background extent (i.e. more than the default of 10,000 if this is a poor sample of the total cells). If you choose enough background points, you wouldn't need to do any replicates, in theory, because they'd all give you the same results (within some small margin of error).

If you are doing random k-fold validation, you should probably consider a spatial cross-validation technique that limits the spatial autocorrelation between your partitions. See Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera‐Arroita, G., ... & Warton, D. I. (2017). Cross‐validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography.

As for thresholds, you can also consider the "10% threshold" if you have some occurrence points that may be environmental outliers due to high spatial uncertainty, etc. This threshold tosses out the lowest 10% and then finds the minimum suitability. I have found that this fixes a lot of the problems with overprediction that MTP has.

Jamie

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+unsubscribe@googlegroups.com.
To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.



--
-----------------------------------------------------------
Jamie Kass
PhD Student, Department of Biology
City College of New York, CUNY Graduate Center

Rebecca Stubbs

unread,
Feb 20, 2018, 10:44:31 AM2/20/18
to max...@googlegroups.com
Hi Jamie - thanks for all this advice. I get what you are saying about replicates but some of my datasets are very small. Therefore, when I take a subset (15%) to test the model this random subset could change my AUC scores. Averaging replicates for this reason seems necessary. Thoughts?

I did some test changing background points from 10,000 to 50,000 to 100,000 and for my 17 datasets ranging in both number of points and size of area and I didn't see a lot of change in AUC score or Schoener's D. Thanks for pointing this out though - there is some interesting literature out about this. 

The main reason I'm writing is to circle back to my original question. I have run both your and Dr. Hijmans' scripts on my data using the 10% threshold. They produce the same results +/-5. You'll see Dr. Hijman's states that the answer is in meters squared. I've done the math on that and it is impossible. For instance, when the fundamental niche of a species occupies more than half of California I have a general idea of how much area that should be and km2 makes a lot more sense. Also, the resolution I'm using is at30 arc seconds, which at the equator is roughly 1 km2 (and granted I'm working far away from the equator), but shouldn't the unit be km2? Sorry if this is obvious to other people, but I'm confused by this point.

Thanks for any insight!

Rebecca






Jamie M. Kass

unread,
Feb 21, 2018, 11:13:40 PM2/21/18
to max...@googlegroups.com
Rebecca,

If your datasets are very small, consider jackknife partitioning (train model on n-1 points where n is total number of points, test on left-out point, repeat). Else some spatial blocking technique. You can implement either easily in ENMeval.

Even if the AUC or D don’t change, you should examine your response curves. If you sample more of the background, you’re likely to get more of the available environmental signal. Check out a paper we published recently about this:


As for the area question, it is indeed sq km for rasters. Check the function documentation:


Jamie

To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie Kass
PhD Student, Department of Biology
City College of New York, CUNY Graduate Center

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie M. Kass
PhD Candidate, Dept. of Biology

Rebecca Stubbs

unread,
Feb 22, 2018, 10:21:27 AM2/22/18
to max...@googlegroups.com
Awesome! Thanks for the info and especially the article! I'll be citing that :) 

Rebecca

To unsubscribe from this group and all its topics, send an email to maxent+unsubscribe@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie Kass
PhD Student, Department of Biology
City College of New York, CUNY Graduate Center

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+unsubscribe@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+unsubscribe@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie M. Kass
PhD Candidate, Dept. of Biology

City College of New York, CUNY Graduate Center

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+unsubscribe@googlegroups.com.

Amber

unread,
Aug 7, 2018, 10:36:38 PM8/7/18
to Maxent
Hi Rebecca and Jamie,

This is the only thread I've found that seems to address something I'm currently struggling with so I'm wondering if either of you could offer some advice please? 

For my MSc project, I'm attempting to quantify the area of suitable habitat in my SDMs. So I want to take the two different projections I have created with ENMeval (one SDM in the present climate and one in future climate) and estimate the area of suitable habitat at both time periods so I can then estimate habitat loss or gain. 

Am I right in thinking the 'm' and 'mf' rasters in your code are these two different projections, present and future? And by taking the area of suitable habitat at 'mf' time period from 'm' time period you get the amount of suitable habitat area lost/ gained between these two time periods?

I've tried your code but so far I keep getting this error...

#my enmeval projections
current <- predict(enmeval@models[[7]], modlayers)
future <- predict(enmeval@models[[7]], flayers)

#(is there a step here with converting the projections that I am missing?)

m <- raster(present)
mf <- raster(future)

occs <- read.csv("species_occurrences.csv")
occs <- occs[,13:14]
vals <- extract(m, occs)

#Error in .readCells(x, cells, 1) : no data on disk or in memory 


Also, the code you linked from Rob Hijmans https://stackoverflow.com/questions/47563823/calculating-area-of-most-suitable-raster-habitat-in-r is working for me but I don't really understand what it's doing. The different areas output from it, are they the total area of suitable habitat on the whole map? Or just the area of suitable habitat over a certain threshold? if so where in that code did he define the threshold value?

Clearly, I'm struggling with this and as a Masters student time is not on my side, so any help would be very much appreciated! 

Thank you!
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie Kass
PhD Student, Department of Biology
City College of New York, CUNY Graduate Center

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

To post to this group, send email to max...@googlegroups.com.
Visit this group at https://groups.google.com/group/maxent.
For more options, visit https://groups.google.com/d/optout.
--
-----------------------------------------------------------
Jamie M. Kass
PhD Candidate, Dept. of Biology

City College of New York, CUNY Graduate Center

--
You received this message because you are subscribed to a topic in the Google Groups "Maxent" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/maxent/wj4UNTHwCf0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to maxent+un...@googlegroups.com.

Jamie M. Kass

unread,
Aug 8, 2018, 12:41:07 AM8/8/18
to Maxent
Let me try to diagnose. First off, your “current” raster is called “current”, not “present”. Second, no need to convert the product of predict() to raster — it shoud already be a raster. Lastly, the error looks like you can’t read the input raster. Do you have another object in the R environment named “present” that is not a raster? Fix the names and confirm that this isn’t the issue.

The StackOverflow post by Hijmans thresholds by 0.5 before calculating cell areas. He multiplies one by 10 and adds them to make a categorical code. This way, when you run zonal stats, you can see the total area for each of the codes and match them to the corresponding state — i.e., 10 means just the raster multiplied by 10 has a prediction, but 11 means that both do, etc.

Amber

unread,
Aug 8, 2018, 1:29:33 AM8/8/18
to Maxent
Thanks for your quick response Jamie! That was just a typo I made while writing this up here, oops! The names are definitely not the issue.
Message has been deleted

Luke Sutton

unread,
Aug 30, 2018, 6:33:09 AM8/30/18
to Maxent
Hi Jamie

Just picked up on this thread from searching for similar solutions myself to the same problem. Thanks for supplying the code which all runs well, but what would you change the code too when using a 10% threshold rule? 

So from this: mtp <- min(vals) to something like this: 10tp <- ??(vals)

Been struggling for the past hour to work this out, so hoping there's a quick fix. 

Thanks in advance. 

Luke

Jamie M. Kass

unread,
Aug 30, 2018, 6:48:22 AM8/30/18
to Maxent
Luke,

You'd need to do this:

# find the number of values (rounded up) that represent 10% of the total
vals.10p.num <- ceiling(0.1 * length(vals))
# sort the values, then get the value that is the one next in rank to the highest value of the lowest 10%
vals.10p <- sort(vals)[vals.10p.num + 1]

Hope this makes it clear.

Jamie Kass
PhD Candidate
City College of NY

Luke Sutton

unread,
Sep 3, 2018, 12:44:31 PM9/3/18
to Maxent
Excellent, many thanks Jamie. Code worked well. I had just worked it out in a less smooth way: 

tentp <- sort(vals)[1:(length(vals)/10)] # lowest values in increasing order.
tentp

But your code gives the single threshold figure, rather than all the lowest 10% values. 

Cheers

Luke

Abid Ali

unread,
Mar 20, 2024, 4:01:44 AMMar 20
to Maxent
Hi All, 

This is indeed an interesting topic!

Well, I work with rasters and projected models for current and historical and/or paleo data from a phylogeographical perspective.  However,  the output for LGM gives a higher suitable area of no more than 0.5, for example, lower 0.2 and higher 0.54. Similarly, LIG in some cases, giving a highly suitable area is no more than 0.4.

I am now stuck with calculating the area of suitability. I have to reclassify the rasters under the same values i.e., 0 (non-suitable) and 1 (highly suitable) however, in that case, Should I understand it simply means entirely no suitable area for the species during LGM or LIG because only 04 or 0.5 higher suitability? 

Do thresholding criteria solve the issue, I can see a similar colour in the LGM and LIG that represents the suitable area in the current raster however, the values look far below than present rasters!

I would like to have suggestions on this!

Thank you so much!

Abid 

Reply all
Reply to author
Forward
0 new messages