Discrepancy between MaxEnt Dismo and MaxEnt GUI


Simon Research

Feb 18, 2016, 6:50:29 AM
to Maxent

Hello All,


Recently I have been familiarising myself with the MaxEnt GUI (v3.3.3k) and have been successful in producing a range of models for species of interest to me. For increased reproducibility, I have decided to use the Dismo package and run MaxEnt through R.

I have noticed something very strange when comparing the outputs of the two methods, however. Using exactly the same environmental layers and occurrence records I end up with two wildly different predicted distributions when using anything but the default values.

For my first test I opened the MaxEnt GUI and, leaving all settings at default, ran the model for my species. I then ran the exact same model using the following code within R and, when plotted, I got nearly identical distributions (as expected).


    xx<-maxent(x=expl,p=spp.coords$garmani)


However, if I change the beta/regularization multiplier to anything except the default, the maps look totally different. For example, if I set regularization=0.5 and prevalence=0.80 in the GUI (keeping all other settings at default) and then do the same in R with the following code, the two predicted maps no longer match.


    xx <- maxent(x = expl, p = spp.coords$garmani, args = c(
      'betamultiplier=0.5',
      'defaultprevalence=0.80'
    ))


Interestingly, the AUCs and variable contributions all look nearly identical for the two models. I would have assumed therefore that the maps wouldn't look so wildly different. I have uploaded the maps to the following location so you can see for yourself the discrepancy.


Maps discrepancy


When analysing the two HTML outputs from running the models (default and non-default) I have only noticed the one difference. When using the GUI with 81 presence records and 10,000 background points I am told that "10020 points used to determine the Maxent distribution (background points and presence points)". However, when using R I am told "10081 points used to determine the Maxent distribution (background points and presence points)". As I mentioned previously, all other settings were kept the same, so points haven't been used in cross-validating, for example.


I was wondering if anyone else had come across this issue before. Am I doing something fundamentally wrong within R for the two distributions to be so different? I was under the impression that within R, all other settings would remain as default in the GUI, unless you specified certain arguments as in the second code example above.

Ahmed El-Gabbas

Feb 18, 2016, 7:28:21 AM
to Maxent
Hi Simon,

I assume that your 81 presence points are located in only 20 pixels at your resolution. Maxent, by default, keeps only one presence record per pixel (check the option 'Remove duplicate presence locations'). However, I assume dismo passes the species data to maxent in SWD (samples with data) format without checking for possible duplicates.

If that assumption is correct, the GUI and dismo models are trained on different numbers of presences (20 and 81, respectively). With all other settings left at default, this affects which features maxent uses: the GUI model (20 records) will have linear, quadratic, and hinge features, while the dismo model will additionally have threshold and product features. Please check this statement in Elith et al. 2010 (10.1111/j.1472-4642.2010.00725.x):
"MaxEnt includes a range of feature types, and subsets of these can be used to simplify the solution. By default, the program restricts the model to simple features if few samples are available (linear is always used; quadratic with at least 10 samples; hinge with at least 15; threshold and product with at least 80) because – as for any modelling method – few samples provide limited information for determining the relationships between the species and its environment (Barry & Elith, 2006; Pearson et al., 2007)"
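As a rough illustration of the thresholds in that quote, here is a minimal R sketch. The function name is made up for this example, and the cut-offs are only those stated in the quote, not read from Maxent itself.

```r
# Hypothetical helper: which feature classes the quoted default rule
# would enable for a given number of presence samples.
default_feature_classes <- function(n_samples) {
  # Minimum sample counts taken from the Elith et al. quote above;
  # linear is "always used", so its threshold is 0.
  thresholds <- c(linear = 0, quadratic = 10, hinge = 15,
                  threshold = 80, product = 80)
  names(thresholds)[n_samples >= thresholds]
}

default_feature_classes(20)  # linear, quadratic, hinge
default_feature_classes(81)  # all five feature classes
```

With 20 records only the three simpler classes pass their thresholds, whereas 81 records cross the 80-sample cut-off for threshold and product features as well.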

With the default prevalence and beta multiplier this difference does not seem to matter much in your case, but once you change their values it becomes visible.

I hope this helps,
Ahmed El-Gabbas

Simon Research

Feb 18, 2016, 7:34:02 AM
to Maxent
Hi Ahmed,

Many thanks for the detailed response. What you say makes a lot of sense. I guess the solution then is to define an argument in Dismo to remove duplicate presence locations?
What I'm unsure about, however, is that I thought Dismo used the exact same defaults as the GUI. If this is the case, why would Dismo pass species data to MaxEnt as SWD when the GUI does not?

Many thanks

Simon

Simon Research

Feb 18, 2016, 7:53:14 AM
to Maxent
I should add to this discussion some extra information.

Sometimes Dismo appears to 'ignore' arguments. For example, if I set background samples to something less than the default (let's say 5000), the HTML report appears to show that argument being ignored. The HTML file shows "maximumbackground=5000" but it still reports "10081 points used to determine the Maxent distribution (background points and presence points)". Which one is it? Has it used 5000 points while a bug in generating the HTML output says 10000, or has it ignored my request for 5000 points and used 10000 instead?

Ahmed El-Gabbas

Feb 18, 2016, 8:23:20 AM
to Maxent
Hi Simon,

After checking the help for the dismo::maxent() function (http://www.inside-r.org/packages/cran/dismo/docs/maxent), you may need to enable the 'removeDuplicates' argument to make the comparison valid. If that is not successful, you may need to trim your presence locations manually before running your models.
As for the number of background points, I am familiar with this problem. The function has an internal argument 'nbg=10000' which is not mentioned at all in the help page!! This makes it hard to change the default number of background points (10K), even when explicitly adding something like "maximumbackground=5000" to the arguments.

You may try something like:
    xx <- maxent(x = expl, p = spp.coords$garmani,
                 removeDuplicates = TRUE, nbg = 5000,
                 args = c('betamultiplier=0.5', 'defaultprevalence=0.80'))

I am not sure how the dismo::maxent() function passes the commands to the jar file; it is beyond my experience. It was just an inference after looking at the HTML output files created using the GUI or dismo. In the HTML file, you will find a line of text at the end of the page starting with "Command line to repeat this species model: ....".
I've noticed that when running maxent using the GUI, 'samplesfile' is the location of the .csv file provided and 'environmentallayers' is the folder where the ascii files are located. However, when running it from dismo, both 'samplesfile' and 'environmentallayers' point to SWD files, both created in your output folder once you start training your model (files: presence and absence, respectively). Also, when you run maxent using R, you need a raster stack, which may not be stored on disk at all (only in memory, check raster::inMemory()), so no ascii files are available for maxent, which strengthens my inference.
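Trimming the presence locations manually, as mentioned earlier in this reply, could be sketched roughly as below. This assumes the raster package is installed; the toy grid and the coordinates are invented for illustration and are not Simon's data.

```r
library(raster)

# Toy 10 x 10 grid standing in for one environmental layer (hypothetical extent).
r <- raster(nrows = 10, ncols = 10, xmn = 0, xmx = 10, ymn = 0, ymx = 10)

# Hypothetical presence coordinates; the first two fall in the same cell.
pres <- data.frame(lon = c(1.2, 1.7, 5.5, 9.9),
                   lat = c(1.1, 1.4, 5.5, 0.1))

# Cell number of each presence point on the grid.
# Points outside the grid get NA, so check for those before relying on this.
cells <- cellFromXY(r, as.matrix(pres))

# Keep only the first record in each cell.
pres_unique <- pres[!duplicated(cells), ]

nrow(pres)         # records before trimming
nrow(pres_unique)  # records after trimming
```

Comparing the two row counts shows directly whether any presences share a pixel at the resolution of your layers.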

Hope this helps,
Ahmed

romunov

Feb 18, 2016, 8:27:59 AM
to maxent
This is how dismo passes arguments along. The variable of interest here is `args`.


Cheers,
Roman

--
In God we trust, all others bring data.

Simon Research

Feb 18, 2016, 8:32:23 AM
to Maxent
Hi Roman, thanks for the reply but I don't really know what I'm supposed to be looking at here? 

Simon Research

Feb 22, 2016, 7:52:48 AM
to Maxent
Okay so having played with both Dismo and the GUI some more, I have convinced myself that it's the difference in the number of points used in determining the MaxEnt distribution that is causing the discrepancy between the two predicted distributions.
When using the same presence records, environmental layers and model settings, the HTML output displays the following:

MaxEnt GUI
81 presence records used for training.

10020 points used to determine the Maxent distribution (background points and presence points).

Dismo
81 presence records used for training.

10081 points used to determine the Maxent distribution (background points and presence points).


Is anyone able to suggest some reasons why the total number of points differs between the two models, bearing in mind all settings are the same? I can confirm that at the resolution my environmental layers are at, these presence points all fall within their own unique pixel/cell.

Many thanks.



Ahmed El-Gabbas

Feb 22, 2016, 8:09:29 AM
to Maxent
Try turning off "Remove duplicate presence records" and turning on "add all samples to background" in the GUI. I think this way you force the model to use 10081 points (a different sample), as in dismo.

Hope this helps,
Ahmed El-Gabbas

Simon Research

Feb 22, 2016, 8:22:28 AM
to Maxent
Thanks again for your help, Ahmed :) This has worked a treat: both distributions now look identical (even when settings are non-default).
The difference between Dismo and the GUI had really frustrated me, so I'm glad you have helped me get to the bottom of this!