Post-processing MaxEnt Statistic - True Skill Statistic


Anon

Nov 25, 2015, 12:17:11 PM
to Maxent
Hi all,

I am interested in calculating the True Skill Statistic (TSS) mentioned in Allouche (2006), but I am trying to determine the most meaningful way to do this so that I can compare a number of different models for the same species that were developed using different background sampling extents.

My understanding is that Cohen's kappa (which I will not be using due to its observed dependence on prevalence) and the similar TSS are generally calculated using a test data set of species records together with the background-selected pseudo-absences used to train/test the model. Due to the questions we are asking in our current study, models were built with quite different background extents, and I get the sense (correctly or not; this is part of my question) that an extra-large background extent may artificially inflate the TSS score when the background greatly exceeds the area to which the species in question could plausibly disperse. I recognize that using large background extents (beyond the area where a species could potentially disperse or occur) is generally considered an SDM faux pas, but for our question we consider it possibly necessary to break (bend?) this rule.

If this is true (and please correct me if I am wrong), I would like to calculate TSS for all the models using the exact same pseudo-absences selected from a much more confined and plausible (in terms of dispersal) extent.

It makes sense to me that this would be fine and believe using this more confined background extent for all models in calculating TSS would provide a more standardized TSS score that I could then use to classify my models and/or thresholds as better or worse. 

Perhaps I don't fully understand the role of pseudo-absences in building these models, but it seems like this would be appropriate. To clarify, I would be calculating TSS, and therefore model performance, for all models based on exactly the same data:

1. A random subset of species records, 25%, that was kept out of model training OR testing in MaxEnt.
             i. Used to calculate sensitivity
2. A confined "plausible" background extent from which 10,000 pseudo-absences were selected.
             i. Used to calculate specificity

Calculations of TSS will occur post-MaxEnt
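
For reference, the TSS defined in Allouche (2006) can be written in terms of confusion-matrix counts at a chosen threshold. The mapping to the data above is an assumption that follows the setup described in this post: TP and FN are counted on the withheld presence records, while TN and FP are counted on the pseudo-absences, which are not confirmed absences.

```latex
\begin{align}
\text{sensitivity} &= \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{TN + FP} \\
\mathrm{TSS} &= \text{sensitivity} + \text{specificity} - 1
\end{align}
```

Here TP is the number of withheld presences predicted at or above the threshold, FN the number predicted below it, FP the number of pseudo-absences predicted at or above it, and TN the number predicted below it. TSS ranges from -1 to +1, with values at or below 0 indicating performance no better than random.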

Is there anyone out there with a more refined statistical knowledge of MaxEnt than my own who could verify or reflect on my thoughts?

Thanks much for your help,
Anon

Ref: 

Allouche, O., Tsoar, A., & Kadmon, R. (2006). Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43, 1223-1232.

Jacob Koundouonon MOUTOUAMA

Nov 25, 2015, 10:42:34 PM
to max...@googlegroups.com

Dear Anon,

Kindly find the attached file; it contains a TSS Excel sheet.

According to Alaa Eldeen (2012), here is how to use the TSS file.

1. Before running the MaxEnt model, tell MaxEnt to write the backgroundPredictions file by selecting this option in the settings. MaxEnt will then create the backgroundPredictions file in the output directory.
2. Copy the last column (logistic) from the backgroundPredictions file and paste it into the first column (background) of the TSS file.
3. In the output directory, open the file named samplePredictions. From the last column (Logistic prediction), select only the cells for test records, not training records (you can check this in the 3rd column, named "Test or train"). Copy them and paste them into the 3rd column (test) of the TSS file.
4. Close all files except the TSS file. Select the 1st column (background) and sort it in ascending order, for the current selection only. Repeat this step with the 3rd column (test).
5. Identify your threshold value based on the threshold criterion that you want to use.
6. Count how many cells have values lower than your threshold and how many have values higher than it, then write these numbers in the cells highlighted in blue and red in the TSS file, where:

F6 Number of cells that have a value above the threshold value in the test column

F7 Number of cells that have a value below the threshold value in the test column

G6 Number of cells that have a value above the threshold value in the background column

G7 Number of cells that have a value below the threshold value in the background column

H8 The sum of F6, F7, G6 and G7

I3 Total number of background points selected in the MaxEnt settings

The TSS value will appear in the yellow cell (I15)
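
The spreadsheet steps above can also be sketched in Python. This is a minimal sketch rather than a definitive implementation, and it rests on some assumptions: the function names are hypothetical, the prediction is read from the last column of each file and the "Test or train" flag from the 3rd column of samplePredictions (as described above), values equal to the threshold are counted as "above", and the tiny CSV snippets and their header names are made up for illustration only.

```python
import csv
import io


def last_column(csv_text, keep=None):
    """Return the last column of each data row as floats.

    `keep` is an optional predicate on the row, used to filter records.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    body = rows[1:]  # skip the header row
    return [float(r[-1]) for r in body if r and (keep is None or keep(r))]


def tss_from_maxent_outputs(background_csv, sample_csv, threshold):
    """Compute TSS from the text of the two prediction files."""
    background = last_column(background_csv)
    # Keep only rows flagged as test records in the 3rd column.
    test = last_column(sample_csv, keep=lambda r: r[2].strip().lower() == "test")
    if not test or not background:
        raise ValueError("need at least one test record and one background point")

    tp = sum(v >= threshold for v in test)        # like cell F6
    fn = len(test) - tp                           # like cell F7
    fp = sum(v >= threshold for v in background)  # like cell G6
    tn = len(background) - fp                     # like cell G7

    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    return sensitivity + specificity - 1


# Illustrative (made-up) file contents; real files have many more rows.
background_csv = """x,y,raw,cumulative,logistic
1,1,0.0,1.0,0.1
1,2,0.0,2.0,0.2
2,1,0.0,9.0,0.6
2,2,0.0,3.0,0.3
"""
sample_csv = """X,Y,Test or train,Raw prediction,Cumulative prediction,Logistic prediction
3,3,test,0.0,9.0,0.7
3,4,test,0.0,5.0,0.4
4,3,train,0.0,9.5,0.9
4,4,test,0.0,9.2,0.8
"""

example_tss = tss_from_maxent_outputs(background_csv, sample_csv, threshold=0.5)
print(round(example_tss, 3))
```

With real outputs, the file text could be read with `open(path).read()` and passed in the same way. Sorting the columns, as in the spreadsheet, is not needed here because the counts do not depend on row order.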



best regards



Copy of tss.xls

Anon

Nov 30, 2015, 9:29:02 AM
to Maxent
Hello Jacob,

Thank you for the response. I have read the forum thread you are referring to, but unfortunately I don't believe it fully pertains to my situation. I will be using the true skill statistic, but I need to calculate the TSS independently of the data used in MaxEnt to test the model. Therefore, the sample predictions (the logistic prediction) won't help me.

As such, I withheld 25% of my total data points (randomly selected) and used the remaining 75% to train and test the models (per the usual approach in MaxEnt, using 5-fold cross-validation). I withheld the 25% so that I can test these against the averaged model of the 5-fold cross-validation (since in MaxEnt each fold of the cross-validation uses a different set of training data, and therefore no independent testing data is left).

I see no problem using this withheld 25% to test against all of my averaged models (from 5-fold cross-validation). However, I would like this community's opinion as to whether I can use an independent set of background prediction points (described in my first post) to test against all models that I've produced for each species in my study (regardless of the background extent against which each model was originally tested).
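
To make the proposal concrete, here is a minimal Python sketch of scoring several models on the same evaluation points. It is an illustration under stated assumptions, not an established procedure: the function names, model names, scores and thresholds are all hypothetical, and it assumes each model's predicted values have already been extracted at the same withheld presence points and the same fixed set of pseudo-absence points.

```python
def tss(presence_scores, background_scores, threshold):
    """TSS = sensitivity + specificity - 1 at a given threshold.

    Scores at or above the threshold count as predicted presence
    (how ties are handled is an assumption of this sketch).
    """
    if not presence_scores or not background_scores:
        raise ValueError("need at least one presence and one background score")
    sensitivity = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    specificity = sum(s < threshold for s in background_scores) / len(background_scores)
    return sensitivity + specificity - 1


def compare_models(predictions, thresholds):
    """Score every model on the same evaluation points.

    predictions: model name -> (scores at withheld presences,
                                scores at the shared pseudo-absences)
    thresholds:  model name -> threshold chosen for that model
    """
    return {
        name: tss(presence, background, thresholds[name])
        for name, (presence, background) in predictions.items()
    }


# Hypothetical predicted values for two models of the same species,
# both evaluated at the same 4 presences and the same 5 pseudo-absences.
predictions = {
    "small_extent": ([0.8, 0.6, 0.3, 0.7], [0.2, 0.4, 0.6, 0.1, 0.3]),
    "large_extent": ([0.9, 0.5, 0.4, 0.7], [0.1, 0.7, 0.6, 0.2, 0.3]),
}
thresholds = {"small_extent": 0.5, "large_extent": 0.5}

scores = compare_models(predictions, thresholds)
for name, value in scores.items():
    print(name, round(value, 3))
```

Because every model is scored against identical points, differences in TSS reflect the models and their thresholds rather than differences in the evaluation extent. Whether that comparison is statistically appropriate is the open question of this post.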

Any thoughts?

Thanks,
Anon

Gabriel Jácome

May 1, 2017, 7:48:22 PM
to Maxent
Dear Jacob:

Can you explain how to choose the threshold criterion? I'm not sure which value I should use.

Thank you 

Gabriel 

Mariana Delgado

Jul 6, 2018, 5:44:11 AM
to Maxent
Dear Jacob, thanks so much!

You made me very happy; it helped me a lot.

banya pouduma

Jun 13, 2019, 9:29:59 AM
to Maxent
I am in a similar situation. I used all my presence data for training in MaxEnt, setting aside 30% for testing, and now I want to calculate the TSS as a second measure of validation. How do I go about it, given that I don't know which points MaxEnt reserved for testing? I have read Jacob's advice, but it does not seem to work for me, or I did not understand the explanation well. Can anyone out there please clarify things for me? I have few presence records (105), and calculating TSS against 10,000 pseudo-absence background points is giving me some weird values.
Thanks

banya pouduma

Jun 13, 2019, 9:29:59 AM
to Maxent
Please could someone clarify which values become the true positives, false positives, true negatives and false negatives?
Thank you