very low ESS in SNAPPER runs (help!)

lncespe...@gmail.com

Jun 26, 2024, 3:01:54 PM

to beast-users

Hi all,

I recently moved from SNAPP to snapper analyses and, although snapper does indeed run a lot faster, I am still having trouble getting the runs to converge. Specifically, the ESS values are very low. For my last attempt, I ran two independent chains for 4,000,000 generations, sampling every 100. Most ESS values don't go above 100 even when combining the independent runs (attached screenshot of Tracer). At this point, I don't think making the sampling interval smaller or making the chain longer (following https://beast2.org/increasing-esss/) would help, given that this took 5 days on my computing cluster. The analysis is based on 2,918 SNPs for 109 individuals (10 species).

I am wondering if you have any advice on how to proceed now. I have some ideas in mind:

1) should I reduce the number of SNPs I am using or be very stringent with the amount of missing data per SNP?

2) Is it possible that the priors are the problem? I followed advice here (https://taming-the-beast.org/tutorials/BFD_snapper_tutorial/)

3) would adding a UPGMA starting tree help?

4) A particular concern that I have with my data set is that the number of samples per species is uneven. Specifically, I have three species (which are outgroups, but this is not specified in the .xml file) with only 2-3 individuals each. Would excluding 2 of these species help? Or is there an alternative way to deal with this?

Thank you very much for your insights! Really need some advice at the moment.

Thanks!

All the best,

Laura

Monica Ó Fathaigh

Jun 27, 2024, 2:43:06 PM

to beast-users

Hi Laura,

I had a huge improvement in ESS for the coalescent rate estimates when I increased the operator weights on the gamma mover and rate mixer for this parameter. You can do this with your XML loaded into BEAUti: select View, then 'show operators panel' in the drop-down. The gamma mover and rate mixer are the first two operators. I increased both to 10.
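If you have several XML files to update, the same change can be scripted instead of clicking through BEAUti. The following is only a minimal sketch, assuming the operators appear as `<operator>` elements with an `id` and a `weight` attribute. The ids `GammaMover` and `RateMixer` and the `spec` values in the toy XML are hypothetical placeholders, so check the actual ids in your own file before using it.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a BEAST 2 XML file. The ids and spec values are
# hypothetical placeholders, not copied from a real snapper XML.
TOY_XML = """
<beast>
  <run id="mcmc" chainLength="4000000">
    <operator id="GammaMover" spec="HypotheticalGammaMover" weight="1.0"/>
    <operator id="RateMixer" spec="HypotheticalRateMixer" weight="1.0"/>
    <operator id="NodeSwapper" spec="HypotheticalNodeSwapper" weight="0.5"/>
  </run>
</beast>
"""


def set_operator_weights(root, id_fragments, new_weight):
    """Set weight on every <operator> whose id contains one of id_fragments.

    Matching is case-insensitive. The tree is modified in place and the
    ids of the operators that were changed are returned in document order.
    """
    fragments = [frag.lower() for frag in id_fragments]
    changed = []
    for op in root.iter("operator"):
        op_id = op.get("id", "")
        if any(frag in op_id.lower() for frag in fragments):
            op.set("weight", str(new_weight))
            changed.append(op_id)
    return changed


root = ET.fromstring(TOY_XML)
changed = set_operator_weights(root, ("gammamover", "ratemixer"), 10)
print("updated operators:", changed)
```

For a real file you would load it with `ET.parse(path)`, call the function on `tree.getroot()`, and save with `tree.write(new_path)`. Note that ElementTree drops XML comments by default, so write to a new file and keep the original.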

Try increasing the burn-in length in Tracer to improve the likelihood and posterior ESS - it is still too influenced by the prior. Also, the chain length needs to be longer... at least 10,000,000? I have found that snapper can take weeks on a cluster. I chose to downsample from 70 to 40 samples as it was taking too long. For my particular dataset, which is probably characterised by incomplete lineage sorting, I found that reducing SNPs meant there was not enough signal in the data and it couldn't identify lineages. Your species may be more differentiated, so decreasing SNPs may not be an issue.
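To see how the burn-in choice affects ESS outside Tracer, here is a rough sketch in plain Python. It assumes the usual BEAST log layout (tab-delimited, `#` comment lines, one header row) and uses a simple autocorrelation-based estimate, ESS = N / (1 + 2 * sum of positive-lag autocorrelations), truncated at the first non-positive lag. This is not Tracer's exact estimator, so the numbers will differ somewhat. It is also slow on long traces (4,000,000 generations sampled every 100 gives 40,000 samples per chain), so treat it as an illustration only.

```python
def read_trace(lines, column):
    """Extract one numeric column from BEAST-style log lines."""
    header = None
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if header is None:
            header = fields  # first non-comment line holds the column names
            idx = header.index(column)
            continue
        values.append(float(fields[idx]))
    return values


def drop_burnin(values, fraction):
    """Discard the first `fraction` of the samples (e.g. 0.1 for 10%)."""
    return values[int(len(values) * fraction):]


def effective_sample_size(values):
    """Estimate ESS as N / (1 + 2 * sum of leading positive autocorrelations)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return float("nan")  # a constant trace carries no information
    tau = 1.0
    for lag in range(1, n // 2):
        cov = sum((values[i] - mean) * (values[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0.0:
            break  # stop at the first non-positive autocorrelation
        tau += 2.0 * rho
    return n / tau
```

With a log file you could call `read_trace(open(path), "posterior")`, then compare `effective_sample_size(drop_burnin(trace, f))` for a few burn-in fractions `f` such as 0.1, 0.25 and 0.5.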

Another thing, I found that if I specified too many threads it slowed the job down... You might need to do a bit of trial and error to see what works on your cluster.
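The trial and error can be semi-automated by timing a short test run at a few thread counts. This is a sketch under assumptions: it relies on the `-threads` option of the BEAST 2 launcher, assumes the launcher is available as `beast` on your PATH, and uses a hypothetical `short_test.xml` with a reduced chain length so each timing run finishes quickly.

```python
import subprocess
import time


def beast_command(xml_path, threads, executable="beast"):
    """Build the command line for one run (executable name is an assumption)."""
    return [executable, "-threads", str(threads), xml_path]


def time_command(command):
    """Run a command to completion and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def fastest_thread_count(xml_path, candidates, run=time_command):
    """Time the same XML at each thread count and return (best, all timings)."""
    timings = {n: run(beast_command(xml_path, n)) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


# Example call (needs BEAST 2 installed; short_test.xml is hypothetical):
# best, timings = fastest_thread_count("short_test.xml", [1, 2, 4, 8, 16])
```

Timings from a short chain are only a rough guide, and results on a shared cluster can vary between nodes, so it is worth repeating each setting once or twice.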

Good luck!

Monica

lncespe...@gmail.com

Jun 29, 2024, 12:53:07 PM

to beast-users

Hi Monica,

Thank you so much for your thorough answer! I am already trying to figure out the best number of threads, and then I will re-do the xml files following your suggestions. These make a lot of sense. It is also reassuring to know that analyses can in fact take that long. Thank you very much!!