Hi Alexei,
Thanks for the quick reply. At the moment I am doing some testing with datasets of 3-4k taxa and ~10 kb, using Tesla Kepler K40m GPUs with 12 GB of memory. The analyses do start, and 1M generations take 3-4 h, so it should be feasible to run at least 100M generations. I don't know whether mixing will be good, though. If things look bad, I may consider using topological backbones or even fixed topologies. Anyway, my point is that as ever more powerful GPUs become available, it will make sense to give BEAST a shot with very large datasets, and this limit on the number of taxa will therefore become a more serious issue.
Cheers,
Victor