New BEAST 2.4.0 can't find BEAGLE and doesn't use multiple threads


bander...@gmail.com

Mar 14, 2016, 11:18:48 AM
to beast-users
Hi all,

I have been using BEAST v2.3.0 on a Mac (OS X 10.8.5) with BEAGLE installed and working with BEAST for the past few months. Today I downloaded the new BEAST version, and it seems unable to find BEAGLE (giving the error: Failed to load BEAGLE library: no hmsbeagle-jni in java.library.path). I tried reinstalling BEAGLE to no avail. The old executable, run from the command line, still finds BEAGLE without a problem.

When I run BEAST without a -beagle option, it doesn't use more than a single core (on a 12-core, 24-thread machine). Specifying a higher -threads value, or -instances together with -threads, makes no difference.
I also tried running the XML file I had made with the new version of BEAUti through the old BEAST executable, and it failed because it couldn't create the class ThreadedTreeLikelihood, which I'm guessing is one of the updates in the new version meant to improve threading.

Does anyone know how I can get the new version of BEAST to find BEAGLE? Or to use multiple cores?

Cheers,

Ben

bander...@gmail.com

Mar 14, 2016, 11:48:05 AM
to beast-users
To clarify, I have also re-created my input file with the older version of BEAUti and run it through BEAST as I have in the past, and although it recognises BEAGLE, it still seems to be restricted to a single core. Here is the BEAGLE resource:

0 : CPU
    Flags: PRECISION_SINGLE PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL EIGEN_COMPLEX SCALING_MANUAL SCALING_AUTO SCALING_ALWAYS SCALERS_RAW SCALERS_LOG VECTOR_SSE VECTOR_NONE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU

I called BEAST as:
beast -beagle_sse -beagle_instances 7 -threads 7 allgenes_beast23.xml 

I realise now that I have only been using BEAST for SNAPP analyses in the past few months, so perhaps SNAPP has a different implementation that allows threading? I tested this by running a SNAPP xml file with the same command as above, and it started using 7 cores.

Any ideas what might be going on?

Remco Bouckaert

Mar 14, 2016, 2:18:58 PM
to beast...@googlegroups.com
Hi Ben,

I found a small problem with the beast script and re-released it with a fix, so re-installing should solve the BEAGLE loading problem. If BEAST still does not find BEAGLE, you can point the BEAGLE_LIB environment variable at the directory where the BEAGLE library is installed, in the terminal using

export BEAGLE_LIB=/path/to/beagle/lib

(probably /usr/local/lib), or set it in the BEAST\ v2.4.0/bin/beast script.
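
For example, on a typical Mac install something like the following should work (the library path is an assumption, so check where libhmsbeagle actually ended up; and, if I remember the flag correctly, -beagle_info just lists the BEAGLE resources BEAST can see, which is a quick way to confirm the library loads):

# point BEAST at the directory holding the BEAGLE native libraries
export BEAGLE_LIB=/usr/local/lib
# list the BEAGLE resources BEAST can see, then exit
/path/to/BEAST\ v2.4.0/bin/beast -beagle_info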

SNAPP does not use BEAGLE, so any BEAGLE-related flags are ignored when running a SNAPP analysis. SNAPP uses as many threads as you specify on the command line, but ignores the -beagle_instances flag (now the -instances flag in v2.4.0). Due to the way SNAPP threading works, more threads do not always show up as more CPU usage, so you have to experiment with the number of threads to get optimal performance for your dataset.
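
So for a SNAPP run, something like this is all that is needed (the file name is just a placeholder):

# SNAPP: only -threads matters; -beagle* and -instances flags are ignored
beast -threads 8 snapp_analysis.xml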

Hope this helps,

Remco



bander...@gmail.com

Mar 15, 2016, 5:59:45 AM
to beast-users
Hi Remco,

Thanks for the rapid reply and the fix! I downloaded the fixed version and it finds BEAGLE no problem.

I am still having trouble getting BEAST to use the available cores on my machine. No matter what I specify when calling BEAST (the executable in /bin), it reports that it is using BEAGLE but never uses more than one core (100% CPU). Specifying "-threads -1" or "-threads 8" increases the number of Threads listed in Activity Monitor, but still only for a single core (100% CPU). For instance, if I run BEAST without specifying -threads or -instances, it reports 100% CPU and 40 or so threads; increasing the setting takes it to c. 130 threads, and the maximum with "-threads -1" is 328 threads, but all still at 100% CPU.

The analysis is very slow, and from my initial testing it looks like running ">beast input.xml" (resulting in 100% CPU and 40 Threads) is actually the fastest.

I would like to make use of more cores. Is it possible that piling more threads onto one core is actually slowing it down? Could it be that I need to increase the available memory (each analysis is using c. 500 MB)?
Do you know how to get it to run across cores?

Thanks for your help!

Ben

Andrew Rambaut

Mar 15, 2016, 6:20:13 AM
to beast...@googlegroups.com
Dear Ben,

Can I suggest you try BEAST 1.8 as a comparison, to see whether the issue is something about your setup or about BEAST 2?

Andrew

bander...@gmail.com

Mar 15, 2016, 7:48:00 AM
to beast-users
Hi Andrew,

Thanks for the suggestion. I downloaded BEAST 1.8.3 and proceeded to use its version of BEAUti to generate an xml file similar to my previous run.

(As an aside, there were a few problems:
- it wasn't happy with how I specified things for the Calibrated Yule and wouldn't generate an XML file
- it didn't allow me to specify log-normal priors -- claiming it failed to parse the number -- until I swapped the Calibrated Yule for the plain Yule, after which I could set initial values
- it wouldn't start from a random or UPGMA tree, because one of my taxon sets wasn't resolved in those trees… so I had to provide a guide tree
- it spat out an error when starting the run: Failed to load parser: dr.inferencexml.trace.GeneralizedHarmonicMeanAnalysisParser)

On the positive side, it had no problem spreading across cores.
I called beast like this:
> /path/to/beast -beagle_SSE -beagle_instances 7 -threads 7 input.xml

It started up using roughly 450% CPU and 48 Threads, then ramped up a bit later to c. 620% CPU and 48 Threads. It is running far faster than the BEAST 2 run, so it seems there is some difference in how BEAST 2 is handling things.

(As another aside, I started three identical BEAST 2 runs, simply re-saving the xml file as _r2.xml and _r3.xml, and there is a massive discrepancy in run speed across them. As of now, the third run has done about 98,000 samples at 11h50m/M samples, the second run has done 18,500 samples at 69h49m/M samples, and the first run has done only 2,500 samples. I executed them all as >beast input.xml in different tabs of the same terminal window, and each is using 100% CPU and 40 Threads, so I have no idea why there is such a massive discrepancy.)

Thoughts?

Cheers,

Ben

Remco Bouckaert

Mar 15, 2016, 3:18:03 PM
to beast...@googlegroups.com
Hi Ben,

If you are running a Standard analysis with a single alignment, you should use both -threads and -instances, with the number of threads equal to the number of instances.
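
For example, on a quad-core machine a single-alignment run would look something like this (the numbers and file name are only illustrative):

# single alignment: split the likelihood over 4 BEAGLE instances, one thread each
beast -threads 4 -instances 4 single_alignment.xml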

If you are running a Standard or *BEAST analysis with multiple alignments, you need to set useThreads="true" on the element with id="likelihood" and use both -threads and -instances, with the number of threads greater than or equal to the number of instances. The number of instances determines how many threads are used for a single alignment, and the number of threads determines how many threads are used in total.

Note that for *BEAST analyses it may not be worth splitting the alignments, and by default BEAUti produces TreeLikelihoods (for which the -instances flag is ignored) rather than ThreadedTreeLikelihoods. If you want to use threads for such alignments, you have to edit the XML and replace spec="TreeLikelihood" with spec="ThreadedTreeLikelihood".

If you have a mix of small and large alignments, you can reduce the number of threads used by the smaller alignments by setting the threads attribute on their ThreadedTreeLikelihood elements, which gives better load balancing, but you have to experiment a bit to find out what works best for your data.
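
Put together, the XML edits look roughly like this (the ids, data references and threads value are placeholders from a generic BEAUti file, not your analysis):

<!-- enable threading over partitions for the compound likelihood -->
<distribution id="likelihood" spec="util.CompoundDistribution" useThreads="true">
    <!-- replace spec="TreeLikelihood" with spec="ThreadedTreeLikelihood";
         a small partition can be pinned to a single thread -->
    <distribution id="treeLikelihood.part1" spec="ThreadedTreeLikelihood" data="@part1" tree="@Tree.t:part1" threads="1">
        ...
    </distribution>
</distribution>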

Let me know if you get better CPU usage with any of these changes.

Cheers,

Remco

Benjamin Anderson

Mar 16, 2016, 2:38:12 AM
to beast...@googlegroups.com
Hi Remco,

I've tried making the changes you suggested, but there is no improvement in CPU usage.

I have one alignment which I have broken into 12 partitions (I'm not sure if that is what you mean by multiple alignments or not). I have linked the clock and tree models across all the partitions, but the site models are different for each one.

In the xml file, I changed this line:
 <distribution id="likelihood" spec="util.CompoundDistribution" useThreads="true">

For each of the partitions, BEAUti has already set spec="ThreadedTreeLikelihood".

For five of the smaller partitions, I added threads:
<distribution id="treeLikelihood.K80+I" spec="ThreadedTreeLikelihood" data="@K80+I" tree="@Tree.t:tree" threads="1">

I have tried running various combinations of "-beagle_SSE -instances X -threads Y" or "-instances X -threads Y", and they always run on only one core, just with different thread counts.

I also noticed that running without -instances or -threads, or with -instances 1, initialises 12 filtered sections (corresponding to my 12 partitions), but setting -instances higher than 1 breaks those into many more filtered sections of the alignment (some only a few tens of bp long). Each one is assigned to the 0: CPU BEAGLE resource, which reports:

Using BEAGLE version: 2.1.2 resource 0: CPU
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_NONE PROCESSOR_CPU FRAMEWORK_CPU

I notice there is a THREADING_NONE flag, but I don't know what it means; in any case, this isn't a problem with BEAST 1.8.3.

Would it help if I sent you my xml file in a private message?

Cheers,

Ben


Remco Bouckaert

Mar 16, 2016, 3:28:28 AM
to beast...@googlegroups.com
Hi Ben,

Thanks for the file. I noticed you use the calibrated Yule model with 5 calibrations. The calibrated Yule model is very computationally intensive under these circumstances and dominates the computation. It also does not use threads, so threading the likelihoods (which would normally dominate the calculation time) does not help for your analysis.

I suggest you replace the calibrated Yule with the plain Yule prior, and verify by sampling from the prior that the marginal distributions of the calibrations are what you expect them to be. Then optimising the likelihoods for performance as you outlined should give better CPU usage, but let me know if things turn out otherwise.
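
Sampling from the prior can be switched on in BEAUti's MCMC panel or, if I remember the attribute correctly, directly on the run element in the XML (the id and chain length here are just placeholders):

<!-- ignore the likelihood so the sampled node ages reflect only the
     tree prior and the calibration densities -->
<run id="mcmc" spec="MCMC" chainLength="1000000" sampleFromPrior="true">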

Cheers,

Remco

Benjamin Anderson

Mar 16, 2016, 4:27:29 AM
to beast...@googlegroups.com
Thanks, Remco!

I switched to the plain Yule (and also set it to not estimate the clock rate) and it has started to use more cores.
I've been playing with settings for -instances and -threads, and I'll perhaps need to keep tweaking it to get the best performance.

As you suggested, I need to insert useThreads="true", and to assign single threads to the short partitions, to get the fastest run times. It seems that -instances has more impact, and perhaps it's OK to specify -threads -1 to let it automatically match the sum of partitions × instances (as I understand it).
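
That is, something along these lines (the instance count and file name are placeholders, still being tuned):

# plain Yule prior, threaded likelihoods in the XML, threads chosen automatically
beast -beagle_SSE -instances 4 -threads -1 allgenes_beast24.xml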

Thanks again for your help!

Remco Bouckaert

Mar 16, 2016, 2:55:59 PM
to beast...@googlegroups.com
Hi Ben,

Good to hear it worked out. I'll make sure the useThreads flag is set to true by default in the next release.

Cheers,

Remco