Two potential bugs in ExaML 3.0

Sohta Ishikawa

Jul 13, 2015, 6:03:00 AM

to ra...@googlegroups.com

Hi all,

I'm running ExaML 3.0.14 on the Xeon Phi supercomputer at my university and have run into two problems with ML tree inference in ExaML.

The first problem is that the ML trees selected by two ExaML runs, one on MIC only and one on CPU+MIC, differed from each other.

Below are the commands for my analyses. examl-OMP-AVX and examl-MIC were built using GCC 4.7.4 and ICC 15.0.2.

$parser -s mydata -q partition -m PROT -n mydata

The partition file was as follows:
---------------------------
LG, p1 = 1-41372
---------------------------

$sbatch run.sh

The run.sh files were as follows:

---------------------------------------------------------------
run.sh for running ExaML on two MIC cards

#!/bin/bash
#SBATCH -J ExaML_test
#SBATCH -p mic
#SBATCH -N 1
#SBATCH -t 02:00:00
#SBATCH -o stdout_nat
#SBATCH -e stderr_nat

#export I_MPI_DEBUG=5
export I_MPI_MIC=enable
export I_MPI_DEBUG_INFO_STRIP=disabled
export MIC_PPN=1
export MIC_OMP_NUM_THREADS=240
export MIC_KMP_AFFINITY=granularity=fine,balanced

cd "$SLURM_SUBMIT_DIR"
mpirun-mic2 -m "/PATH/to/examl-MIC -s /PATH/to/mydata.binary -n NATIVE -t /PATH/to/treefile -m GAMMA -w /PATH/to/current_dir"
---------------------------------------------------------------
---------------------------------------------------------------
run.sh for running ExaML in hybrid mode (20 CPU + 2 MIC cards)

#!/bin/bash
#SBATCH -J ExaML_test
#SBATCH -p mixed
#SBATCH -N 1
#SBATCH -t 02:00:00
#SBATCH -o stdout_sym
#SBATCH -e stderr_sym
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=10

#export I_MPI_DEBUG=5
export I_MPI_MIC=enable
export I_MPI_DEBUG_INFO_STRIP=disabled
export MIC_PPN=1
export OMP_NUM_THREADS=10
export MIC_OMP_NUM_THREADS=240
export MIC_KMP_AFFINITY=granularity=fine,balanced

cd "$SLURM_SUBMIT_DIR"
mpirun-mic2 -c "/PATH/to/examl-OMP-AVX -s /PATH/to/mydata.binary -n SYMMETRIC -t /PATH/to/treefile -m GAMMA -w /PATH/to/current_dir" -m "/PATH/to/examl-MIC -s /PATH/to/mydata.binary -n SYMMETRIC -t /PATH/to/treefile -m GAMMA -w /PATH/to/current_dir"
---------------------------------------------------------------

Note that mpirun-mic2 is a site-specific command on our supercomputer for running parallel programs on CPUs and/or MICs: the -m option launches the binary compiled for MIC, and the -c option launches the one built for the CPU.

The topologies of the ML trees selected by ExaML_MIC and ExaML_CPU+MIC were significantly different, even though both runs started from the same starting tree and the same binary file. I suspect this is not the expected result and that there might be a bug in the likelihood calculation or the tree search algorithm in ExaML, especially when it runs in hybrid mode.

The second problem is that the ML tree selected by the ExaML_MIC run under the LG4M model differed from the one selected by the RAxML run under the LG4M model, even though both runs started from the same tree. I found a similar difference between the ExaML and RAxML trees when I applied the LG4X model in both analyses. I used the following partition file to apply the LG4M/LG4X models in the ExaML runs:

LG4M(LG4X), p1 = 1-41372

I expected RAxML and ExaML to select the same ML tree in this situation because, as I understand it, ExaML uses the same ML inference algorithm as RAxML. So is there a bug in the likelihood calculation with the LG4M/LG4X models in ExaML, or are there additional options I must use to get the same result from RAxML and ExaML?
Furthermore, ExaML_CPU+MIC selected a different ML tree from those selected by ExaML_MIC and RAxML under both the LG4M and LG4X models. I think this is caused by the CPU+MIC problem I mentioned above.

I would appreciate any comments on these problems.
Thank you for your time and consideration.

Best,
Sohta

Alexey Kozlov

Jul 13, 2015, 7:02:02 AM

to ra...@googlegroups.com

Dear Sohta,

this really sounds like a bug. Certainly, if identical binary input file and starting tree were used, the resulting
topology of ExaML-CPU, ExaML-MIC and ExaML-Hybrid should be the same (although there might be minor discrepancies in
logLH score because of numerical issues).

This problem might be connected with a recent bugfix in both RAxML and ExaML, which I haven't tested against ExaML-MIC
version yet.

Could you please send your input files to my personal email, so that I can check it?

Alexey

Sohta Ishikawa

Jul 13, 2015, 7:24:51 AM

to ra...@googlegroups.com

Dear Alexey,

Thank you for your prompt reply!

Here are my test files.

Best,

Sohta

************************************************************************************************
Sohta Ishikawa Ph.D

Research Fellow in University of Tsukuba
Faculty of Life and Environmental Sciences
Center for Computational Sciences
Laboratory of Molecular Evolution of Microbes

1-1-1 Tennoudai, Tsukuba, Ibaraki, Japan 305-8577
mail: sai...@ccs.tsukuba.ac.jp

"La vie est drôle"
************************************************************************************************

testdata.phy

test.tree

Alexandros Stamatakis

Jul 13, 2015, 10:36:53 AM

to ra...@googlegroups.com

Hi Alexey and Sohta,

The differences might also be due to different round-off errors induced
by differences in compilers, the number of cores, and the order in which
reduction operations are conducted. There's a small dataset on which RAxML
results diverge depending on whether I use one or two cores.
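A toy illustration of the round-off effect described above, assuming a POSIX shell with awk (which stores numbers as double-precision floats). Adding the same three values in two different orders gives results that differ in the last bits. This is not ExaML or RAxML code; it only shows why a different core count or reduction order can perturb likelihood values, and such small perturbations can steer a heuristic tree search down a different path.

```shell
#!/bin/sh
# Floating-point addition is not associative: summing the same three
# numbers in a different order can change the last bits of the result.
# Conceptually similar to reducing per-site log-likelihoods across a
# different number of processes or threads (toy example only).
left_first=$(awk 'BEGIN { printf "%.17g", (0.1 + 0.2) + 0.3 }')
right_first=$(awk 'BEGIN { printf "%.17g", 0.1 + (0.2 + 0.3) }')
echo "(0.1 + 0.2) + 0.3 = $left_first"
echo "0.1 + (0.2 + 0.3) = $right_first"
```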

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Karen

Sep 12, 2015, 12:15:47 AM

to raxml

Hi all,

For a couple of weeks now I have been trying to run tree searches with ExaML 3.0.15 and 3.0.14 (including the LG4X model) on large datasets plus bootstrap (BS) replicates.

1) Version 3.0.15 always fails for me with incomprehensible MPI errors (running examl SSE3) when computing trees from BS replicates.
It does not matter whether the datasets are rather small (e.g. 300,000 sites) or larger; the datasets have between 50 and 120 taxa.

Version 3.0.14, however, works. ML tree searches on the original dataset (mostly) work, but I decided to stick with 3.0.14 for everything for now (although I don't know how
the change in the optimization procedure from 3.0.14 to 3.0.15 would affect the results). Often ExaML stops running and MPI does not execute properly:

(error for 3.0.15:)
MXM: Got signal 15 (Terminated)
--------------------------------------------------------------------------
mpiexec detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[5680,1],110]
Exit code: 255

2) Is it normal that, in general, tree searches on BS replicates

a) run much faster than on the original dataset, and
b) produce BS binaries whose size differs from that of the binary built from the original (reduced) alignment? All BS binaries from one run have the same size, but that size differs from the original (reduced) binary, sometimes larger and sometimes smaller.

3) When running partitioned datasets with BS, some BS replicates of the aa (and nt) data are generated that do not fulfill the RAxML/ExaML criteria (not having all 20 aa states).
Is there a way to circumvent this, e.g. by modifying the replicates or by writing a wrapper that keeps going and only produces replicates that fulfill the criteria? Or might this introduce a bias?
I also noticed that when I ignore the error messages, binaries are sometimes generated; for nt data the binary size is sometimes 0 bytes (for aa data at least something is generated, although those binaries might fail during a run).

We tried to modify partition schemes obtained from other programs by manually merging and changing partitions, but this is very time-consuming and the results
are unpredictable (probably the partitions are sometimes too small, and sometimes they are larger but just too uniform).
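A minimal sketch for checking a replicate before running the parser, assuming the criterion is that a partition must contain all 20 standard amino acids (as described in point 3) and that replicates are sequential PHYLIP files with a "ntaxa nsites" header followed by one "name sequence" pair per line. The function name count_aa_states is hypothetical; it is not part of RAxML or ExaML.

```shell
#!/bin/sh
# Hypothetical helper (not a RAxML/ExaML tool): count how many of the 20
# standard amino acids occur in sites START..END (1-based, inclusive) of a
# sequential PHYLIP file. Assumes line 1 is the "ntaxa nsites" header and
# every following non-empty line is "name<whitespace>sequence".
count_aa_states() {
    file=$1; start=$2; end=$3
    awk -v s="$start" -v e="$end" '
        NR == 1 { next }                      # skip the PHYLIP header line
        NF >= 2 {
            seq = toupper($2)
            for (i = s; i <= e && i <= length(seq); i++) {
                c = substr(seq, i, 1)
                # record only the 20 standard amino acid letters;
                # gaps and ambiguity codes are ignored
                if (index("ARNDCQEGHILKMFPSTWYV", c) > 0) seen[c] = 1
            }
        }
        END { n = 0; for (c in seen) n++; print n }
    ' "$file"
}
```

For example, `count_aa_states replicate.phy 1 41372` (replicate.phy is a placeholder name) would print the number of distinct amino acids in that site range, and a value below 20 would flag the replicate. Note that discarding failing replicates and resampling until one passes conditions the bootstrap on this check, which is the kind of bias asked about above.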

cheers Karen
