Errors in ExaML 3.0.15 and 3.0.14

Karen

Sep 13, 2015, 12:36:56 AM
to raxml, Ondrej...@csiro.au
Hi all,

For several weeks now I have been dealing with problems running ExaML properly (see yesterday's thread).
With version 3.0.14, ML tree searches on the BS replicates run through, but ML tree searches on the original dataset do not. With version 3.0.15, ML tree searches on the original dataset (sometimes) run through, but the BS searches crash immediately, with MPI error messages I cannot make sense of.

I have already tried different compilers, different MPI setups, and different clusters.
For the moment I am sticking with 3.0.14 for the BS replicates. For the ML tree searches on the original dataset I get errors like the ones below (the run goes for about 1.5 days before crashing).

Dataset: 118 taxa, ca. 207 partitions, ca. 1.4 million aa sites, including partitions under the LG4X model. I suspect LG4X causes the problem (also for the BS replicates in version 3.0.15).

Any suggestions?

(I could combine the BS replicates from version 3.0.14 with the tree searches from version 3.0.15, but that does not seem like the way it should be done. If I do so, are the binaries compatible, even though they were created with different parser versions?)

I will now try yet another compiler (clang).

Karen

####

Likelihood problem in model optimization l1: -35480431.5236782580614089965820312500000000000000 l2: -35480419.4611485376954078674316406250000000000000 tolerance: 0.0000354804194611485401093624314494689997
Likelihood problem in model optimization l1: -35480431.5236782580614089965820312500000000000000 l2: -35480419.4611485376954078674316406250000000000000 tolerance: 0.0000354804194611485401093624314494689997
Likelihood problem in model optimization l1: -35480431.5236782580614089965820312500000000000000 l2: -35480419.4611485376954078674316406250000000000000 tolerance: 0.0000354804194611485401093624314494689997
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138806] *** Process received signal ***
[c238:138806] Signal: Aborted (6)
[c238:138806] Signal code:  (-6)
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138808] *** Process received signal ***
[c238:138808] Signal: Aborted (6)
[c238:138808] Signal code:  (-6)
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138814] *** Process received signal ***
[c238:138814] Signal: Aborted (6)
[c238:138814] Signal code:  (-6)
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138816] *** Process received signal ***
[c238:138816] Signal: Aborted (6)
[c238:138816] Signal code:  (-6)
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138818] *** Process received signal ***
[c238:138818] Signal: Aborted (6)
[c238:138818] Signal code:  (-6)
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138792] *** Process received signal ***
[c238:138792] Signal: Aborted (6)[c238:138792] *** Process received signal ***
[c238:138792] Signal: Aborted (6)
[c238:138792] Signal code:  (-6)
[c238:138792] [ 0] /lib64/libpthread.so.0(+0xf850)[0x2aaaaba05850]
[c238:138792] [ 1] /lib64/libc.so.6(gsignal+0x35)[0x2aaaabc45885]
[c238:138792] [ 2] /lib64/libc.so.6(abort+0x181)[0x2aaaabc46e61]
[c238:138792] [ 3] /lib64/libc.so.6(__assert_fail+0xf0)[0x2aaaabc3e740]
[c238:138792] [ 4] examl-SSE3[0x40eaca]
[c238:138792] [ 5] examl-SSE3[0x417731]
[c238:138792] [ 6] examl-SSE3[0x4043c5]
[c238:138792] [ 7] /lib64/libc.so.6(__libc_start_main+0xe6)[0x2aaaabc31c36]
[c238:138792] [ 8] examl-SSE3[0x402c49]
[c238:138792] *** End of error message ***
examl-SSE3: optimizeModel.c:3020: checkTolerance: Assertion `0' failed.
[c238:138793] *** Process received signal ***

Karen

Sep 18, 2015, 8:05:15 AM
to raxml, Ondrej...@csiro.au

Hi all,

any help would be appreciated!

Best
Karen

Alexandros Stamatakis

Sep 18, 2015, 8:33:04 AM
to ra...@googlegroups.com, Ondrej...@csiro.au
any sort of patience and awareness that we have got other things to do
as well would be even more appreciated,

alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Alexey Kozlov

Sep 18, 2015, 9:25:34 AM
to ra...@googlegroups.com
Hi Karen,

unfortunately I cannot help you with the main issue; it seems to be yet another problem in the LG4X parameter optimization
routine...

But here are the answers to some of your side questions:

> (I could mix BS from version 3.0.14 and tree searches from version 3.0.15, but this does not seem like the way it should
> be. If I do so, are the binaries compatible, even though they were created with different parser versions?

There is a safety check in the code that prevents ExaML from loading binaries generated by a parser of a
different version. However, I'm pretty sure that the binary format itself didn't change between 3.0.14 and 3.0.15, so
technically it should be possible to load the older binaries. So if you really want (or have) to do this, please let me know and
I'll tell you where to disable the safety check in the code.


>2) is it normal that tree searches on BS replicates in general
>a) run much faster than on the original dataset and

This is something I've observed as well, although as far as I remember the difference in running time was not that
dramatic (i.e., they ran faster, but not by much). One explanation I have for this is that BS replicates will have fewer
unique site patterns than the original alignment (because some columns will be sampled more than once and others not at all).
You can check the difference in the number of patterns by looking at the ExaML output.
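A toy illustration of the pattern argument: it builds a random 4-taxon alignment, draws one bootstrap replicate by resampling columns with replacement, and counts distinct columns (site patterns) before and after. The alignment and helper names are made up for illustration, and resampling here is over the whole alignment; in a partitioned run, resampling is presumably done per partition. Because every replicate column is a copy of an original column, the replicate's pattern set is always a subset of the original's, so its pattern count can shrink but never grow.

```python
import random


def unique_patterns(columns):
    """Number of distinct site patterns (identical columns count once)."""
    return len(set(columns))


def bootstrap_columns(columns, rng):
    """One bootstrap replicate: draw len(columns) columns with replacement."""
    return [rng.choice(columns) for _ in columns]


# Hypothetical toy alignment: 4 taxa x 200 columns, each column a tuple of residues.
rng = random.Random(42)
alphabet = "ACDEFGHIKLMNPQRSTVWY"
original = [tuple(rng.choice(alphabet) for _ in range(4)) for _ in range(200)]

replicate = bootstrap_columns(original, rng)
print("original patterns: ", unique_patterns(original))
print("replicate patterns:", unique_patterns(replicate))

# Every replicate column is copied from the original, so no new patterns appear.
assert set(replicate) <= set(original)
```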

>b) the size of the BS binaries is different from the original (reduced) one (if you make a binary out of it): all BS
>binaries from one run are of the same size, but different from the original (reduced) binary size, sometimes larger,
>sometimes smaller

BS binary size could be different due to pattern compression (see above), although I cannot see why it could become
larger than the original...

>3) running partitioned datasets and BS: sometimes BS replicates on aa data (and nt) are generated that do not fulfill the
>RAxML/ExaML criteria (not having all 20 aa states).
>Is there another way to circumvent this (e.g. modify it or write a wrapper that keeps going and only produces
>replicates that fulfill the criteria)? Or might this be biased?

@Alexis: I also have a feeling this requirement might be too strict, especially for AA data (e.g. if 19 out of
20 states are present in the alignment). Would it be possible to use some kind of smoothing, e.g. assign low but non-zero
frequencies to the missing states:

https://en.wikipedia.org/wiki/Additive_smoothing

?
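A minimal sketch of the additive (Laplace) smoothing idea, assuming empirical amino-acid frequencies are computed by simple counting per partition: each of the 20 states receives a pseudocount alpha, so a residue that never occurs still gets a small non-zero frequency. The function name and the value of alpha are illustrative assumptions; this only illustrates the proposal and is not something ExaML implements.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard amino acids


def smoothed_frequencies(residues, alpha=1.0):
    """Additive smoothing: (count + alpha) / (total + alpha * 20) per state."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for r in residues:
        if r in counts:  # gaps, X and other ambiguity codes are ignored
            counts[r] += 1
    total = sum(counts.values())
    k = len(AMINO_ACIDS)
    return {aa: (c + alpha) / (total + alpha * k) for aa, c in counts.items()}


# Hypothetical partition in which tryptophan (W) never occurs: 95 residues.
partition = "ARNDCQEGHILKMFPSTYV" * 5
freqs = smoothed_frequencies(partition, alpha=0.5)
print(f"W: {freqs['W']:.4f}  A: {freqs['A']:.4f}")
```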

Hope this helps at least a bit...

Alexey

On 18.09.2015 14:05, Karen wrote:
>

Alexandros Stamatakis

Sep 29, 2015, 11:59:31 AM
to ra...@googlegroups.com
Dear Karen,

> unfortunately I cannot help you with the main issue - seems to be yet
> another problem in LG4X parameter optimization routine...

Can you confirm that it only happens under LG4X please?

Also if this is the case, do you have a small dataset that can be used
to reconstruct the error?

> > b) the size of the BS binaries is different from the original (reduced)
> > one (if you make a binary out of it): all BS binaries from one run are of
> > the same size, but different from the original (reduced) binary size,
> > sometimes larger, sometimes smaller
>
> BS binary size could be different due to pattern compression (see above),
> although I cannot see why it could become larger than the original...

They can be smaller but they can't be larger, can you please check and
confirm?

> > 3) running partitioned datasets and BS: sometimes BS replicates on aa
> > data (and nt) are generated that do not fulfill the RAxML/ExaML criteria
> > (not having all 20 aa states).
> > Is there another way to circumvent this (e.g. modify it or write a
> > wrapper that keeps going and only produces replicates that fulfill the
> > criteria)? Or might this be biased?

There is no proper solution to this (I had a long discussion about this
with D. Posada today). By only selecting BS replicates that fulfill the
requirement you are biasing the BS sampling procedure by selecting BS
alignments that contain sites with specific residues more frequently
than they should. So the only option is to make the problematic
partitions larger, i.e., merge them.
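One way to find the partitions that would need merging is to list, for each partition, the amino acids that never occur in its columns. The sketch below does this for an in-memory alignment; the dict of 1-based, inclusive column ranges stands in for a partition file, and the function names are hypothetical. It only reports problem partitions and does not filter replicates, which, as explained above, would bias the bootstrap. Gaps and ambiguity codes are simply ignored.

```python
AMINO_ACIDS = frozenset("ARNDCQEGHILKMFPSTWYV")


def missing_states(sequences, start, end):
    """Amino acids that never occur in columns start..end (1-based, inclusive)."""
    seen = set()
    for seq in sequences:
        seen.update(seq[start - 1:end].upper())
    return AMINO_ACIDS - seen  # gaps and ambiguity codes drop out here


def partitions_to_merge(sequences, partitions):
    """Map each partition name to its sorted list of absent amino acids."""
    report = {}
    for name, (start, end) in partitions.items():
        missing = missing_states(sequences, start, end)
        if missing:
            report[name] = sorted(missing)
    return report


# Hypothetical toy alignment: two taxa, 25 columns, two partitions.
alignment = [
    "ARNDCQEGHILKMFPSTWYV" + "AAAA-",
    "ARNDCQEGHILKMFPSTWYV" + "RRRRX",
]
partitions = {"p1": (1, 20), "p2": (21, 25)}
print(partitions_to_merge(alignment, partitions))
```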

We cannot apply a 20 state model to a partition with 19 or even 18
states without a lot of thinking and code re-design.

Alexis

Karen

Oct 1, 2015, 7:23:55 AM
to raxml
Dear Alexis,

My answers are inline below.

> Dear Karen,
>
> > unfortunately I cannot help you with the main issue - seems to be yet
> > another problem in LG4X parameter optimization routine...
>
> Can you confirm that it only happens under LG4X please?

I am not sure. A colleague reported that avoiding LG4X works, but I have not tested it myself.
Should I repeat the run (BS with 3.0.15) on the dataset with all LG4X replaced by LG?
 
> Also if this is the case, do you have a small dataset that can be used
> to reconstruct the error?


Unfortunately not a small one (so it might also depend on the dataset).

> > b) the size of the BS binaries is different from the original (reduced)
> > one (if you make a binary out of it): all BS binaries from one run are of
> > the same size, but different from the original (reduced) binary size,
> > sometimes larger, sometimes smaller
> >
> > BS binary size could be different due to pattern compression (see above),
> > although I cannot see why it could become larger than the original...
>
> They can be smaller but they can't be larger, can you please check and
> confirm?

I will check this again and keep you posted

> > 3) running partitioned datasets and BS: sometimes BS replicates on aa
> > data (and nt) are generated that do not fulfill the RAxML/ExaML criteria
> > (not having all 20 aa states).
> > Is there another way to circumvent this (e.g. modify it or write a
> > wrapper that keeps going and only produces replicates that fulfill the
> > criteria)? Or might this be biased?
>
> There is no proper solution to this (I had a long discussion about this
> with D. Posada today). By only selecting BS replicates that fulfill the
> requirement you are biasing the BS sampling procedure by selecting BS
> alignments that contain sites with specific residues more frequently
> than they should. So the only option is to make the problematic
> partitions larger, i.e., merge them.


OK, I see. Several of us are currently working heavily on PartitionFinder (implementing a check for this and also improving partitioning schemes) to find a way around this.
 
> We cannot apply a 20 state model to a partition with 19 or even 18
> states without a lot of thinking and code re-design.


OK, I see; see above.
If you have suggestions regarding issue 1) (what I should test again), that would be great.
Many thanks so far

Karen

Alexandros Stamatakis

Oct 2, 2015, 4:08:01 AM
to ra...@googlegroups.com
Hi Karen,

> Can you confirm that it only happens under LG4X please?
>
> I am not sure. A colleague reported that avoiding LG4X works, but I
> have not tested it myself.
> Should I repeat the run (BS with 3.0.15) on the dataset with all LG4X
> replaced by LG?

Yes, that would be very helpful.
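A small sketch for producing the LG variant of the partition file, assuming the RAxML-style format with one "MODEL, name = range" entry per line (e.g. "LG4X, p1 = 1-500"). Only lines whose model field is exactly LG4X are changed; everything else is kept as-is. The function name is hypothetical, and the binary file would then have to be regenerated from the edited partition file with the ExaML parser.

```python
import re


def lg4x_to_lg(partition_text: str) -> str:
    """Replace the model name LG4X by LG at the start of each partition line."""
    lines = []
    for line in partition_text.splitlines():
        # Match only a leading "LG4X" followed by the comma before the name.
        lines.append(re.sub(r"^(\s*)LG4X(\s*,)", r"\1LG\2", line))
    trailing = "\n" if partition_text.endswith("\n") else ""
    return "\n".join(lines) + trailing


# Hypothetical example partition file content.
example = "LG4X, p1 = 1-500\nWAG, p2 = 501-900\nLG4X, p3 = 901-1200\n"
print(lg4x_to_lg(example), end="")
```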

> Also if this is the case, do you have a small dataset that can be used
> to reconstruct the error?
>
> Unfortunately not a small one (so it might also depend on the dataset)

That's always the problem with debugging ExaML ...

> They can be smaller but they can't be larger, can you please check and
> confirm?
>
>
> I will check this again and keep you posted

Thanks.

> > 3) running partitioned datasets and BS: sometimes BS replicates on aa
> > data (and nt) are generated that do not fulfill the RAxML/ExaML
> > criteria (not having all 20 aa states).
> > Is there another way to circumvent this (e.g. modify it or write a
> > wrapper that keeps going and only produces replicates that
> > fulfill the criteria)? Or might this be biased?
>
> There is no proper solution to this (I had a long discussion about this
> with D. Posada today). By only selecting BS replicates that fulfill the
> requirement you are biasing the BS sampling procedure by selecting BS
> alignments that contain sites with specific residues more frequently
> than they should. So the only option is to make the problematic
> partitions larger, i.e., merge them.
>
>
> OK, I see. Several of us are currently working heavily on
> PartitionFinder (implementing a check for this and also improving
> partitioning schemes) to find a way around this.

Okay, I was thinking about this again this morning. On the other hand,
if this phenomenon only affects 2-3 out of 1000 partitions or so, there
will of course be a theoretical bias to the bootstrap if you just ignore
the BS reps with fewer than 20 states, but it's probably so small that
nobody will notice it at all.
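A back-of-the-envelope estimate of how often a replicate loses a state, under two assumptions: columns are resampled with replacement within each partition, and a given residue occurs in c of the partition's n columns. The chance that a replicate draws none of those c columns is (1 - c/n)^n, which is close to e^(-c). So a residue confined to a single column is missing from roughly 37% of replicates, while one present in 10 columns is missing from well under 0.01% of them. This is an illustration, not a calculation from the thread; the partition size of 300 columns is arbitrary.

```python
import math


def p_state_missing(n_sites: int, c_columns: int) -> float:
    """Probability that a bootstrap replicate of n_sites columns, drawn with
    replacement, contains none of the c_columns columns holding a residue."""
    return (1.0 - c_columns / n_sites) ** n_sites


# Hypothetical partition of 300 columns; compare the exact value with e^(-c).
for c in (1, 2, 5, 10):
    print(f"c={c:2d}  exact={p_state_missing(300, c):.6f}  e^-c={math.exp(-c):.6f}")
```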


> We cannot apply a 20 state model to a partition with 19 or even 18
> states without a lot of thinking and code re-design.
>
>
> ok I see, see above.
> If you have suggestions to issue 1) (what should I test again) would be
> great.

What you could do is the following:

1. open file optimizeModel.c

2. search for the line:

//#define _DEBUG_MOD_OPT

and replace it with:

#define _DEBUG_MOD_OPT


3. re-compile

4. re-run

5. send me the terminal output
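Step 2 can also be done with a small script instead of editing by hand. This is a hypothetical helper, not part of ExaML: it replaces the first occurrence of the commented-out define and reports whether anything was changed, e.g. enable_debug_flag("optimizeModel.c"). Recompiling and re-running still follow steps 3 to 5.

```python
from pathlib import Path


def enable_debug_flag(source_file: str, flag: str = "_DEBUG_MOD_OPT") -> bool:
    """Turn '//#define FLAG' into '#define FLAG' in a C source file.

    Returns True if the file was modified, False if the commented-out
    line was not found (e.g. because the flag is already enabled)."""
    path = Path(source_file)
    text = path.read_text()
    commented = f"//#define {flag}"
    if commented not in text:
        return False
    path.write_text(text.replace(commented, f"#define {flag}", 1))
    return True
```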

Karen

Oct 2, 2015, 5:25:07 AM
to raxml
Hi Alexis,

I will try all this. Unfortunately it may take quite a while, sorry about that (for several inconvenient reasons: I am traveling to conferences, and I have no rights to recompile ExaML myself on the clusters, so I have to rely on the IT admins; my own machine is too small).
As you advised, I will ask them to make the changes and recompile version 3.0.15, and then try * tree search and * bootstraps, each with and without LG4X. (Creating a smaller dataset on which the errors remain reproducible would probably take me much more work than running the large ones.)

I guess there is no possibility of sending you several datasets so that you/your group could test them in debug mode?
Anyway, I will try to be as fast as I can.

Many thanks, Karen

Alexandros Stamatakis

Oct 2, 2015, 5:31:26 AM
to ra...@googlegroups.com
Hi Karen,

I see, please send us the data, input files and command lines anyway, I
might get to it, but I can't promise that this will happen soon.

Alexis

Karen

Oct 2, 2015, 5:35:50 AM
to raxml
Perfect, that's awesome.
Since the files are too large for attachments here, I will put everything together and send it to you and your group via a separate link.
I will also prepare partition files / binary files with and without LG4X accordingly.

Many many thanks
Karen

Alexandros Stamatakis

Oct 2, 2015, 5:39:10 AM
to ra...@googlegroups.com
perfect, thanks,

alexis

Karen

Nov 19, 2015, 7:25:17 AM
to raxml
Hi all,

sorry for the massive delay.
In the meantime we have been working on a workaround to ensure that partitions (and hopefully the BS replicates) always have all 20 states present. Once that is achieved, I will come back to this and try tree search and BS with ExaML 3.0.15
(where the tree search worked, but the tree searches on the BS replicates crashed with LG4X).

Best Karen

Alexandros Stamatakis

Nov 19, 2015, 11:28:20 AM
to ra...@googlegroups.com
Hi Karen,

> sorry for the massive delay.

No problem.


> We try(ied) in the meantime to work on a workaround to ensure that
> partitions (and hopefully the BS replicates) always have 20 states
> present. Once this is achieved, I will come back to this, trying tree search
> and BS with ExaML 3.0.15

Okay.

> (where Tree search worked but treesearch of BS replicates crashed with
> LG4X and then come back to this again)

This might also be related to the state problem: the search on the original
data may work, but when bootstrapping, not all 20 states might be present in a BS replicate.

Alexis