raxmlHPC-SSE3 output truncated without error


Thomas Halliday

Nov 29, 2016, 6:17:19 AM
to raxml
Dear all,

I have been running raxmlHPC-SSE3 on the UCL computer cluster, Legion, for a rather large analysis, and the program output has not progressed since one hour after the run began. I have checked with the cluster's helpdesk, who say that I am not going over the memory requirements; there is plenty of space left on the nodes. However, nothing has been written to the RAxML_info file since that time.

The analysis is a partitioned morphological/genomic dataset, and this is as far as the output gets:

"This is RAxML version 8.2.9 released by Alexandros Stamatakis on July 20 2016.

With greatly appreciated code contributions by:
Andre Aberer      (HITS)
Simon Berger      (HITS)
Alexey Kozlov     (HITS)
Kassian Kobert    (HITS)
David Dao         (KIT and HITS)
Sarah Lutteropp   (KIT and HITS)
Nick Pattengale   (Sandia)
Wayne Pfeiffer    (SDSC)
Akifumi S. Tanabe (NRIFS)
Charlie Taylor    (UF)


Alignment has 8959689 distinct alignment patterns

Proportion of gaps and completely undetermined characters in this alignment: 90.30%

RAxML rapid hill-climbing mode

Using 2 distinct models/data partitions with individual per partition branch length optimization


Executing 1 inferences on the original alignment using 1 user-specified trees

All free model parameters will be estimated by RAxML
GAMMA model of rate heterogeneity, ML estimate of alpha-parameter

GAMMA Model parameters will be estimated up to an accuracy of 0.1000000000 Log Likelihood units

Partition: 0
Alignment Patterns: 748
Name: p1
DataType: BINARY/MORPHOLOGICAL
Substitution Matrix: Uncorrected
Correcting likelihood for ascertainment bias



Partition: 1
Alignment Patterns: 8958941
Name: p2
DataType: DNA
Substitution Matrix: GTR




RAxML was called as follows:

raxmlHPC-SSE3 -f d -g constraint.txt -j -M -m GTRGAMMA -q partition.txt -n firstresult -s combineddata.phylip -p 310889 --asc-corr=lewis


Partition: 0 with name: p1
"


Could you give me some reasons why RAxML might just stop producing output at this point? I am investigating the analysis on an interactive shell in the cluster, but some wise words from more experienced folk would be helpful. I had already tested a tiny dataset in the same format to see if everything was working, and that was fine. If it weren't for three facts, I would assume this was simply a consequence of the dataset being larger.

Fact 1 - The analysis is not taking up too much memory, and could take up more if needed.
Fact 2 - A submission where I incorrectly formatted the constraint file got further with the same data before returning an error, including the state frequencies for both partitions.
Fact 3 - Identifying the state frequencies for a 234 taxon, 748 character partition should not take long, even if the rest of the dataset is massive.

I have searched the google group fairly thoroughly and have not found another answer, so if this has been discussed before, and I have just been using the wrong keywords to search for it, please do point me in that direction.

Many thanks,

Thomas

PS - I am already trying the PTHREADS version to see if that makes a difference, but I don't necessarily see why it should.

Alexandros Stamatakis

Nov 29, 2016, 6:30:54 AM
to ra...@googlegroups.com
Dear Thomas,

This looks like a huge dataset, since it has 8959689 distinct alignment
patterns. You should use ExaML to analyze it, as it scales much better
for this kind of dataset. I am not surprised that no output has been
printed yet.
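
For a rough sense of scale, here is a back-of-the-envelope estimate in Python. It assumes the rule of thumb from the RAxML manual that the likelihood arrays take about (taxa - 2) * patterns * states * rate categories * 8 bytes. The 234 taxa are taken from Fact 3 in the original post and the pattern counts from the log; the function name and the choice of 2 states for the binary partition are assumptions made for illustration. It ignores everything else RAxML allocates, so treat the result as a lower bound rather than a measurement.

```python
def clv_bytes(n_taxa, n_patterns, n_states, n_rate_cats, bytes_per_value=8):
    """Approximate size of the likelihood arrays for one partition.

    Rule of thumb (assumed from the RAxML manual): one array per inner
    node (n_taxa - 2), each holding n_states * n_rate_cats doubles per
    alignment pattern.
    """
    return (n_taxa - 2) * n_patterns * n_states * n_rate_cats * bytes_per_value


if __name__ == "__main__":
    n_taxa = 234  # from Fact 3 in the original post

    # Partition p2: DNA (4 states), GAMMA with 4 discrete rate categories.
    dna = clv_bytes(n_taxa, 8958941, n_states=4, n_rate_cats=4)

    # Partition p1: binary characters (2 states assumed), GAMMA with 4 categories.
    binary = clv_bytes(n_taxa, 748, n_states=2, n_rate_cats=4)

    total = dna + binary
    print(f"DNA partition:    {dna / 1e9:.1f} GB")
    print(f"binary partition: {binary / 1e6:.1f} MB")
    print(f"total:            {total / 1e9:.1f} GB")
```

Under these assumptions the DNA partition alone comes to roughly 266 GB, while the binary partition needs only about 11 MB, so the DNA partition accounts for essentially all of the memory.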

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Thomas Halliday

Nov 29, 2016, 8:31:38 AM
to raxml
Dear Alexis,

It is good to know this is a result of the size of the dataset - I will try ExaML. However, I think I am right in saying that ascertainment bias correction is currently not supported in ExaML. Is that right? It only applies to the first partition, which is relatively small, but would certainly be very useful to incorporate if that is possible, and with the reduced datasets on which I tested the parameters, it did improve certain aspects of the topology.

Thomas

Thomas Halliday

Nov 29, 2016, 8:54:16 AM
to raxml
Further to that message, I see that ExaML only deals with DNA and AA data, rather than binary characters as well, which is not particularly useful here.

Thanks though - I'll investigate my options.

Alexandros Stamatakis

Nov 29, 2016, 9:05:20 AM
to ra...@googlegroups.com
It does support binary data; it's just not very well documented. The
on-line help (-h option) will show you the options for this.

Ascertainment bias correction, more model flexibility, and the scalable
ExaML parallelization approach will all become available within the next
1-2 months, with the alpha release of a completely re-written RAxML
version. We will announce it via this list.

alexis

Thomas Halliday

Nov 29, 2016, 9:38:47 AM
to raxml
Oh, fantastic, on all of those counts! I look forward to using it!

Alexey Kozlov

Nov 29, 2016, 3:53:28 PM
to ra...@googlegroups.com
Hi Thomas,

Just in case you still want to investigate this further with the old version, here is what I would check:

- you'd need a lot of memory for this dataset (>256GB), so please double-check that you have enough
- try disabling the ascertainment bias correction
- try running an unconstrained search (without -g)
- try using linked branch lengths (without -M)
- try removing the binary partition
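
One way to work through the two flag-only items in this list is to derive the test command lines from the original call, changing one thing at a time. The Python sketch below does that for -g and -M only. The helper names and the run names passed to -n are hypothetical; each variant gets its own run name so that output files from different runs do not collide. Disabling the ascertainment bias correction and removing the binary partition are not covered, since they would presumably also require editing partition.txt and the alignment.

```python
import shlex

# Command line copied from the original post.
BASE = (
    "raxmlHPC-SSE3 -f d -g constraint.txt -j -M -m GTRGAMMA "
    "-q partition.txt -n firstresult -s combineddata.phylip "
    "-p 310889 --asc-corr=lewis"
)


def drop_option(args, flag, takes_value):
    """Return a copy of args without flag (and its value, if it takes one)."""
    out, i = [], 0
    while i < len(args):
        if args[i] == flag:  # exact, case-sensitive match: -M is not -m
            i += 2 if takes_value else 1
            continue
        out.append(args[i])
        i += 1
    return out


def rename_run(args, name):
    """Return a copy of args with the value following -n replaced by name."""
    out = list(args)
    out[out.index("-n") + 1] = name
    return out


def variants(base=BASE):
    """Build one command per single-flag change (hypothetical run names)."""
    args = shlex.split(base)
    return {
        "no_constraint": rename_run(drop_option(args, "-g", True), "no_constraint"),
        "linked_brlen": rename_run(drop_option(args, "-M", False), "linked_brlen"),
    }


if __name__ == "__main__":
    for argv in variants().values():
        print(shlex.join(argv))
```

Changing a single option per run makes it easier to see which setting, if any, is responsible for the stall.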

The output you've posted is very strange, since

Partition: 0 with name: p1

is printed *after* base freqs are computed for *all* partitions.

Best,
Alexey