Dear Justin,
> I am using raxmlHPC-PTHREADS-SSE3 through the phyluce pipeline on a UCE
> dataset (~2300 UCEs, 102 taxa) to obtain a ML tree. I ran the following:
> raxmlHPC-PTHREADS-SSE3 \
> -m GTRGAMMA \
> -N 20 \
> -p 6377 \
> -n best \
> -s UCEs-file.phylip \
> -T 16
>
> And following this I ran a bootstrap analysis:
> raxmlHPC-PTHREADS-SSE3 \
> -m GTRGAMMA \
> -N autoMRE \
> -p 19877 \
> -b 3435 \
> -n bootreps \
> -s UCEs-file.phylip \
> -T 12
Please consider switching to RAxML-NG as standard RAxML is no longer
supported.
https://github.com/amkozlov/raxml-ng
> Unfortunately, I accidentally canceled the job submission, but it was
> going to time out anyway because it got to bootstrap 62 after 60 hours
> (72 hour time limit on the computer cluster). So it would inevitably
> have timed out. I ran this using 12 cores (16 is max) and 180 GB of memory.
> My questions are:
>
> 1. The first script took a long time to run, ~25 hours. Is this just
> because of the large amount of data I have (I made sure the number of
> cores I designated did not exceed what is physically available on the
> computer)?
I can't say for certain without the exact overall alignment length, but
probably yes. If the MSA is longer than 16,000 sites, you could also use
all 16 cores.
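
A minimal Python sketch for checking the alignment length, assuming
UCEs-file.phylip is a standard PHYLIP file whose first line holds the
number of taxa followed by the number of sites. The helper names and the
figure of 1,000 sites per core are assumptions: the latter is only one
reading of the "16 cores for more than 16,000 sites" remark above, not a
RAxML setting.

```python
def read_phylip_dims(path):
    """Return (n_taxa, n_sites) from the header line of a PHYLIP file."""
    with open(path) as handle:
        n_taxa, n_sites = handle.readline().split()[:2]
    return int(n_taxa), int(n_sites)


def suggest_threads(n_sites, max_cores=16, sites_per_core=1000):
    """Hypothetical rule of thumb: one core per `sites_per_core` sites,
    never fewer than 1 and never more than `max_cores`."""
    return max(1, min(max_cores, n_sites // sites_per_core))


# Usage (assumes the file exists in the working directory):
#   n_taxa, n_sites = read_phylip_dims("UCEs-file.phylip")
#   print(n_taxa, n_sites, suggest_threads(n_sites))
```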
> 2. I am going to try to use all 16 cores for the bootstraps, but when a
> bootstrap analysis times out, is there a way to 'pick up where it left
> off?' I know I can technically designate the same seed, but am I correct
> in saying that this technically doesn't mean it is picking up where my
> analysis left off, it would be more starting from scratch, right?
Yes, that's right. Just continue with a new run that uses a different seed.
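
One way to stay under the time limit is to split the bootstraps into
several short runs, each with its own seeds and run name. The sketch below
only builds the command lines, reusing the flags from the bootstrap command
quoted above. The batch size, the seed scheme (base seed plus batch index)
and the function name are assumptions for illustration.

```python
def bootstrap_batch_commands(n_batches, reps_per_batch,
                             alignment="UCEs-file.phylip", threads=16,
                             base_p=19877, base_b=3435):
    """Build one RAxML bootstrap command per batch.

    Every batch gets different -p and -b seeds and a different -n run
    name, so the runs are independent and write separate output files.
    """
    commands = []
    for i in range(n_batches):
        commands.append([
            "raxmlHPC-PTHREADS-SSE3",
            "-m", "GTRGAMMA",
            "-N", str(reps_per_batch),
            "-p", str(base_p + i),   # arbitrary seed scheme (assumption)
            "-b", str(base_b + i),   # must differ between batches
            "-n", f"bootreps_{i + 1}",
            "-s", alignment,
            "-T", str(threads),
        ])
    return commands


# Print five batches of 20 replicates each, ready to paste into job scripts.
for cmd in bootstrap_batch_commands(n_batches=5, reps_per_batch=20):
    print(" ".join(cmd))
```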
> 3. Bouncing off my second question, am I able to concatenate the
> bootstrap.bootrepsfolder, or are multiple runs not comparable?
You can concatenate them. The inference of individual BS replicates is
completely independent, so you could even run 100 individual BS reps and
concatenate their results. Just make sure that you use different seeds.
Note that you can also run the bootstrap convergence test a posteriori
on the concatenated set of bootstrap trees.
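
A minimal sketch of both steps, assuming each run wrote a file named
RAxML_bootstrap.<run name> with one Newick tree per line. The run names,
the output file name and the function names are hypothetical. The
convergence command uses RAxML's -I option together with -z, which runs
the bootstopping test on an existing set of trees. The seed and thread
count shown are arbitrary.

```python
import glob
import os


def concatenate_bootstraps(pattern, out_path):
    """Concatenate all bootstrap tree files matching `pattern` into
    `out_path` and return the number of trees written."""
    out_abs = os.path.abspath(out_path)
    # Collect inputs first and skip the output file in case it matches.
    inputs = [p for p in sorted(glob.glob(pattern))
              if os.path.abspath(p) != out_abs]
    n_trees = 0
    with open(out_path, "w") as out:
        for path in inputs:
            with open(path) as handle:
                for line in handle:
                    if line.strip():          # ignore blank lines
                        out.write(line.strip() + "\n")
                        n_trees += 1
    return n_trees


def convergence_command(tree_file, seed=12345, threads=2,
                        run_name="bsconverge"):
    """Build the a posteriori bootstopping command (-I with -z)."""
    return ["raxmlHPC-PTHREADS-SSE3", "-m", "GTRGAMMA",
            "-I", "autoMRE", "-z", tree_file,
            "-p", str(seed), "-T", str(threads), "-n", run_name]


# Usage (assumes the bootstrap files are in the working directory):
#   n = concatenate_bootstraps("RAxML_bootstrap.bootreps_*", "all_bootstraps.tre")
#   print(n, " ".join(convergence_command("all_bootstraps.tre")))
```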
> I guess the bottom line of my questioning is what to do when the run
> time is so long due to genomic data and I've reached the limit of run
> time on the computer cluster (does raxmlHPC-PTHREADS-SSE3 allow
> computing across multiple nodes?).
Yes, there's also an MPI version of it that will distribute the
computations across several nodes. However, here we'd really recommend
using RAxML-NG, as it is easier to use in such a setting.
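
For orientation, a sketch of how the bootstrap and convergence steps might
look with RAxML-NG, again only building the command lines. The prefixes,
seed, replicate count and cutoff are placeholder assumptions, and GTR+G is
used as the counterpart of GTRGAMMA. The RAxML-NG documentation also
describes checkpointing for interrupted runs, which is worth checking
against the installed version.

```python
def raxml_ng_commands(msa="UCEs-file.phylip", model="GTR+G", threads=16,
                      n_bootstraps=100, seed=3435, prefix="uce"):
    """Build RAxML-NG command lines for bootstrapping and for the
    a posteriori bootstrap convergence check."""
    bs_prefix = f"{prefix}_bs"
    bootstrap = ["raxml-ng", "--bootstrap",
                 "--msa", msa, "--model", model,
                 "--bs-trees", str(n_bootstraps),
                 "--seed", str(seed),
                 "--threads", str(threads),
                 "--prefix", bs_prefix]
    # RAxML-NG writes the replicate trees to <prefix>.raxml.bootstraps
    converge = ["raxml-ng", "--bsconverge",
                "--bs-trees", f"{bs_prefix}.raxml.bootstraps",
                "--prefix", f"{prefix}_conv",
                "--seed", str(seed),
                "--threads", "1",
                "--bs-cutoff", "0.03"]
    return {"bootstrap": bootstrap, "converge": converge}


for name, cmd in raxml_ng_commands().items():
    print(f"{name}: {' '.join(cmd)}")
```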
Alexis
>
> Any help is greatly appreciated.
>
> Best,
> Justin
--
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
www.exelixis-lab.org