RAxML-NG on UCE data queries


Justin Bernstein

Feb 22, 2021, 10:27:03 AM
to raxml
Hello,

I made a post recently about using RAxML v8 on a UCE dataset and have switched over to RAxML-NG as recommended. The alignment.phylip was created using the phyluce pipeline, and I have a few questions about how to proceed with analyzing these data. I am also running the same analysis on a 75% completeness alignment.


1. I have a 90% completeness matrix with the following statistics:
Alignment sites / patterns: 1878880 / 350889
Gaps: 11.50 %
Invariant sites: 89.28 %

I am not sure why the proportion of invariant sites is so high; when I ran this using RAxML v8 (which timed out on the cluster), it was ~10-15%. Nonetheless, I am running the following code:

raxml-ng --all --msa mafft-nexus-edge-trimmed-clean-most-90p.phylip --model GTR+G --prefix Stegonotus-90 --seed 2 --threads 2 --bs-metric fbp,tbe

My question is this: the job has been running for 2 days and has completed 95 bootstrap replicates, so I assume it will time out at the 72-hour limit. What is the best way to proceed with my analysis? Should I just check for convergence and, if it has not converged, restart it from the checkpoint? Is there a point at which I should stop running it altogether? Additionally, the following files were created:
Alignment-90.raxml.bootstraps.TMP
Alignment-90.raxml.ckp
Alignment-90.raxml.lastTree.TMP
Alignment-90.raxml.log
Alignment-90.raxml.mlTrees.TMP
Alignment-90.raxml.rba
Alignment-90.raxml.reduced.phy
Alignment-90.raxml.startTree

Where are the Alignment-90.supportFBP and Alignment-90.supportTBE files? Are they created only once bootstrapping has converged?

If there is a better way to go about my analyses, please let me know (e.g., running the steps as separate commands rather than using the '--all' flag). These datasets are not partitioned, so if I should change that, any recommendations on how to do so would be appreciated. This is my first time analyzing a genomic dataset in RAxML.

Best,
Justin 

Alexey Kozlov

Feb 22, 2021, 7:40:26 PM
to ra...@googlegroups.com
Hello Justin,

> 1. I have a 90% completeness matrix that has the following patterns:
> Alignment sites / patterns: 1878880 / 350889
> Gaps: 11.50 %
> Invariant sites: 89.28 %
>
> I am not sure why the proportion of invariant sites is so high; when I ran this using RAxML v8 (which timed out
> on the cluster), it was ~10-15%.

IIRC, RAxML v8 does not report the proportion of invariant sites, so you might be comparing two different
values.
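To see what the "Alignment sites / patterns" and "Invariant sites" lines measure, here is a minimal Python sketch that computes comparable statistics for a small in-memory alignment. It is an illustration, not RAxML-NG's code: treating '-', '?' and 'N' as missing data and counting a column as invariant when it shows at most one distinct non-missing state are assumptions, and RAxML-NG's exact handling of gaps and ambiguity codes may differ. For a real PHYLIP file you would load the sequences first, e.g. with Biopython's AlignIO.read(path, "phylip-relaxed").

```python
# Sketch: alignment summary statistics similar in spirit to RAxML-NG's log lines.
# ASSUMPTION: '-', '?' and 'N' are treated as missing data; RAxML-NG's exact
# rules for gaps and ambiguity codes may differ.
MISSING = set("-?N")


def alignment_stats(seqs):
    """Return (sites, patterns, gap_pct, invariant_pct) for aligned sequences."""
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must all have the same length")
    # Each alignment column as a string of (uppercased) characters.
    columns = ["".join(s[i].upper() for s in seqs) for i in range(n_sites)]
    # Distinct site patterns = number of unique columns.
    n_patterns = len(set(columns))
    # Percentage of all cells that are missing data.
    n_missing = sum(ch in MISSING for col in columns for ch in col)
    gap_pct = 100.0 * n_missing / (n_sites * len(seqs))
    # A column counts as invariant if it has at most one non-missing state.
    n_invariant = sum(
        1 for col in columns if len({ch for ch in col if ch not in MISSING}) <= 1
    )
    invariant_pct = 100.0 * n_invariant / n_sites
    return n_sites, n_patterns, gap_pct, invariant_pct


if __name__ == "__main__":
    toy = ["AACGT-", "AACGA-", "AACGT-"]
    sites, patterns, gaps, invariant = alignment_stats(toy)
    print(f"Alignment sites / patterns: {sites} / {patterns}")
    print(f"Gaps: {gaps:.2f} %")
    print(f"Invariant sites: {invariant:.2f} %")
```

Because patterns are unique columns, an alignment in which most columns are conserved compresses to far fewer patterns than sites, which is consistent with the 1878880 / 350889 line above.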

>Nonetheless, I am running the following code:
>
> raxml-ng --all --msa mafft-nexus-edge-trimmed-clean-most-90p.phylip --model GTR+G --prefix
> Stegonotus-90 --seed 2 --threads 2 --bs-metric fbp,tbe

Maybe consider using more threads/cores since your dataset is pretty large.

> My question for this is: currently it is running for 2 days and has gone through 95 bootstrap
> iterations. I assume it will time out at the 72 hour time limit. What is the best way to go about my
> analysis? Should I just check for convergence, and if it hasn't, run it again using the checkpoint?

Yes, that's what I would do.

> Is there a point at which I should stop running it altogether?

You can also manually limit the number of bootstrap replicates, e.g., --bs-trees 100.
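A minimal Python sketch of the restart-and-cap approach described here, assuming the job is resubmitted from the same working directory with the other options unchanged. It rebuilds the original '--all' command with an optional '--bs-trees' cap. Re-running with the same '--prefix' is what lets RAxML-NG pick up the existing checkpoint, so '--redo' (which tells RAxML-NG to overwrite earlier results rather than resume) is deliberately left out. The helper names raxml_all_cmd and resume_or_start are hypothetical.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def raxml_all_cmd(msa, prefix, model="GTR+G", seed=2, threads=2, bs_trees=None):
    """Rebuild the '--all' command from the thread, optionally capping bootstraps."""
    cmd = [
        "raxml-ng", "--all",
        "--msa", msa,
        "--model", model,
        "--prefix", prefix,
        "--seed", str(seed),
        "--threads", str(threads),
        "--bs-metric", "fbp,tbe",
    ]
    if bs_trees is not None:
        # Hard upper limit on the number of bootstrap replicates.
        cmd += ["--bs-trees", str(bs_trees)]
    # No '--redo': keeping the same prefix lets RAxML-NG resume from PREFIX.raxml.ckp.
    return cmd


def resume_or_start(cmd, prefix, workdir="."):
    """Report whether a checkpoint exists, then run the command if raxml-ng is installed."""
    ckp = Path(workdir) / f"{prefix}.raxml.ckp"
    print("checkpoint found, resuming" if ckp.exists() else "no checkpoint, fresh start")
    print(shlex.join(cmd))
    if shutil.which("raxml-ng") is None:
        print("raxml-ng not on PATH; command not executed")
        return None
    return subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    prefix = "Stegonotus-90"
    cmd = raxml_all_cmd("mafft-nexus-edge-trimmed-clean-most-90p.phylip",
                        prefix, bs_trees=100)
    resume_or_start(cmd, prefix)
```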

> Additionally, the following
> files were created:
> Alignment-90.raxml.bootstraps.TMP
> Alignment-90.raxml.ckp
> Alignment-90.raxml.lastTree.TMP
> Alignment-90.raxml.log
> Alignment-90.raxml.mlTrees.TMP
> Alignment-90.raxml.rba
> Alignment-90.raxml.reduced.phy
> Alignment-90.raxml.startTree
>
> Where are the files that are for the Alignment-90.supportFBP and Alignment-90.supportTBE? Do those
> get created only when there is convergence?

Yes, those files are created when bootstrapping has converged or reached the maximum number of
replicates (see above).
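As a small follow-up, here is a Python sketch that classifies a run from the files present in its directory. TMP files or a checkpoint without the final outputs mean the run is still incomplete. The final names PREFIX.raxml.bestTree, PREFIX.raxml.supportFBP and PREFIX.raxml.supportTBE are assumed from the names mentioned in this thread (with both metrics requested). Check them against the output names your RAxML-NG version reports at the end of its log.

```python
from pathlib import Path


def run_status(prefix, filenames):
    """Classify a RAxML-NG '--all' run from a collection of file names."""
    names = set(filenames)
    # ASSUMPTION: final outputs of '--all --bs-metric fbp,tbe' use these suffixes.
    finals = {f"{prefix}.raxml.{s}" for s in ("bestTree", "supportFBP", "supportTBE")}
    if finals <= names:
        return "finished"
    # Intermediate files written while the analysis is in progress or was interrupted.
    has_tmp = any(n.startswith(f"{prefix}.raxml.") and n.endswith(".TMP") for n in names)
    if has_tmp or f"{prefix}.raxml.ckp" in names:
        return "incomplete"
    return "not started"


def run_status_in_dir(prefix, workdir="."):
    """Same check, applied to the files in a directory."""
    return run_status(prefix, (p.name for p in Path(workdir).iterdir()))


if __name__ == "__main__":
    print(run_status_in_dir("Alignment-90"))
```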

> If there is a better way to go about my analyses, please let me know (e.g., running commands
> separately rather than using the '--all' flag).

This is just a question of convenience vs. feasibility, i.e., if you can run '--all' within a
reasonable timeframe (which seems to be the case), then it is the best option.
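If '--all' ever becomes impractical within the time limit, the same analysis can be split into separate runs: an ML tree search, a bootstrap run, and a support-mapping step. The Python sketch below only builds the three command lines (as argument lists) so that each can go into its own job. The option names are RAxML-NG's, but the prefixes, seed, and replicate count are illustrative assumptions, and split_pipeline is a hypothetical helper. The ML tree is read from PREFIX.raxml.bestTree and the replicates from PREFIX.raxml.bootstraps.

```python
import shlex


def split_pipeline(msa, model, base, seed=2, threads=2, bs_trees=100):
    """Return the three commands that together replace one '--all' run."""
    common = ["--msa", msa, "--model", model, "--threads", str(threads)]
    # 1) ML tree search.
    search = ["raxml-ng", "--search", *common,
              "--prefix", f"{base}-ml", "--seed", str(seed)]
    # 2) Bootstrap replicates (independent of step 1, so it can run in parallel).
    boot = ["raxml-ng", "--bootstrap", *common,
            "--prefix", f"{base}-bs", "--seed", str(seed),
            "--bs-trees", str(bs_trees)]
    # 3) Map bootstrap support onto the best ML tree.
    support = ["raxml-ng", "--support",
               "--tree", f"{base}-ml.raxml.bestTree",
               "--bs-trees", f"{base}-bs.raxml.bootstraps",
               "--prefix", f"{base}-support",
               "--bs-metric", "fbp,tbe",
               "--threads", str(threads)]
    return search, boot, support


if __name__ == "__main__":
    for cmd in split_pipeline("mafft-nexus-edge-trimmed-clean-most-90p.phylip",
                              "GTR+G", "Stegonotus-90", threads=16):
        print(shlex.join(cmd))
```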

>These datasets are not partitioned, so if I should change
> that, any recommendations on how to go about that would be appreciated. This is my first time
> running a genomic dataset in RAxML.

Maybe check other UCE-based phylogenetic studies to get an idea of whether partitioning is considered
useful for this type of data. Also, given the high proportion of invariant sites, you might want to try
the "GTR+G+I" model.
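If partitioning does turn out to be worthwhile, RAxML-NG accepts a partition file in place of a model name, with one line per partition of the form MODEL, NAME = START-END. Below is a minimal Python sketch that writes such a file with one partition per UCE locus. It assumes the locus boundaries in the concatenated alignment are known (for example from charsets recorded during concatenation). The locus names and lengths are hypothetical. The unpartitioned alternative is simply --model GTR+G+I.

```python
from pathlib import Path


def partition_lines(loci, model="GTR+G+I", total_sites=None):
    """One 'MODEL, NAME = START-END' line per locus, in alignment order."""
    lines, start = [], 1
    for name, length in loci:
        if length <= 0:
            raise ValueError(f"locus {name} has non-positive length")
        end = start + length - 1  # 1-based, inclusive coordinates
        lines.append(f"{model}, {name} = {start}-{end}")
        start = end + 1
    # Optional sanity check: the loci must tile the whole alignment.
    if total_sites is not None and start - 1 != total_sites:
        raise ValueError(f"loci cover {start - 1} sites, alignment has {total_sites}")
    return lines


def write_partition_file(path, loci, model="GTR+G+I", total_sites=None):
    """Write the partition lines to `path`, one per line."""
    Path(path).write_text("\n".join(partition_lines(loci, model, total_sites)) + "\n")


if __name__ == "__main__":
    # Hypothetical locus names and lengths, for illustration only.
    loci = [("uce-101", 812), ("uce-245", 640), ("uce-998", 701)]
    write_partition_file("Stegonotus-90.part", loci, total_sites=2153)
    # Then pass the file instead of a model string:
    #   raxml-ng --all --msa ... --model Stegonotus-90.part ...
```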

Best,
Alexey

Justin Bernstein

Feb 22, 2021, 10:36:53 PM
to raxml
Hi Alexey,

Thanks for all of the advice! I understand everything. The only thing I want to clarify is this: if I have 16 cores available, I can just set the threads flag to --threads 16, correct? Also, if I decide to stop the first command I gave (because it is taking a while), is there a way to turn those bootstraps.TMP files into the final .supportFBP and .supportTBE files? I doubt they will turn into those final files when the cluster job stops or if I cancel the job script (unless this is just a matter of renaming and the TMP files are the FBP and TBE files). I hope that makes sense!

Best,
Justin

Alexey Kozlov

Feb 23, 2021, 7:12:24 AM
to ra...@googlegroups.com
Hi Justin,

> Thanks for all of the advice! I understand everything. The only thing I want to clarify is that if I
> have 16 cores available to use, I can just set the --threads flag to --threads 16, correct?

Exactly.
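On a cluster it can help to take the thread count from the job's allocation instead of hard-coding it, so that --threads always matches the cores actually granted. Below is a minimal Python sketch that assumes a SLURM scheduler, which exports SLURM_CPUS_PER_TASK when --cpus-per-task is requested. Other schedulers use different variables, and the fallback value is arbitrary.

```python
import os


def threads_from_env(default=2, var="SLURM_CPUS_PER_TASK"):
    """Number of cores allocated to this job (SLURM assumed), or `default` if unknown."""
    value = os.environ.get(var, "")
    return int(value) if value.isdigit() and int(value) > 0 else default


def threads_args(default=2):
    """The '--threads N' arguments for a raxml-ng command line."""
    return ["--threads", str(threads_from_env(default))]


if __name__ == "__main__":
    # e.g. ['--threads', '16'] inside a job submitted with --cpus-per-task=16
    print(threads_args())
```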

>Also, if
> I decide to stop the first command I gave (due to it taking a while), is there a way to turn those
> bootstraps.TMP files into the final .supportFBP and .supportTBE files?

Technically, you can use the lastTree.TMP and bootstraps.TMP files with the '--support' command.
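For completeness, here is a minimal Python sketch of that route. It first keeps only complete Newick lines (ending in ';') from the bootstraps.TMP file. A job killed mid-write might leave a truncated last line; this is an assumption worth checking, not documented behaviour. It then builds the '--support' command with a new prefix so the interrupted run's files are not touched. File names follow the Alignment-90 prefix from this thread. The '.clean' output file and the 'Alignment-90-support' prefix are hypothetical.

```python
import shlex
from pathlib import Path


def complete_trees(lines):
    """Keep only lines that look like complete Newick trees (end with ';')."""
    return [ln.strip() for ln in lines if ln.strip().endswith(";")]


def support_from_tmp(prefix, out_prefix, threads=2):
    """Clean the TMP bootstrap file and build the '--support' command."""
    tmp = Path(f"{prefix}.raxml.bootstraps.TMP")
    clean = Path(f"{prefix}.raxml.bootstraps.clean")  # hypothetical file name
    trees = complete_trees(tmp.read_text().splitlines())
    clean.write_text("\n".join(trees) + "\n")
    print(f"{len(trees)} complete bootstrap trees kept")
    return [
        "raxml-ng", "--support",
        "--tree", f"{prefix}.raxml.lastTree.TMP",
        "--bs-trees", str(clean),
        "--prefix", out_prefix,
        "--bs-metric", "fbp,tbe",
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    print(shlex.join(support_from_tmp("Alignment-90", "Alignment-90-support")))
```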

But I would rather recommend restarting from the checkpoint with a reduced number of bootstrap replicates to obtain
the ML tree and initial support values. Then, if needed, you can add more bootstrap replicates as described here:

https://github.com/amkozlov/raxml-ng/wiki/Tutorial#bootstrapping
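As a rough sketch of that workflow, assume the first run's replicates are in PREFIX.raxml.bootstraps. Run an additional '--bootstrap' batch with a different seed (reusing the seed would regenerate the same replicates) and concatenate the tree files. Optionally, test the pooled set with '--bsconverge' before mapping supports onto the best ML tree with '--support'. The helper names, prefixes, and the seed 333 are illustrative assumptions. The linked tutorial is the authoritative description.

```python
import shlex
from pathlib import Path


def extra_bootstrap_cmd(msa, model, prefix, seed, n_trees, threads=2):
    """An additional bootstrap batch; `seed` must differ from earlier batches."""
    return ["raxml-ng", "--bootstrap", "--msa", msa, "--model", model,
            "--prefix", prefix, "--seed", str(seed),
            "--bs-trees", str(n_trees), "--threads", str(threads)]


def concat_tree_files(inputs, output):
    """Equivalent of 'cat a.raxml.bootstraps b.raxml.bootstraps > all'."""
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"  # keep one tree per line across file boundaries
            out.write(text)


def bsconverge_cmd(bs_file, prefix, seed=2, threads=1):
    """A posteriori bootstrap convergence test on the pooled replicates."""
    return ["raxml-ng", "--bsconverge", "--bs-trees", bs_file,
            "--prefix", prefix, "--seed", str(seed), "--threads", str(threads)]


if __name__ == "__main__":
    first_seed, new_seed = 2, 333
    assert new_seed != first_seed, "reusing a seed duplicates replicates"
    cmd = extra_bootstrap_cmd("mafft-nexus-edge-trimmed-clean-most-90p.phylip",
                              "GTR+G", "Stegonotus-90-bs2", new_seed, 200)
    print(shlex.join(cmd))
    # After that job finishes:
    # concat_tree_files(["Stegonotus-90.raxml.bootstraps",
    #                    "Stegonotus-90-bs2.raxml.bootstraps"], "allbootstraps")
    print(shlex.join(bsconverge_cmd("allbootstraps", "Stegonotus-90-conv")))
```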

Best,
Alexey
