Reduced alignments in RAxML-NG

SoniaN

Mar 8, 2021, 11:13:18 AM

to raxml

Hello. I understand that, when dealing with duplicate sequences in an alignment, RAxML-NG removes fully identical sequences and generates a new alignment file to estimate the tree. I am wondering how those sequences are subsequently put back in the tree. More precisely, I am trying to figure out whether manually removing my 108 duplicate sequences from my alignment and rerunning my RAxML analysis could help in getting a "better" tree, or whether I would just be repeating what RAxML already did for me. Thank you for your help.

Alexandros Stamatakis

Mar 8, 2021, 10:53:28 PM

to ra...@googlegroups.com

> Hello. I understand that, when dealing with duplicate sequences in an
> alignment, RAxML-NG removes fully identical sequences and generates a
> new alignment file to estimate the tree. I am wondering how those
> sequences are subsequently put back in the tree.

As far as I know, the duplicate sequences are not put back in the tree;
at least that's how it worked in standard RAxML.

> More precisely, I am
> trying to figure out whether manually removing my 108 duplicate sequences
> from my alignment and rerunning my RAxML analysis could help in getting a
> "better" tree, or whether I would just be repeating what RAxML already did
> for me. Thank you for your help.

You'd be repeating what raxml does.

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org

SoniaN

Mar 9, 2021, 10:08:52 AM

to raxml

Dear Alexis, thank you for responding. Looking at my raxml-ng trees, the duplicate sequences do appear back in the tree, but I think they're just polytomies. Thanks for the confirmation that the manual removal would not make a difference.

Alexey Kozlov

Mar 9, 2021, 12:43:19 PM

to ra...@googlegroups.com

Hello Sonia,

just to clarify:

As of now (v1.0.2 and before), raxml-ng will report duplicate sequences and will create a reduced,
deduplicated alignment for you, so there is no reason to do it manually. However, raxml-ng will
perform the tree search on the original, full MSA *with* duplicate sequences.

This behavior might change in future versions. For now, I recommend running --check or
--parse first (which is good practice anyway) and then using the generated reduced alignment for
the actual tree search. If the alignment contains many duplicates, this will substantially reduce
tree search time.
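
The two-step workflow described above could be scripted roughly as follows. This is only a sketch: the helper name `raxml_ng_commands` is hypothetical, and the reduced-alignment file name `<prefix>.raxml.reduced.phy` is an assumption that should be checked against the path printed in your own raxml-ng log. The function only builds the command lines and does not run anything.

```python
import shlex


def raxml_ng_commands(msa, model, prefix):
    """Build (but do not run) the two raxml-ng command lines as argument lists."""
    # Step 1: sanity-check the alignment; duplicates are reported at this stage.
    check = ["raxml-ng", "--check", "--msa", msa, "--model", model, "--prefix", prefix]

    # Assumption: the reduced alignment is written as <prefix>.raxml.reduced.phy.
    # Verify the actual path in the log before relying on it.
    reduced = f"{prefix}.raxml.reduced.phy"

    # Step 2: tree search on the reduced alignment, with a separate output prefix.
    search = ["raxml-ng", "--msa", reduced, "--model", model,
              "--prefix", f"{prefix}_reduced"]
    return check, search


check_cmd, search_cmd = raxml_ng_commands("alignment.fasta", "LG+G4", "run1")
print(shlex.join(check_cmd))
print(shlex.join(search_cmd))
```

Each list can be passed to `subprocess.run(..., check=True)` once the file names have been confirmed; the second command only makes sense if the first one actually reported duplicates and wrote a reduced alignment.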

Best,
Alexey

Dylan Padilla

Oct 30, 2023, 2:30:38 PM

to raxml

Hello Alexey,

I actually did what you recommended. That is, once RAxML warned me about the identical sequences and generated the reduced alignment, I stopped the analysis and ran it again using the reduced .fasta file (I converted the .reduced file into a .fasta file). However, the resulting alignment seemed messy: I tried to inspect it in Jalview and it would not even open (please see the attached screenshot of the file content). The analysis finished and I got the best ML tree with no apparent errors, but I still have doubts. Should I trust the results given how "messy" the reduced alignment looks? Or should I manually delete the identical sequences from the original .fasta file and then use that for building the tree?

Thanks in advance for any help. Looking forward to hearing from you!

Best,

Dylan.

Screen Shot 2023-10-30 at 11.22.17 AM.png

Grimm

Oct 31, 2023, 8:18:46 AM

to raxml

Hi Dylan

the reduced alignment only has the duplicates removed; RAxML doesn't otherwise change it.

The messy appearance is because RAxML writes a pure-code extended PHYLIP format, not FASTA. Jalview may therefore have used the wrong parser.

The top line of a PHYLIP file gives the matrix dimensions (number of sequences and number of sites), followed by each entry in the form <sequence name>space<sequence> (amino acids in your case), in sequential formatting.

So, all is totally fine.
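
If a FASTA copy is needed for a viewer, the layout described above is simple enough to convert by hand. The following is a minimal sketch, assuming sequential PHYLIP with one whole sequence per line and names without spaces; the function name `phylip_to_fasta` is hypothetical and the example alignment is made up.

```python
def phylip_to_fasta(phylip_text):
    """Convert sequential PHYLIP text (one sequence per line) to FASTA text."""
    lines = [line for line in phylip_text.splitlines() if line.strip()]

    # Header line: number of sequences, then number of alignment sites.
    n_taxa, n_sites = (int(field) for field in lines[0].split())

    records = []
    for line in lines[1:]:
        # Name and sequence are separated by whitespace.
        name, sequence = line.split(None, 1)
        sequence = "".join(sequence.split())
        if len(sequence) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(sequence)}")
        records.append((name, sequence))

    if len(records) != n_taxa:
        raise ValueError(f"expected {n_taxa} sequences, got {len(records)}")

    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


example = "2 4\nseqA MKVL\nseqB MKIL\n"
print(phylip_to_fasta(example), end="")
```

The length and count checks are there so that a file in a different layout (for example interleaved PHYLIP) fails loudly instead of producing a silently broken FASTA file.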

Good inferencing, Guido

Grimm

Oct 31, 2023, 8:22:09 AM

to raxml

PS: Note that the inferred tree will naturally not include the removed duplicate tips. When preparing it for visualisation/publication, if you want them listed, you will have to re-add the names of the duplicates to the matching tip. The duplicate sets are listed in the run log file.
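
One possible way to re-add the duplicates is to turn each retained tip into a small zero-length polytomy. The sketch below assumes a Newick string with simple unquoted labels, and a mapping from each retained name to its duplicates that you have compiled yourself from the log; the function name `readd_duplicates` and the example names are hypothetical.

```python
import re


def readd_duplicates(newick, duplicates):
    """Replace each retained tip with a zero-length clade of the tip and its duplicates.

    duplicates maps a retained tip name to the list of names removed as identical.
    """
    for kept, removed in duplicates.items():
        # A tip label follows "(" or "," and may carry a ":branch_length" suffix.
        pattern = re.compile(
            r"(?<=[(,])" + re.escape(kept) + r"(:[^,();]+)?(?=[,)])"
        )

        def expand(match, kept=kept, removed=removed):
            members = ",".join(f"{name}:0.0" for name in [kept, *removed])
            # The original branch length (if any) moves to the new clade.
            return f"({members}){match.group(1) or ''}"

        newick, count = pattern.subn(expand, newick)
        if count != 1:
            raise ValueError(f"tip {kept!r} matched {count} times, expected 1")
    return newick


tree = "((A:0.1,B:0.2):0.05,C:0.3);"
print(readd_duplicates(tree, {"A": ["A2", "A3"]}))
```

The zero branch lengths reflect that identical sequences carry no information to resolve their relationships; for trees with quoted labels or comments, a proper Newick parser would be the safer choice.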
