Hello. I understand that, when dealing with duplicate sequences in an alignment, RAxML-NG removes fully identical sequences and generates a new alignment file to estimate the tree. I am wondering how those sequences are subsequently put back in the tree. More precisely, I am trying to figure out whether manually removing my 108 duplicate sequences from my alignment and running my raxml analysis again could be helpful in getting a "better" tree or if I would just be repeating what raxml did for me already? Thank you for your help.
Alexandros Stamatakis
Mar 8, 2021, 10:53:28 PM
> Hello. I understand that, when dealing with duplicate sequences in an
> alignment, RAxML-NG removes fully identical sequences and generates a
> new alignment file to estimate the tree. I am wondering how those
> sequences are subsequently put back in the tree.
The duplicate sequences are not put back in the tree as far as I know,
at least that's how it worked in standard RAxML.
> More precisely, I am
> trying to figure out whether manually removing my 108 duplicate sequences from
> my alignment and running my raxml analysis again could be helpful in
> getting a "better" tree or if I would just be repeating what raxml did
> for me already? Thank you for your help.
Dear Alexis, thank you for responding. Looking at my raxml-ng trees, the duplicate sequences do appear back in the tree, but I think they're just polytomies. Thanks for the confirmation that the manual removal would not make a difference.
Alexey Kozlov
Mar 9, 2021, 12:43:19 PM
Hello Sonia,
just to clarify:
As of now (v1.0.2 and before), raxml-ng will report duplicate sequences and will create a reduced
deduplicated alignment for you - so no reason to do it manually. However, raxml-ng will perform tree
search on the original, full MSA *with* duplicate sequences.
This behavior might change in future versions. But for now, I recommend running --check or
--parse first (which is a good idea anyway), and then using the generated reduced alignment for the
actual tree search. If the alignment contains many duplicates, this will substantially reduce tree
search time.
Best,
Alexey
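The two-step workflow described above could be scripted roughly as follows. This is only a sketch: the input file name, model string, and run prefix are placeholders, and the reduced-alignment file name (`<prefix>.raxml.reduced.phy`) is an assumption that should be checked against the path printed in the log of the `--parse` run. The helper only builds the command lines; it does not run raxml-ng.

```python
import shlex


def raxml_ng_commands(msa, model, prefix):
    """Build two raxml-ng invocations: parse first, then search on the reduced MSA."""
    # Assumption: the reduced alignment is written as <prefix>.raxml.reduced.phy.
    # Verify the actual path in the log of the --parse run before using it.
    reduced = f"{prefix}.raxml.reduced.phy"
    parse_cmd = ["raxml-ng", "--parse", "--msa", msa,
                 "--model", model, "--prefix", prefix]
    # Use a different prefix for the search so the two runs do not clash.
    search_cmd = ["raxml-ng", "--search", "--msa", reduced,
                  "--model", model, "--prefix", f"{prefix}_reduced"]
    return parse_cmd, search_cmd


if __name__ == "__main__":
    # Placeholder inputs; replace with your own alignment, model, and prefix.
    for cmd in raxml_ng_commands("alignment.fasta", "LG+G4", "run1"):
        print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the lists can be passed to `subprocess.run` once the first step has finished and the reduced file actually exists (it is only produced when duplicates are found).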
Dylan Padilla
Oct 30, 2023, 2:30:38 PM
Hello Alexey,
I actually did what you recommended. That is to say, once RAxML warned me about the identical sequences and generated the reduced alignment, I stopped the analysis and ran it again using the reduced .fasta file (I converted the .reduced file into a .fasta file). However, the latter alignment seemed messy: I tried to check it in Jalview and it would not even open (please see the attached screenshot of the file content). The analysis actually finished and I got the best ML tree with no apparent errors, but I still have doubts. Should I trust the results given how "messy" the reduced alignment looks? Or should I manually delete the identical sequences from the original .fasta file and then use that for building the tree?
Thanks in advance for any help. Looking forward to hearing from you!
Hi Dylan,
the reduced alignment only has the duplicates removed; RAxML doesn't change it otherwise.
It looks messy because RAxML writes it in plain-text extended PHYLIP format, not FASTA. Jalview may therefore have used the wrong parser.
The top line of a PHYLIP file gives the matrix dimensions (number of sequences, then number of sites),
followed by one entry per sequence in the order <sequence name>space<sequence> (amino acids in your case), in sequential formatting.
So, all is totally fine.
Good inferencing, Guido
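If a viewer insists on FASTA, a file in the layout described above can be converted with a few lines of Python. The function below is a minimal sketch, not part of RAxML: it assumes sequential formatting with exactly one line per sequence and names that contain no whitespace, and the function name `phylip_to_fasta` is made up for this illustration.

```python
def phylip_to_fasta(phylip_text, width=60):
    """Convert sequential (one line per taxon) relaxed PHYLIP text to FASTA text."""
    lines = [ln for ln in phylip_text.splitlines() if ln.strip()]
    # Header line: number of sequences, then number of alignment sites.
    n_taxa, n_sites = (int(x) for x in lines[0].split()[:2])

    records = []
    for ln in lines[1:]:
        # Assumption: the name has no whitespace; everything after it is sequence.
        name, seq = ln.split(None, 1)
        seq = "".join(seq.split())
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, found {len(seq)}")
        records.append((name, seq))
    if len(records) != n_taxa:
        raise ValueError(f"expected {n_taxa} sequences, found {len(records)}")

    out = []
    for name, seq in records:
        out.append(">" + name)
        # Wrap sequence lines for readability.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

The dimension checks are there so that a file in a different PHYLIP flavour (for example interleaved) fails loudly instead of producing a silently wrong FASTA file.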
Grimm
Oct 31, 2023, 8:22:09 AM
PS: Note that the inferred tree will naturally not include the removed duplicate tips. If you want them listed when preparing the tree for visualisation or publication, you will have to re-add the names of the duplicates to the matching tip. The duplicate sets are listed in the run log file.
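Re-adding the duplicates can also be done programmatically on the Newick string. The sketch below is one possible approach, not something RAxML provides: it replaces each kept tip with a small zero-length clade containing the kept tip and its duplicates. The mapping from kept tip to duplicate names has to be filled in by hand from the run log (no log parsing is attempted here), tip labels are assumed to be unquoted, and the function name `readd_duplicates` is hypothetical.

```python
import re


def readd_duplicates(newick, duplicates):
    """Replace each kept tip with a zero-length clade of the tip plus its duplicates.

    duplicates maps a tip name present in the tree to the list of removed
    sequence names that were identical to it (taken from the run log by hand).
    """
    for kept, dups in duplicates.items():
        names = [kept, *dups]
        clade = "(" + ",".join(f"{n}:0.0" for n in names) + ")"
        # A tip label follows "(" or "," and precedes ":", "," or ")".
        pattern = r"(?<=[(,])" + re.escape(kept) + r"(?=[:,)])"
        newick, count = re.subn(pattern, lambda m: clade, newick)
        if count != 1:
            raise ValueError(f"tip {kept!r} matched {count} times, expected 1")
    return newick


if __name__ == "__main__":
    # Toy tree and duplicate set, invented for illustration.
    print(readd_duplicates("((A:0.1,C:0.2):0.05,D:0.3);", {"A": ["B"]}))
```

Because identical sequences carry no information about their relative placement, the added tips are joined with zero-length branches, which tree viewers will show as a polytomy at the original tip.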