ERROR: Tree taxa and alignment sequence do not match - LSD2

Flávia Pezzini

Jun 17, 2020, 12:54:35 PM
to IQ-TREE
Hello, 

I am trying to use the new LSD2 dating method using the command: 

./iqtree2 -s myalignment.fasta --date date_iqtree -te myphylogeny_rooted.tre -T AUTO

However, I am running into the error "Tree taxa and alignment sequence do not match". After checking and re-checking the label names with grep, and renaming my alignment labels to match my tree labels using pxrls (taxon relabelling for sequences, from phyx), I realised that IQ-TREE might be complaining about the taxa that have identical sequences in my alignment and are therefore ignored. For example, when parsing the alignment, this is printed:

NOTE: Chamaecrista_potentilla_Rando_1139_KX264356 (identical to Chamaecrista_lagotois_Rando_1029_KX264358) is ignored but added at the end

And later IQ-TREE prints:

ERROR: Tree taxon Chamaecrista_potentilla_Rando_1139_KX264356 does not appear in the alignment

Is there a way around it? My phylogeny has almost 4,000 tips and it would be much faster to date it if I could include the topology.

Thank you very much in advance,

Flávia

P.S. IQ-TREE keeps some taxa identified as identical, e.g. NOTE: Zygia_juruana_Iganci_879_KX374534 is identical to Zygia_cataractae_Bonadeu_647_KX374531 but kept for subsequent analysis. Why are some identical taxa kept and others discarded?
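
For readers who want to repeat the label check described in this post, here is a minimal shell sketch. It is not the exact grep/pxrls procedure used above; the function names are hypothetical, and it assumes unquoted Newick tip labels and FASTA names that end at the first whitespace.

```shell
# Print the sequence names of a FASTA file, one per line, sorted.
# Assumption: the name is the text after '>' up to the first whitespace.
fasta_labels() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | sort -u
}

# Print the tip labels of a Newick tree file, one per line, sorted.
# Assumption: labels are unquoted. A tip label directly follows '(' or ',',
# whereas internal node labels and support values follow ')' and are skipped.
newick_tips() {
    tr -d '[:space:]' < "$1" | grep -oE '[(,][^(),:;]+' | cut -c2- | sort -u
}

# List labels found in only one of the two files.
# Column 1: alignment only. Column 2 (tab-indented): tree only.
label_mismatches() {
    tmp_a=$(mktemp) && tmp_t=$(mktemp) || return 1
    fasta_labels "$1" > "$tmp_a"
    newick_tips "$2" > "$tmp_t"
    comm -3 "$tmp_a" "$tmp_t"
    rm -f "$tmp_a" "$tmp_t"
}
```

Calling `label_mismatches myalignment.fasta myphylogeny_rooted.tre` prints nothing when both label sets agree. An empty result together with the error above would point to the identical-sequence handling rather than to mislabelled taxa.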

Heiko Schmidt

Jun 17, 2020, 1:34:07 PM
to IQ-TREE Forum
Dear Flavia,

Indeed, IQ-TREE reduces each set of identical sequences to just two representatives.
It does this mainly to save running time and adds the removed sequences back at the end of the analysis, because identical sequences will end up in the same subtree anyway.

Why are two kept? The reason is that this way IQ-TREE is able to infer support values for the branch leading to their subtree. This support might not be 1.0 or 100%, for example when sequences are almost identical but contain some wildcards (e.g. N, Y, R… for DNA or X in proteins) or gaps.

However, by adding the option “-keep-ident” to the command line you can tell IQ-TREE to keep all sequences of the alignment and not to reduce identical ones. This will typically increase the runtime of the analysis, but sometimes this cannot be avoided ;)
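
As a sketch, the command from the original post would then look like the line below. It assumes the same file names as in the question and uses the option spelled exactly as in this reply; nothing else is changed.

```shell
# Original command from the question, with -keep-ident appended so that
# identical sequences stay in the alignment and still match the tree tips.
./iqtree2 -s myalignment.fasta --date date_iqtree -te myphylogeny_rooted.tre -T AUTO -keep-ident
```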

I hope that helps you.

Best wishes,
Heiko

-----------------------------------------------------------------------------
Heiko Schmidt
Center for Integrative Bioinformatics Vienna (CIBIV)
University of Vienna / Max Perutz Labs
http://www.cibiv.at/
-----------------------------------------------------------------------------

Flávia Pezzini

Jun 17, 2020, 3:17:42 PM
to iqt...@googlegroups.com
Hi Heiko, 

Thank you! All working fine now.

With all best wishes,

Flávia



--
Flávia Fonseca Pezzini
University of Exeter | Royal Botanic Garden Edinburgh
20a Inverleith Row
Edinburgh, EH3 5LR, UK
Tel: +44 (0)131 248 2899 | +44 (0) 7721445316

f.pe...@rbge.org.uk | flavia...@gmail.com | skype: flaviapezzini1

Help preserve the tropical dry forests of Latin America: see our film http://elmer.rbge.org.uk/dryflor/