Dear Flavia,
Indeed, IQ-TREE reduces each group of identical sequences to only two representatives.
It does this mainly to save running time and adds the removed sequences back at the end of the analysis, because identical sequences will end up in the same subtree anyway.
Why are two kept? This way IQ-TREE is able to infer support values for the branch leading to their subtree. This support might not be 1.0 or 100%, for example when there are sequences that are almost identical but contain some wildcards (e.g. N, Y, R… for DNA or X in proteins) or gaps.
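To illustrate the idea, here is a minimal Python sketch. It is not IQ-TREE's actual code: the function name `reduce_identical`, the `keep` parameter, the toy alignment, and the use of exact string comparison to decide what counts as "identical" are all my assumptions for the example.

```python
from collections import defaultdict

def reduce_identical(seqs, keep=2):
    """Hypothetical helper: keep at most `keep` names per group of
    identical sequences and set the remaining names aside."""
    # Group sequence names by their exact sequence string (an assumption;
    # the real program may compare sequences differently).
    groups = defaultdict(list)
    for name, seq in seqs.items():
        groups[seq].append(name)

    kept = []       # names that stay in the analysis
    set_aside = {}  # removed name -> representative it is identical to
    for names in groups.values():
        kept.extend(names[:keep])
        for name in names[keep:]:
            set_aside[name] = names[0]
    return kept, set_aside

# Toy alignment: A, B, C and E are identical, D differs in the last site.
alignment = {
    "A": "ACGT",
    "B": "ACGT",
    "C": "ACGT",
    "D": "ACGA",
    "E": "ACGT",
}

kept, set_aside = reduce_identical(alignment)
print(kept)       # ['A', 'B', 'D']
print(set_aside)  # {'C': 'A', 'E': 'A'}
```

In this toy case A and B stay in the analysis, so the branch leading to their subtree can still receive a support value, while C and E would be added next to A once the tree is finished.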
However, by adding the option "-keep-ident" to the command line, you can tell IQ-TREE to keep all sequences of the alignment and not to reduce identical ones. This will typically increase the runtime of the analysis, but sometimes this cannot be avoided ;)
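If you start IQ-TREE from a script, the call could be assembled roughly like this. This is only a sketch: the helper `iqtree_command`, the executable name `iqtree`, and the file name `example.phy` are placeholders I made up, so please adapt them to your installation. Only "-s" (the alignment file) and "-keep-ident" are actual IQ-TREE options here.

```python
import shlex

def iqtree_command(alignment, keep_identical=False, binary="iqtree"):
    """Hypothetical helper that builds an IQ-TREE argument list.
    `binary` is an assumed executable name; yours may differ."""
    cmd = [binary, "-s", alignment]
    if keep_identical:
        # Ask IQ-TREE not to reduce identical sequences.
        cmd.append("-keep-ident")
    return cmd

cmd = iqtree_command("example.phy", keep_identical=True)
print(shlex.join(cmd))  # iqtree -s example.phy -keep-ident
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start the analysis.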
I hope that helps you.
Best wishes,
Heiko
-----------------------------------------------------------------------------
Heiko Schmidt
Center for Integrative Bioinformatics Vienna (CIBIV)
University of Vienna / Max Perutz Labs
http://www.cibiv.at/
-----------------------------------------------------------------------------