bootstrap supports are 100% in all nodes for long phylogenomic alignments

Dieunel Derilus

Dec 15, 2018, 9:53:02 PM
to IQ-TREE
Dear IQ-TREE developers and users,

I am running IQ-TREE on a long phylogenomic alignment, which I built by concatenating the core orthogroups from the OrthoFinder output. The tree has 16 nodes in total, and all of them have 100% bootstrap support. However, if I build trees from individual genes of the same orthogroups (short alignments), the bootstrap supports vary a great deal. My concern is that I have never seen a paper in which all nodes are 100% supported. Do you think this tree can be confidently used and reported? Is my analysis wrong?

Note that I get the same results with IQ-TREE, RAxML and MrBayes.

Thanks  for your support.

Regards

Minh Bui

Dec 16, 2018, 7:46:00 PM
to IQ-TREE, Dieunel Derilus
Dear Dieunel,

In fact, 100% BS values are not uncommon for phylogenomic alignments, and that's exactly why we suggest looking at concordance factors. Please have a look at an excellent blog post by Rob Lanfear: http://www.robertlanfear.com/blog/files/concordance_factors.html
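
To make the idea concrete, here is a minimal Python sketch of what a gene concordance factor (gCF) measures: the percentage of gene trees that contain a given branch of the reference tree. It is only a toy illustration of the definition, not how IQ-TREE computes it. It assumes every gene tree contains all taxa, and it represents each tree simply as a list of clades (sets of taxon names). The taxon names, the gene-tree counts and the helper names `normalize_split` and `gene_concordance_factor` are all made up for the example.

```python
def normalize_split(clade, all_taxa):
    """Represent a bipartition by the side that excludes the first taxon,
    so that a clade and its complement compare as equal."""
    clade = frozenset(clade)
    anchor = min(all_taxa)
    return frozenset(all_taxa) - clade if anchor in clade else clade


def gene_concordance_factor(branch, gene_trees, all_taxa):
    """Percent of gene trees that contain `branch`.

    Assumes every gene tree includes all taxa, so every tree is decisive.
    """
    target = normalize_split(branch, all_taxa)
    hits = sum(
        1
        for tree in gene_trees
        if target in {normalize_split(clade, all_taxa) for clade in tree}
    )
    return 100.0 * hits / len(gene_trees)


# Hypothetical data: five taxa and ten gene trees, each listed by its clades.
TAXA = {"human", "chimp", "gorilla", "orangutan", "gibbon"}
APES = {"human", "chimp", "gorilla"}
GENE_TREES = (
    [[{"human", "chimp"}, APES]] * 7      # 7 genes group human with chimp
    + [[{"human", "gorilla"}, APES]] * 2  # 2 genes group human with gorilla
    + [[{"chimp", "gorilla"}, APES]] * 1  # 1 gene groups chimp with gorilla
)

if __name__ == "__main__":
    print(gene_concordance_factor({"human", "chimp"}, GENE_TREES, TAXA))
    print(gene_concordance_factor(APES, GENE_TREES, TAXA))
```

In this toy data the human+chimp branch could still receive very high bootstrap support from a concatenated alignment, while the gCF shows how many individual genes actually agree with it.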

Cheers
Minh


Dieunel Derilus

Dec 17, 2018, 11:21:37 AM
to Minh Bui, IQ-TREE
Dear Minh,
Thank you very much for your reply and for sharing this blog post, which is very useful for me. In this case, when I get 100% BS values for all nodes, do you think that I should report the gCF and sCF as presented in the tree in this post?

Cordially
--------------------------
Derilus Dieunel
Graduate Student
Department of Environmental Sciences
University of Puerto Rico, Río Piedras



Brian Foley

Dec 17, 2018, 4:43:20 PM
to IQ-TREE
My response would be that it depends a lot on what type of organism you are analyzing. The genomes of mammals are quite a bit different from the genomes of bacteria, for example. And analyzing the genomes of two Asian humans, two native South African humans and two native South American humans would give different types of results than analyzing the genomes of one human, two chimpanzees, two gorillas and one each of a few other great apes and non-human primates.

For some organisms, humans have spent a great deal of effort defining what a "species" is, and for some organisms the species boundaries are clear-cut for good biological reasons. But for many other organisms, there is no clear boundary between "species" and/or there are biological reasons why there is copious gene flow between fairly well-defined groups where the concept of "species" does make sense.

If 95 to 99% of the human genome is clearly closer to chimpanzee than to gorilla, then bootstrapping should give 100% support for ((human, chimp), gorilla), even though there are genes or regions of the genome where human is closer to gorilla. Those regions are clustered in particular genes rather than being random points evenly distributed across the genome. Bootstrapping randomly samples sites from the whole alignment, so every replicate is still dominated by the majority signal. Building gene trees, or sampling different types of data (mitochondria, MHC alleles, etc.), is not the same as bootstrapping.
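
A rough way to see this is a small simulation, sketched below in Python. It is a toy model under stated assumptions, not a real phylogenetic analysis: each alignment site simply "votes" for one of three groupings, "tree inference" is just a majority vote, and the gene counts, site counts and signal strength are invented for illustration. The function names (`simulate_gene`, `best_topology`, `bootstrap_support`, `demo`) are hypothetical.

```python
import random
from collections import Counter

# Toy setup (all values are illustrative assumptions): each site "votes" for
# the pair of taxa it groups together, HC = human+chimp, HG = human+gorilla,
# CG = chimp+gorilla.
TOPOLOGIES = ("HC", "HG", "CG")


def simulate_gene(true_topology, n_sites, rng, signal=0.1):
    """Per-site votes for one gene; `signal` is the share of sites that
    follow the gene's true history, the rest vote uniformly at random."""
    return [
        true_topology if rng.random() < signal else rng.choice(TOPOLOGIES)
        for _ in range(n_sites)
    ]


def best_topology(sites):
    """Stand-in for tree inference: the topology with the most votes."""
    return Counter(sites).most_common(1)[0][0]


def bootstrap_support(sites, topology, n_reps, rng):
    """Percent of replicates (sites resampled with replacement) that
    recover `topology`."""
    hits = sum(
        best_topology(rng.choices(sites, k=len(sites))) == topology
        for _ in range(n_reps)
    )
    return 100.0 * hits / n_reps


def demo(seed=1, n_genes=100, n_sites=100, n_reps=100):
    rng = random.Random(seed)
    # 80% of genes follow human+chimp, 20% follow human+gorilla.
    histories = ["HC"] * (n_genes * 8 // 10) + ["HG"] * (n_genes * 2 // 10)
    genes = [simulate_gene(h, n_sites, rng) for h in histories]

    # Concatenate all genes and bootstrap the sites of the long alignment.
    concatenated = [site for gene in genes for site in gene]
    concat_support = bootstrap_support(concatenated, "HC", n_reps, rng)

    # Bootstrap a few short single-gene alignments for comparison.
    gene_supports = [bootstrap_support(g, "HC", n_reps, rng) for g in genes[:10]]

    # Share of gene trees that agree with human+chimp.
    agreeing = sum(best_topology(g) == "HC" for g in genes)
    gene_tree_share = 100.0 * agreeing / len(genes)
    return concat_support, gene_supports, gene_tree_share


if __name__ == "__main__":
    concat, per_gene, share = demo()
    print("concatenated bootstrap support:", concat)
    print("single-gene bootstrap supports:", per_gene)
    print("gene trees agreeing with human+chimp (%):", share)
```

Under these assumptions the long concatenated alignment should give bootstrap support at or very near 100% for human+chimp, while short single-gene alignments give variable support and a noticeable share of gene trees disagree. Resampling sites measures how stable the overall signal is, not how many genes share it.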

For mammals and many other types of eukaryotes there are a few different reasons for gene trees being different than species trees. Some is due to incomplete lineage sorting, some is due to introgression, some is due to horizontal gene transfer, some is due to very odd selection pressures.  For bacteria, most of the problem is said to be from horizontal gene transfer by phages, plasmids, and random DNA uptake.

For many organisms, the difference between 100% bootstrap support and less bootstrap support is due to sampling. For example, if we leave gorillas out of the human, chimpanzee, other primate analyses, we get 100% support for human/chimpanzee. The same would be true if the gorillas had gone extinct, or if gorillas had split from the (gorilla/chimpanzee/human) common ancestor ten or twenty million years earlier than they did. For many organisms, we have very biased sampling. For example, we have a lot of E. coli sampled from human and domesticated animal sources and not so much from marine mammals or Tibetan shrews or other sources that humans are less interested in. For viruses, we tend to focus on human and domestic animal pathogens and to ignore the viruses that have little or no economic impact on humans.

Brian Foley

Dec 17, 2018, 5:09:21 PM
to IQ-TREE

Even within a small group like the mammals or the insects, different groups (families, genera, whatever) have different propensity for being genetically isolated or not.
Birds and bats and flying insects for example are less prone to being cut off by a geographic boundary such as a river or mountain range.  Solitary animals like cats have different mating habits than herd animals like deer.  All of these things influence how cleanly one group of organisms splits off from the rest of its tribe to found a new species.


Dieunel Derilus

Dec 17, 2018, 11:26:41 PM
to IQ-TREE

Thanks for all these details. In my case, I am working with photosynthetic picoeukaryote (PPE) species and some microalgae. These are polyphyletic groups, so I would not expect 100% BS support for all nodes. I am also involved in another project on the comparative genomics of apicomplexans (40 species), and we are getting the same results: all nodes have 100% BS values.

Do you know of one or more phylogenomic papers that present trees with all nodes supported at 100%? This would be very useful for me. Note that I am a PhD student in Environmental Sciences; I am new to this field and am trying to understand my results in more detail.
--------------------------
Derilus Dieunel
Graduate Student
Department of Environmental Sciences
University of Puerto Rico, Río Piedras




Brian Foley

Dec 18, 2018, 4:35:30 AM
to IQ-TREE

I do not know of a paper with trees showing 100% bootstrap support of all nodes, but I would not find it surprising to see such a paper if I looked for one.
I am not a big fan of bootstrap values, but they are useful in some cases. I don't know anything about photosynthetic picoeukaryotes, but it seems they would include organisms such as diatoms, which have been evolving for quite a long time and hence could have a lot of diversity. It also seems likely that most species or lineages would not go through many "genetic bottlenecks" with population sizes reduced to just a few individuals. The Wikipedia article on diatoms says they are diploid most of the time and sometimes form haploid sperm and egg cells, which is a bit different from fungi, which typically spend more of their time in the haploid state. Wikipedia says there are more than 100,000 species of diatoms, which is not surprising for a group of organisms with an ancient history.

It is more like studying all Bilateria than like studying mammals or insects, each of which is a more recent "crown group" evolved out of Bilateria. If someone made a phylogenetic tree using the genomes of one bee, one ant, one butterfly, one turtle, one snake, and one mammal, I would expect 100% bootstrap support for all nodes. A tree that analyzes the complete genomes of mammals, including 100 species of gazelles and antelopes and 45 species of deer, will have many nodes with low bootstrap support.

Within a recently evolved group of the Bilateria such as the vertebrates, there are many clades such as tetrapods, and within the tetrapods there are clades such as the amphibians and the mammals, which are each very clearly monophyletic with 100% bootstrap support. Most of this agrees with the fossil record, and we can understand why the "tree" of vertebrate life has major branches. I am guessing that organisms like diatoms are more difficult to distinguish as major branches based on fossils, so it is more difficult to know what to expect their tree to look like.

Dieunel Derilus

Dec 18, 2018, 2:55:44 PM
to IQ-TREE
Thank you for this detailed explanation. You are right, the microalgae and photosynthetic picoeukaryotes are polyphyletic groups. So the high BS values for the phylogenomic tree can be addressed in the discussion.

Thanks again for your assistance; it is very much appreciated.

Cordially

--------------------------
Derilus Dieunel
Graduate Student
Department of Environmental Sciences
University of Puerto Rico, Río Piedras


