Feature suggestion: More than one user-defined constraint tree


steven.wad...@gmail.com

Jun 3, 2017, 9:30:53 AM
to IQ-TREE
Hi Minh and the IQ-TREE team,

Thanks again for the great software! My research has been made both more efficient and more flexible due to your efforts. Bravo!

I have a suggestion for a future version of IQ-TREE which is being prompted by a project currently in progress.

Suppose we have several ingroup taxa… some of which are living animals represented by both DNA and morphology, while other ingroup taxa are fossils represented only by morphology. Also suppose that the outgroup taxa are similarly represented… some are living and some are fossils. Let’s also assume that the DNA data representing the living taxa is robust - maybe a 40-gene concatenated alignment with 50,000 positions. In a model and partition scheme test using IQ-TREE or PartitionFinder (or whatever software), we find it makes most sense to use 12 DNA partitions, some of which use a GTR+G model and others a GTR+I+G model.

If we want to force all ingroup taxa to form a monophyletic group, we can easily do so by using the -g option. This can be done with a constraint tree like the example “monophyly.constraint.tree.nwk” attached below. To use this in IQ-TREE, I simply add “-g monophyly.constraint.tree.nwk” to the terminal call.
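A minimal sketch of how such a monophyly constraint could be written, assuming made-up taxon names (Living1, Fossil1, Out1, ...) and a hypothetical alignment file called morphology.phy. The -g option is the one described above and -s is IQ-TREE's alignment option; the executable name (iqtree here) depends on the installed version.

```python
import shlex

def monophyly_constraint(ingroup, outgroup):
    """Newick string that groups all ingroup taxa in one clade.

    Relationships inside the ingroup and among the outgroup taxa are left
    unresolved, so only the ingroup/outgroup separation is enforced.
    """
    return "((" + ",".join(ingroup) + ")," + ",".join(outgroup) + ");"

# Hypothetical taxon names, for illustration only.
ingroup = ["Living1", "Living2", "Fossil1", "Fossil2"]
outgroup = ["Out1", "Out2", "FossilOut1"]

newick = monophyly_constraint(ingroup, outgroup)
print(newick)

# The string would be saved as monophyly.constraint.tree.nwk and passed with -g.
# "morphology.phy" is an assumed alignment file name.
command = ["iqtree", "-s", "morphology.phy", "-g", "monophyly.constraint.tree.nwk"]
print(shlex.join(command))
```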

However, if the DNA data has been used in a previous study to infer the relationships among the living taxa (and the support values for those relationships are strong), a different approach could be taken. Suppose we eliminate the DNA data from the current analysis altogether and instead use only morphology. We could create a “molecular scaffold” tree based on the previous study to lock down the relationships among living taxa and allow fossils to assume their most likely positions. If we are willing to assume that the inclusion of fossil taxa won’t change the relationships among living taxa, this approach has a major advantage… the software does not have to deal with GTR+G and GTR+I+G parameters across 12 different partitions, which can greatly reduce processing time. This can be done with a scaffold tree like the example “scaffold.constraint.tree.nwk” attached below. To use this in IQ-TREE, I simply add “-g scaffold.constraint.tree.nwk” to the terminal call.

But.... What if we want to use BOTH scaffold and monophyly constraints at the same time? This would be quite advantageous! The trick of course is that we need to provide scaffold and monophyly constraint trees that are fully congruent (like the two trees attached). This can be done in MrBayes using a combination of “hard” (monophyly) and “soft” (backbone) constraints. TNT also offers the option to combine monophyletic group trees and skeleton trees. Both MrBayes and TNT run a congruence test on the user-defined trees before processing the data and stop if the trees fail this test.
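The congruence test mentioned above can be sketched roughly as follows. This is not how MrBayes or TNT implement it; it is a simplified illustration that assumes plain Newick strings (no branch lengths, labels or quoting) and hypothetical taxon names. Both trees are reduced to the taxa they share, every internal branch is stored as a split, and the trees are treated as congruent when every pair of splits is either nested or disjoint.

```python
from itertools import combinations

def clades(newick):
    """Set of clades (frozensets of taxon names) in a plain Newick string."""
    found, stack, name = set(), [], ""
    for ch in newick.strip().rstrip(";"):
        if ch in ",()":
            if name.strip():
                stack[-1].append(name.strip())
            name = ""
            if ch == "(":
                stack.append([])
            elif ch == ")":
                members = stack.pop()
                found.add(frozenset(members))
                if stack:
                    stack[-1].extend(members)
        else:
            name += ch
    return found

def leaves(newick):
    """All taxon names in the tree."""
    return frozenset().union(*clades(newick))

def splits(newick, taxa):
    """Non-trivial splits of the tree restricted to `taxa`.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the result does not depend on where the tree is rooted.
    """
    ref = min(taxa)
    result = set()
    for clade in clades(newick):
        side = clade & taxa
        if ref in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(frozenset(side))
    return result

def congruent(tree_a, tree_b):
    """True if the two trees do not contradict each other on their shared taxa."""
    shared = leaves(tree_a) & leaves(tree_b)
    if len(shared) < 4:
        return True  # fewer than four shared taxa cannot conflict
    pooled = splits(tree_a, shared) | splits(tree_b, shared)
    return all(not (x & y) or x <= y or y <= x
               for x, y in combinations(pooled, 2))

# Hypothetical example: F1-F3 are fossils, O1-O2 are living outgroup taxa.
monophyly = "((A,B,C,F1,F2),(O1,O2,F3));"
scaffold = "(((A,B),C),(O1,O2));"
conflicting = "((A,O1),(B,C,O2));"
print(congruent(monophyly, scaffold))     # True
print(congruent(monophyly, conflicting))  # False
```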

My suggestion for a future version of IQ-TREE would be to allow more than one constraint tree under the condition that they are congruent. Maybe the terminal call could be adapted to something like: “-g1 monophyly.constraint.tree.nwk -g2 scaffold.constraint.tree.nwk”. One foreseeable issue would be backward compatibility with former IQ-TREE versions, which could be handled by making -g and -g1 synonymous. As far as I know, RAxML does not allow more than one set of user-defined constraints. If IQ-TREE could incorporate this feature, it would add to the growing list of items that make your software more versatile and better adapted to deal with diverse data sets.

Thanks for hearing me out. Cheers!

-Steven


monophyly.constraint.tree.jpg
scaffold.constraint.tree.jpg

Bui Quang Minh

Jun 5, 2017, 9:24:03 AM
to iqt...@googlegroups.com, steven.wad...@gmail.com
Dear Steven,

you are welcome.

For your example, since the two constraint trees are congruent, you can merge them into a single tree and pass this merged constraint tree to IQ-TREE, right?

I agree that accepting more than one constraint tree would be more convenient. However, right now we have a more pressing issue: the constrained tree search is quite slow for taxon-rich data sets because of an inefficient data structure for handling constraints. We have to improve this first, before adding further features.

Cheers, Minh

--
You received this message because you are subscribed to the Google Groups "IQ-TREE" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iqtree+un...@googlegroups.com.
To post to this group, send email to iqt...@googlegroups.com.
Visit this group at https://groups.google.com/group/iqtree.
For more options, visit https://groups.google.com/d/optout.

--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

Brendon Boudinot

Feb 21, 2020, 5:08:55 PM
to IQ-TREE
This is almost exactly what I need to do. I have a trustworthy molecular tree which I can use as a scaffold for my > 100 extant taxa, and I have > 400 fossils.

The catch is that this requires the use of "soft" or "partial" constraints, which I have done extensively in MrBayes.

Minh, would you clarify for me: If I specify a scaffold tree only including extant taxa (i.e., I omit the fossils from the constraint tree), can the fossils "traverse" the constrained node (as in a "partial" constraint), or will they be excluded from the clades designated in the scaffold tree? In other words, does the -g newick tree specify "hard" constraint or "partial" constraint?

Thank you,
Brendon

Minh Bui

Feb 27, 2020, 4:22:01 AM
to IQ-TREE, Brendon Boudinot
Hi Brendon,



On 21 Feb 2020, at 2:08 pm, Brendon Boudinot <boud...@gmail.com> wrote:

This is almost exactly what I need to do. I have a trustworthy molecular tree which I can use as a scaffold for my > 100 extant taxa, and I have > 400 fossils.

The catch is that this requires the use of "soft" or "partial" constraints, which I have done extensively in MrBayes.

Can you explain what soft / partial constraints are?


Minh, would you clarify for me: If I specify a scaffold tree only including extant taxa (i.e., I omit the fossils from the constraint tree), can the fossils "traverse" the constrained node (as in a "partial" constraint), or will they be excluded from the clades designated in the scaffold tree? In other words, does the -g newick tree specify "hard" constraint or "partial" constraint?

Any taxa not included in the constraint tree are free to move anywhere - the bottom line is that there is no constraint on them. Does that answer your question? (I don’t know what a partial constraint is, to be honest.)
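The behaviour described here can be illustrated with a small check. This is only a sketch of the idea, not IQ-TREE's internal code, and it assumes plain Newick strings and hypothetical taxon names: prune the result tree down to the taxa named in the constraint tree, then test whether every split of the constraint tree is still present. Taxa missing from the constraint tree (F1 and F2 below) are ignored, so they may sit anywhere, including inside a constrained group.

```python
def clades(newick):
    """Set of clades (frozensets of taxon names) in a plain Newick string."""
    found, stack, name = set(), [], ""
    for ch in newick.strip().rstrip(";"):
        if ch in ",()":
            if name.strip():
                stack[-1].append(name.strip())
            name = ""
            if ch == "(":
                stack.append([])
            elif ch == ")":
                members = stack.pop()
                found.add(frozenset(members))
                if stack:
                    stack[-1].extend(members)
        else:
            name += ch
    return found

def splits(newick, taxa):
    """Non-trivial splits of the tree after pruning it to `taxa`.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the result does not depend on where the tree is rooted.
    """
    ref = min(taxa)
    result = set()
    for clade in clades(newick):
        side = clade & taxa
        if ref in side:
            side = taxa - side
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(frozenset(side))
    return result

def satisfies(tree, constraint):
    """True if `tree`, pruned to the constraint taxa, keeps every constraint split."""
    taxa = frozenset().union(*clades(constraint))
    return splits(constraint, taxa) <= splits(tree, taxa)

# Hypothetical example: A-D are extant taxa, F1 and F2 are fossils.
constraint = "((A,B),(C,D));"
fossils_nested = "(((A,F1),B),(C,(D,F2)));"  # fossils fall inside constrained groups
rearranged = "((A,C),(B,D),F1);"             # extant taxa rearranged

print(satisfies(fossils_nested, constraint))  # True
print(satisfies(rearranged, constraint))      # False
```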

Cheers
Minh



Brendon Boudinot

Nov 26, 2020, 7:51:28 AM
to IQ-TREE
Dear Minh,

My apologies for the very late reply! For some reason I did not receive a notice that this question had been answered; the project I was working on is done, but I have a similar application now. Anyway, the constraint system employed by IQTree is exactly what I needed! Any taxa not in the constraint list (in my case, fossils) can move about freely in the tree, which is termed a "soft" constraint in MrBayes. Another question which is more pertinent to hypothesis testing with molecular data: Can IQTree implement "hard" constraints, i.e., constraining a set of taxa and completely preventing other taxa from nesting in that set?

Thank you so much! I am quite relieved and I will be proceeding soon.
Cheers!
Brendon
