Model selection on large genomic dataset


kin onn chan

Nov 3, 2017, 10:03:17 AM
to IQ-TREE
Dear IQ-TREE Users,
I'm trying to run IQ-TREE on a large concatenated alignment of 14,000 exonic loci. A preliminary run using a GTR+GAMMA model gave rather unsatisfactory results. Does anyone have recommendations on how to approach model selection for such a large dataset? Thank you in advance!
Chan

Bui Quang Minh

Nov 4, 2017, 2:36:30 PM
to iqt...@googlegroups.com, kin onn chan
Hi Chan,

While I don’t know what you would consider satisfactory, my recommendation for now is this. Because exons can be quite short, a partition model with one partition per exon is likely over-parameterized: there are too many short partitions. To avoid this, reduce the number of partitions, e.g. by using ModelFinder or PartitionFinder to merge them. You can also use genes as partitions (i.e. merge the exons of the same gene) and repeat the analysis. You should also definitely do model selection rather than just using GTR+G. IQ-TREE provides many other (simpler) DNA models that help avoid overfitting where possible.
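
As a rough illustration of the "genes as partitions" idea, the following Python sketch groups per-exon alignment ranges by gene and writes them as a NEXUS sets block of the kind IQ-TREE accepts as a partition file. It is only a sketch under stated assumptions: the exon naming scheme `<gene>_exon<N>`, the example names and coordinates, and the helper names `merge_exons_by_gene` and `to_nexus` are all hypothetical and do not come from the original data.

```python
def merge_exons_by_gene(exons):
    """Group (name, start, end) exon ranges by gene.

    Assumes a hypothetical naming scheme '<gene>_exon<N>'; adapt the
    split rule to whatever convention your loci actually use.
    """
    genes = {}
    for name, start, end in exons:
        gene = name.rsplit("_exon", 1)[0]  # strip the exon suffix
        genes.setdefault(gene, []).append((start, end))
    return genes


def to_nexus(genes):
    """Format gene-level partitions as a NEXUS sets block."""
    lines = ["#nexus", "begin sets;"]
    for gene, ranges in genes.items():
        # One charset per gene, listing all of its exon ranges.
        spans = " ".join(f"{s}-{e}" for s, e in sorted(ranges))
        lines.append(f"    charset {gene} = {spans};")
    lines.append("end;")
    return "\n".join(lines)


# Made-up example: three exons belonging to two genes.
exons = [
    ("geneA_exon1", 1, 120),
    ("geneA_exon2", 121, 300),
    ("geneB_exon1", 301, 450),
]
genes = merge_exons_by_gene(exons)
nexus_text = to_nexus(genes)
print(nexus_text)
```

The resulting file could then be passed to IQ-TREE together with the alignment, for instance with something like `iqtree -s alignment.phy -spp genes.nex -m MFP+MERGE`, which asks ModelFinder to select models and merge similar partitions; the file names here are placeholders, and the exact options may differ between IQ-TREE versions.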

Cheers, Minh

--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at
