I'm loving IQ-TREE and would like to keep using it. I often work with large RADseq datasets with many individuals and lots of missing data. Unfortunately, these datasets take a very long time to run through IQ-TREE on my machine (e.g. one week to evaluate a single model!).
I know I can speed up the analysis by evaluating only GTR models, and I've previously asked about removing constant sites (which was not advised).
Are there other ways I can speed up the analysis? For example, removing some individuals, or specifically removing individuals with a high proportion of missing data?
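To make the last option concrete, here is a minimal sketch of what I mean by dropping individuals with too much missing data. It assumes a sequential relaxed PHYLIP file (one `name sequence` line per individual), treats `-`, `?` and `N` as missing, and uses an arbitrary 50% cutoff. The function names and the threshold are my own placeholders, not anything from IQ-TREE.

```python
# Characters counted as missing. This is an assumption; adjust to your coding.
MISSING = set("-?Nn")


def missing_fraction(seq):
    """Return the fraction of positions in seq that are missing."""
    if not seq:
        return 1.0
    return sum(ch in MISSING for ch in seq) / len(seq)


def read_phylip(path):
    """Read a sequential relaxed PHYLIP file into {name: sequence}.

    Assumes a 'ntax nsites' header followed by one 'name sequence' line
    per individual (interleaved files are not handled).
    """
    records = {}
    with open(path) as handle:
        next(handle)  # skip the header line
        for line in handle:
            parts = line.split()
            if len(parts) >= 2:
                records[parts[0]] = "".join(parts[1:])
    return records


def drop_high_missing(records, max_missing=0.5):
    """Keep only individuals whose missing fraction is <= max_missing."""
    return {
        name: seq
        for name, seq in records.items()
        if missing_fraction(seq) <= max_missing
    }


def write_phylip(records, path):
    """Write {name: sequence} back out as sequential relaxed PHYLIP."""
    nsites = len(next(iter(records.values()))) if records else 0
    with open(path, "w") as out:
        out.write(f"{len(records)} {nsites}\n")
        for name, seq in records.items():
            out.write(f"{name} {seq}\n")
```

Usage would be something like `write_phylip(drop_high_missing(read_phylip("88clust.phy"), 0.5), "88clust.filtered.phy")`, where the output file name is hypothetical. This holds the whole alignment in memory, so it is only a rough illustration for a dataset of this size.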
Alternatively, are there other programs that might run faster with such large and unruly datasets? RAxML is another popular program, but it takes similarly long to run (and lacks the helpful log files and information of IQ-TREE!).
Any suggestions you have would be much appreciated.

Command and the start of the log:

iqtree2 -s 88clust.phy -mset GTR -mrate I+G,I+R
IQ-TREE multicore version 2.1.2 COVID-edition for Linux 64-bit built Mar 30 2021
Developed by Bui Quang Minh, James Barbetti, Nguyen Lam Tung,
Olga Chernomor, Heiko Schmidt, Dominik Schrempf, Michael Woodhams.
Host: tbc-comp1 (SSE4.2, 125 GB RAM)
Command: iqtree2 -s 88clust.phy -mset GTR -mrate I+G,I+R
Seed: 676885 (Using SPRNG - Scalable Parallel Random Number Generator)
Time: Mon Sep 13 14:11:39 2021
Kernel: SSE2 - 1 threads (80 CPU cores detected)
HINT: Use -nt option to specify number of threads because your CPU has 80 cores!
HINT: -nt AUTO will automatically determine the best number of threads to use.
Reading alignment file 88clust.phy ... Phylip format detected
Alignment most likely contains DNA/RNA sequences
WARNING: 271837 sites contain only gaps or ambiguous characters.
Alignment has 207 sequences with 31735669 columns, 2344971 distinct patterns
143311 parsimony-informative, 92902 singleton sites, 31499456 constant sites
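
For a sense of scale, here is a quick arithmetic check on the site counts in the last two log lines. It uses only the numbers printed above; the variable names are mine.

```python
# Counts copied from the "Alignment has ..." lines of the log above.
total_columns = 31_735_669
parsimony_informative = 143_311
singleton = 92_902
constant = 31_499_456

# The three site classes should account for every column.
assert parsimony_informative + singleton + constant == total_columns

# Variable sites are everything that is not constant.
variable = parsimony_informative + singleton

for label, count in [
    ("constant", constant),
    ("variable", variable),
    ("parsimony-informative", parsimony_informative),
]:
    print(f"{label}: {count} ({100 * count / total_columns:.2f}%)")
```

So roughly 99.26% of the columns are constant, and only about 0.45% are parsimony-informative.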