What is the implication of restricting model selection in order to increase speed (i.e., using RAxML)?

omotoso olatunde

Jan 17, 2021, 4:14:13 PM

to PartitionFinder

Hi All,

I am using PartitionFinder2 on WSL (Ubuntu). I have a very large alignment (I concatenated the whole 1-to-1 CDS from 18 taxa) of about 25 million characters in length. My major concerns are the "model selection" and "model search" settings. The protocol I follow for my work used HKY+G+I for ML phylogenetic analysis; however, the HKY model is not included in RAxML, which means I cannot use rcluster to speed up PF2 runs. I know the "greedy" search is slow, but it is the only option I have if I want to test both HKY+ and GTR+ models. I used it anyway, and the run has now been stuck at the same point for a week. I think I might have acted greedily by using a greedy search with many models on a dataset this large. What do you advise, and how do you think I can get around this? Do you think using RAxML (which means not testing HKY+) would affect the quality of my phylogenetic analysis?

Here is the display since Jan 3:

INFO | 2021-01-03 12:35:11,371 | NumExpr defaulting to 6 threads.

INFO | 2021-01-03 12:35:37,433 | ------------- PartitionFinder 2.1.1 -----------------

INFO | 2021-01-03 12:35:37,433 | You have Python version 2.7

INFO | 2021-01-03 12:35:37,434 | Command-line arguments used: /partitionfinder/PartitionFinder.py /mnt/c/Desktop/final_edit/

INFO | 2021-01-03 12:35:37,434 | ------------- Configuring Parameters -------------

INFO | 2021-01-03 12:35:37,435 | Setting datatype to 'DNA'

INFO | 2021-01-03 12:35:37,435 | Setting phylogeny program to 'phyml'

INFO | 2021-01-03 12:35:37,435 | Program path is here

INFO | 2021-01-03 12:35:37,436 | Setting working folder to: '/mnt/c/Users/dell/Desktop/copam/final_edit'

INFO | 2021-01-03 12:35:37,437 | Loading configuration at '…/partition_finder.cfg'

INFO | 2021-01-03 12:35:37,493 | Setting 'alignment' to 'allseqs.phy'

INFO | 2021-01-03 12:35:37,494 | Setting 'branchlengths' to 'linked'

INFO | 2021-01-03 12:35:37,496 | You set 'models' to: HKY+G, GTR+G, HKY+G+X, GTR+G+X, GTR+I+G, HKY+I+G, GTR+I+G+X, HKY+I+G+X, HKY+I+X, GTR+I+X

INFO | 2021-01-03 12:35:37,906 | This analysis will use the following 10 models of molecular evolution

INFO | 2021-01-03 12:35:37,906 | HKY+G, GTR+G, HKY+G+X, GTR+G+X, GTR+I+G, HKY+I+G, GTR+I+G+X, HKY+I+G+X, HKY+I+X, GTR+I+X

INFO | 2021-01-03 12:35:37,907 | Setting 'model_selection' to 'aicc'

INFO | 2021-01-03 12:35:50,490 | Setting 'search' to 'greedy'

INFO | 2021-01-03 12:35:50,492 | ------------------------ BEGINNING NEW RUN -------------------------------

INFO | 2021-01-03 12:35:50,493 | Looking for alignment file './allseqs.phy'...

INFO | 2021-01-03 12:35:50,494 | Using 6 cpus

INFO | 2021-01-03 12:35:50,494 | Beginning Analysis

INFO | 2021-01-03 12:35:54,345 | Reading alignment file './allseqs.phy'

INFO | 2021-01-03 12:36:21,757 | Starting tree will be estimated from the data.

INFO | 2021-01-03 12:36:37,461 | Estimating Maximum Likelihood tree with RAxML fast experimental tree search for ./analysis/start_tree/filtered_source.phy

INFO | 2021-01-03 12:36:38,441 | Using a separate GTR+G model for each data block

Rob Lanfear

Jan 17, 2021, 4:24:14 PM

to PartitionFinder

Hi There,

I think you should use RAxML - there are a few published papers to suggest that sticking with the GTR models is very very unlikely to affect your analysis.

You could also look at using ModelFinder in IQ-TREE 2, which will allow you to do an rcluster (or rclusterf) search with HKY and many other models.

Rob

omotoso olatunde

Jan 20, 2021, 8:52:26 AM

to PartitionFinder

Thank you very much sir,

I have stopped the run and adjusted the .cfg file to run only GTR models. I have also transferred it to the server. I will keep you posted as the runs progress.
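
A GTR-only configuration of the kind described above could be sketched roughly as follows. This is only an illustration, not the actual file used in this thread: the helper name `build_cfg` and the data block names and coordinates are hypothetical, and the alignment name, `linked` branch lengths, and `aicc` setting are copied from the log earlier in the thread. The script is standalone Python 3 that just assembles the text of a `partition_finder.cfg`.

```python
def build_cfg(alignment, models, data_blocks, search="rcluster",
              branchlengths="linked", model_selection="aicc"):
    """Return the text of a PartitionFinder 2 config file (illustrative helper)."""
    lines = [
        "## ALIGNMENT FILE ##",
        f"alignment = {alignment};",
        "",
        "## BRANCHLENGTHS ##",
        f"branchlengths = {branchlengths};",
        "",
        "## MODELS OF EVOLUTION ##",
        f"models = {', '.join(models)};",
        f"model_selection = {model_selection};",
        "",
        "## DATA BLOCKS ##",
        "[data_blocks]",
    ]
    # One line per data block, e.g. "gene1_pos1 = 1-900\3;"
    for name, sites in data_blocks.items():
        lines.append(f"{name} = {sites};")
    lines += ["", "## SCHEMES ##", "[schemes]", f"search = {search};"]
    return "\n".join(lines) + "\n"


# Hypothetical data blocks: the thread never shows the real ones.
example_blocks = {
    "gene1_pos1": r"1-900\3",
    "gene1_pos2": r"2-900\3",
    "gene1_pos3": r"3-900\3",
}

# GTR-only model list, so the run can use RAxML and the rcluster search.
cfg_text = build_cfg("allseqs.phy", ["GTR+G", "GTR+I+G"], example_blocks)

if __name__ == "__main__":
    print(cfg_text)
```

The printed text would be saved as `partition_finder.cfg` in the analysis folder, and PartitionFinder.py would then be started with the `--raxml` flag, since the rcluster search relies on RAxML.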
