Usage of PMSF

Xabier Vázquez-Campos

Jun 28, 2017, 3:41:06 AM
to IQ-TREE
Hi,

I'm trying to use the PMSF model but I'm having problems.

I already have a base tree that I got from a previous run as
iqtree-omp -st AA -bb 1000 -wbtl -alrt 1000 -s concat.archaea.209plus.2017-06-26.faa -m TEST -mset LG -nt 4
(best model LG+I+G4)

So, the first question is... how do you perform the best model search for the PMSF?
Do I still need to run MF as usual to get the best mixture model combination?
The best-model search for mixture models takes really long (a few hours just for the 8 base combinations of C60 with -m TESTONLY; the alignment has 251 sequences and 4200 columns). But if it's the way to go, I'll do this with

I was trying to
iqtree-omp -st AA -s concat.archaea.209plus.2017-06-26.faa -nt 4 -pre test_models_PMSF -ft concat.archaea.209plus.2017-06-26.faa.treefile

with -m MF or -m TESTONLY, but it reports:
ERROR: Invalid model name TESTONLY
 
So I guess it doesn't expect to do any model search when -ft is provided...

Thank you in advance,

Xabi

Bui Quang Minh

Jun 28, 2017, 6:47:37 AM
to IQ-TREE, Xabier Vázquez-Campos
Dear Xabi,

On Jun 28, 2017, at 9:41 AM, Xabier Vázquez-Campos <xvaz...@gmail.com> wrote:

So, the first question is... how do you perform the best model search for the PMSF?
Do I still need to run MF as usual to get the best mixture model combination?

PMSF is an approximation to the C60 and other Cxx mixture models. Thus, you first need to run model selection that includes these mixture models, via the option -madd LG+C60. You can also specify several models as a comma-separated list, e.g. -madd LG+C20,LG+C30,LG+C40,LG+C50,LG+C60.

Once this finishes, and assuming that LG+C60 fits best, you can then add -m LG+C60 to the command you pasted below.
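The two-step recipe above, written out as command lines. This is a hypothetical Python helper, not part of IQ-TREE: the function names, the output prefixes and the use of -m MF for the model-selection step are assumptions; every other option is one already quoted in this thread. It only builds the command strings and does not run IQ-TREE.

```python
import shlex

# Hypothetical helpers (not part of IQ-TREE) that assemble the two runs.


def model_selection_cmd(alignment, candidates, threads=4, prefix="select_cxx"):
    """Step 1: model selection that also tests the listed mixture models."""
    args = [
        "iqtree-omp", "-st", "AA", "-s", alignment,
        "-m", "MF",                     # assumption: ModelFinder only, no tree search
        "-madd", ",".join(candidates),  # comma-separated list of extra models
        "-nt", str(threads), "-pre", prefix,
    ]
    return shlex.join(args)


def pmsf_cmd(alignment, guide_tree, best_model, threads=4, prefix="test_models_PMSF"):
    """Step 2: the -ft command from the question plus -m <best model>."""
    args = [
        "iqtree-omp", "-st", "AA", "-s", alignment,
        "-nt", str(threads), "-pre", prefix,
        "-ft", guide_tree,              # guide tree from the earlier LG+I+G4 run
        "-m", best_model,               # e.g. LG+C60 if it wins step 1
    ]
    return shlex.join(args)


if __name__ == "__main__":
    aln = "concat.archaea.209plus.2017-06-26.faa"
    print(model_selection_cmd(aln, ["LG+C20", "LG+C30", "LG+C40", "LG+C50", "LG+C60"]))
    print(pmsf_cmd(aln, aln + ".treefile", "LG+C60"))
```

In practice you would run the first command, take the best-fitting model from its output, and pass that model name to the second step.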

I was trying to
iqtree-omp -st AA -s concat.archaea.209plus.2017-06-26.faa -nt 4 -pre test_models_PMSF -ft concat.archaea.209plus.2017-06-26.faa.treefile

with -m MF or -m TESTONLY but it reports
ERROR: Invalid model name TESTONLY
 
so I guess that it doesn't expect having to do any model search when -ft is provided…

Right: no model search is done when -ft is given, so the model has to be specified with -m; see the explanation above.
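What "approximation to the Cxx mixture models" means can be sketched with the idea behind the name PMSF (posterior mean site frequency): with the mixture model fitted on the guide tree, each site gets its own frequency profile, namely the class profiles averaged by the posterior probability of each class at that site, and the subsequent tree search uses that single profile per site. This is a toy sketch under loudly stated assumptions (4 character states instead of 20 amino acids, 2 classes instead of 60, made-up weights and likelihoods, hypothetical function name), not IQ-TREE's implementation.

```python
# Toy sketch of the posterior-mean site-frequency idea.
# Assumptions: 4 states instead of 20 amino acids, 2 classes instead of 60,
# made-up numbers. A real run obtains weights and per-class site likelihoods
# from the mixture model (e.g. LG+C60) fitted on the -ft guide tree.


def posterior_mean_frequencies(weights, class_likelihoods, class_profiles):
    """Return one site's frequency vector: class profiles averaged by P(class | site)."""
    # Joint weight of each class for this site: prior weight * site likelihood.
    joint = [w * lik for w, lik in zip(weights, class_likelihoods)]
    total = sum(joint)
    posteriors = [j / total for j in joint]  # P(class k | site)
    n_states = len(class_profiles[0])
    return [
        sum(p * profile[a] for p, profile in zip(posteriors, class_profiles))
        for a in range(n_states)
    ]


if __name__ == "__main__":
    weights = [0.5, 0.5]             # hypothetical class weights
    site_likelihoods = [0.03, 0.01]  # hypothetical L(site | class k)
    profiles = [
        [0.7, 0.1, 0.1, 0.1],        # class 1 favours state 0
        [0.1, 0.1, 0.1, 0.7],        # class 2 favours state 3
    ]
    # Class posteriors are 0.75 and 0.25, so this prints roughly [0.55, 0.1, 0.1, 0.25].
    print(posterior_mean_frequencies(weights, site_likelihoods, profiles))
```

The saving is that the tree search no longer sums over all classes at every site; it uses one precomputed profile per site instead.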

Cheers, Minh


--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

taua...@gmail.com

Sep 22, 2017, 10:54:38 AM
to IQ-TREE
Hello again, Minh!

I am trying to run a concatenated analysis on a large protein dataset, doing partitioning and model testing all in the same run. In addition to standard protein models, I would like to also consider some mixture models in the search: LG4M, LG4X and the CAT-like models.

1) How do I add the CAT-like models in -mrate? Do I need to list all options (C10, C20, etc.), or can I use something like CAT or just C to indicate a search through all possibilities?

2) Also, if I understood correctly, to account for heterotachy, I should use -sp instead of -spp, but then can I still submit the partition file or not? (Another related question is about what the heterotachy option is actually doing: I learned heterotachy as within-site variation through time, but in the manual it seems to be variation across taxa and sites…)

Simplified example:
iqtree -st AA -m MFP+MERGE -madd LG4M,LG4X -mrate G,R -msub nuclear -s input.nex -spp partition_file


Thank you for your help!

Bui Quang Minh

Sep 26, 2017, 6:21:05 AM
to iqt...@googlegroups.com, taua...@gmail.com
Dear Tauana,

On Sep 22, 2017, at 4:54 PM, taua...@gmail.com wrote:

1) How do I add the CAT-like models in -mrate? Do I need to list all options (C10, C20, etc.), or can I use something like CAT or just C to indicate a search through all possibilities?

Just use the -madd option, for example -madd LG4M,LG4X,LG+C10,LG+C20. I recommend the LG+… form because, by default, C10, C20, … assume equal substitution-rate (exchangeability) parameters and vary only the amino-acid frequencies, which is not realistic.
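The difference between C10 and LG+C10 can be written in the usual profile-mixture form. The notation below (exchangeabilities s_ij, class frequency profiles π^(k), class weights w_k) is ours rather than taken from the IQ-TREE documentation, and rate normalization and any +G rate heterogeneity are omitted for brevity.

```latex
% Class k of a K-class profile mixture (K = 10, 20, ..., 60 for C10 ... C60)
Q^{(k)}_{ij} = s_{ij}\,\pi^{(k)}_{j} \quad (i \neq j), \qquad
Q^{(k)}_{ii} = -\sum_{j \neq i} Q^{(k)}_{ij}

% Likelihood of alignment site D_m: weighted sum over the K classes
L(D_m) = \sum_{k=1}^{K} w_k \, L\!\left(D_m \mid Q^{(k)}\right), \qquad \sum_{k=1}^{K} w_k = 1

% Plain Cxx: s_{ij} = 1 for all i \neq j (equal exchangeabilities)
% LG+Cxx:    s_{ij} = LG exchangeabilities, shared by all K classes
```

Only the π^(k) differ between classes in both cases; the LG+ prefix replaces the flat s_ij = 1 with the empirical LG exchangeabilities.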


2) Also, if I understood correctly, to account for heterotachy, I should use -sp instead of -spp, but then can I still submit the partition file or not?

Yes, of course: both -sp and -spp take a partition file.

(Another related question is about what the heterotachy option is actually doing: I learned heterotachy as within-site variation through time, but in the manual it seems to be variation across taxa and sites…)

@Stephen: can you shed some light on this?


Simplified example:
iqtree -st AA -m MFP+MERGE -madd LG4M,LG4X -mrate G,R -msub nuclear -s input.nex -spp partition_file

Looks fine. But please note that if you add LG+C10 as explained above, it will allow each partition to evolve under this CAT-like model, which can be overkill (over-parameterization). Thus, for now, the CAT-like models are mainly used for non-partitioned analyses (without -spp).

Cheers, Minh

Stephen Crotty

Sep 26, 2017, 7:30:54 AM
to iqt...@googlegroups.com, taua...@gmail.com
Hi Tauana,

With regard to your heterotachy question and the difference between the -sp and -spp options for partition models. When you fit a partition model with the -sp option, only one set of branch lengths is inferred. For each partition these branch lengths are then scaled by some factor. This allows some partitions to evolve at faster or slower rates than others, but it does not accommodate heterotachy - variation in rate amongst lineages. If you instead specify the -spp option, then IQ-TREE will infer a separate set of branch lengths for each partition. This means that a particular branch (lineage) could be slow-evolving in one partition (relative to the other branches in that partition)  and fast-evolving in another (again, relative to the branches in that partition).

Hope this helps,

Stephen

Stephen Crotty

Sep 26, 2017, 9:25:42 AM
to iqt...@googlegroups.com, taua...@gmail.com
Hi Tauana,

I am sorry, but I made a mistake in my email below. All of my description/explanation is correct, but I mixed up the two options. The -spp option infers only one set of branch lengths and applies a scaling factor to each partition. The -sp option is the one used to account for heterotachy among partitions: it infers separate branch lengths for each partition.
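The corrected distinction can be illustrated with a toy example. The three-branch "tree", the branch lengths and the helper names below are all made up; this is a conceptual sketch of the two schemes as described above, not IQ-TREE's code. Under -spp every partition's branch lengths are the shared set times one rate factor, so the relative lengths cannot change between partitions; under -sp each partition has its own lengths, so a branch can be fast in one partition and slow in another.

```python
import math

# Toy branch lengths (expected substitutions per site) for three branches A, B, C.
# All numbers and names are hypothetical.
shared = {"A": 0.10, "B": 0.20, "C": 0.05}

# -spp: one shared set of branch lengths, rescaled by a per-partition rate.
rate_multiplier = {"part1": 1.0, "part2": 2.5}
spp = {part: {br: r * length for br, length in shared.items()}
       for part, r in rate_multiplier.items()}

# -sp: every partition has its own, independent set of branch lengths.
sp = {
    "part1": {"A": 0.10, "B": 0.20, "C": 0.05},
    "part2": {"A": 0.40, "B": 0.05, "C": 0.30},
}


def fastest_branch(lengths):
    """Name of the longest (fastest-evolving) branch in one partition."""
    return max(lengths, key=lengths.get)


def same_shape(a, b):
    """True if two partitions differ only by an overall rate factor."""
    total_a, total_b = sum(a.values()), sum(b.values())
    return all(math.isclose(a[br] / total_a, b[br] / total_b) for br in a)


if __name__ == "__main__":
    # Under -spp the relative branch lengths are identical across partitions.
    print(same_shape(spp["part1"], spp["part2"]))                    # True
    # Under -sp branch B is the fastest in part1 but the slowest in part2.
    print(fastest_branch(sp["part1"]), fastest_branch(sp["part2"]))  # B A
    print(same_shape(sp["part1"], sp["part2"]))                      # False
```

The second and third lines of output are the among-partition heterotachy that only -sp can represent.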

Sorry for the confusion!

Stephen

taua...@gmail.com

Sep 26, 2017, 9:57:37 AM
to IQ-TREE
Thank you, @Minh,
So I will try two things: 1) a partitioned analysis with -madd as you recommended (at the risk of overfitting), and 2) an analysis without the partition file, but still adding -madd LG4M,LG4X,LG+C10,LG+C20,LG+C…  Does that make sense?

Thanks, @Stephen,
I understand your explanation. I guess the issue was that I work with a different definition of heterotachy, which is within-site rate variation through time. But the important thing is that I know what the model will be doing in this case.

Thanks!

taua...@gmail.com

Sep 26, 2017, 1:06:53 PM
to IQ-TREE
Hi again Stephen,
I was missing a connection from my definition to how the problem is actually treated for reconstruction purposes. It makes sense now! No disagreement between definitions. Thanks!