Explicitly reporting observed amino acid frequencies of the alignment


Joran Martijn

Oct 26, 2017, 6:32:14 AM
to IQ-TREE

Hi!

I'm currently trying to run some parametric simulations using SiteSpecific.seq-gen (http://www.mathstat.dal.ca/~hcwang/Procov/SiteSpecific.seq-gen/). Ideally, I would like to run the simulations with the +F version of the model, which means I need to extract the observed amino acid frequencies from the alignment and feed them into the simulation software.

It would therefore be great if IQ-TREE reported the observed frequencies it uses for +F somewhere (maybe in the .log file, or in a separate output file?).

I could write a little script myself, but since I want to be absolutely sure I'm using the same +F frequencies as IQ-TREE, having IQ-TREE report them would be better.
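A minimal sketch of such a script, assuming the alignment is in FASTA format. It pools all sequences, counts only the 20 standard amino acids, and skips gaps and ambiguity codes such as B, Z and X. That treatment is an assumption: IQ-TREE may handle those characters differently, so the values could differ slightly from the ones it reports. The function names are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in the order the pi(...) lines appear in the .iqtree report.
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict {sequence name: sequence}."""
    seqs = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {n: "".join(parts) for n, parts in seqs.items()}


def empirical_frequencies(seqs):
    """Pool all sequences and return the relative frequency of each standard amino acid.

    Assumption: gaps ('-', '?') and ambiguity codes (B, Z, X, ...) are skipped,
    which may not match IQ-TREE's own treatment of these characters exactly.
    """
    counts = Counter()
    for seq in seqs.values():
        counts.update(c for c in seq if c in AA_ORDER)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found in the alignment")
    return {aa: counts[aa] / total for aa in AA_ORDER}
```

Usage, with a hypothetical file name: `empirical_frequencies(read_fasta(open("alignment.fasta")))` returns a dict keyed in the same A, R, N, ... order as the report, which makes a side-by-side comparison with IQ-TREE's numbers straightforward.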

My apologies if IQ-TREE already does this; I wasn't able to find it.

Cheers,

Joran

Bui Quang Minh

Oct 26, 2017, 11:46:33 AM
to iqt...@googlegroups.com, Huaichun Wang
Hi Joran,

When you run a protein analysis with, for example, -m LG+F, the amino acid frequencies are printed into the .iqtree file like this:

SUBSTITUTION PROCESS
--------------------

Model of substitution: LG+F

State frequencies: (empirical counts from alignment)

  pi(A) = 0.0790
  pi(R) = 0.0395
  pi(N) = 0.0358
  pi(D) = 0.0544
  pi(C) = 0.0193
  pi(Q) = 0.0487
  pi(E) = 0.0513
  pi(G) = 0.0881
  pi(H) = 0.0245
  pi(I) = 0.0509
  pi(L) = 0.0932
  pi(K) = 0.0741
  pi(M) = 0.0237
  pi(F) = 0.0433
  pi(P) = 0.0423
  pi(S) = 0.0604
  pi(T) = 0.0561
  pi(W) = 0.0136
  pi(Y) = 0.0372
  pi(V) = 0.0646

You can use these. I'm also CC'ing my collaborator Huaichun here, in case he has further tips.
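To feed these values to a simulator without retyping them, a small parser can pull the pi(...) lines out of the report. This is a sketch that assumes the layout shown above; the function name is hypothetical. The report prints the values rounded to four decimals, so in general they sum to 1 only approximately. The check for repeated states is a precaution in case a report contains more than one frequency block; whether IQ-TREE prints such blocks for mixture models is not established here.

```python
import re

# Matches report lines such as "  pi(A) = 0.0790".
PI_LINE = re.compile(r"^\s*pi\(([A-Z])\)\s*=\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)\s*$")


def parse_state_frequencies(report_text):
    """Collect the pi(X) = value lines from the text of a .iqtree report.

    Raises ValueError if a state occurs twice, which would mean the report
    contains more than one frequency block and the caller must pick one.
    """
    freqs = {}
    for line in report_text.splitlines():
        match = PI_LINE.match(line)
        if match:
            state, value = match.group(1), float(match.group(2))
            if state in freqs:
                raise ValueError(f"pi({state}) listed more than once")
            freqs[state] = value
    return freqs
```

Usage, with a hypothetical file name: `parse_state_frequencies(open("example.iqtree").read())`, after which the values can be written out in whatever format SiteSpecific.seq-gen expects.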

Cheers, Minh


--
Bui Quang Minh
Center for Integrative Bioinformatics Vienna (CIBIV)
Campus Vienna Biocenter 5, VBC5, Ebene 1
A-1030 Vienna, Austria
Phone: ++43 1 4277 74326
Email: minh.bui (AT) univie.ac.at

Joran Martijn

Nov 7, 2017, 6:15:46 AM
to IQ-TREE
Hi Minh,

I can't find this information in the .iqtree file. Under SUBSTITUTION PROCESS, it only reports the mixture weights for the 60 components and the F component, plus the gamma relative rate parameters. Maybe it doesn't report the observed frequencies when you invoke a mixture model?

For this analysis, I'm using version 1.5.0, and invoked the model LG+C60+F+G.

For now I'll just run a dummy LG+F analysis to get the frequencies, but it would be nice to have them reported for LG+C60+F+G and similar models as well.

Joran

Heiko Schmidt

Nov 7, 2017, 9:59:49 AM
to iqt...@googlegroups.com
Dear Joran,

I am not completely sure what you need. If you need the observed column-wise AA frequencies (a wild guess, because you use SiteSpecific.seq-gen), I don't think IQ-TREE outputs those.

If I remember correctly, I implemented that output in TREE-PUZZLE some time ago.
If that is what you need, I can dig up the command-line options required to get it. However, TREE-PUZZLE requires standard PHYLIP input for the alignment (and Newick for trees), which means the whole alignment has to be in one file and sequence names are limited to 10 characters. There is a pretty easy way around the 10-character restriction, because for just extracting the site AA frequencies you do not need to keep the original names. (I can provide more information if required.)
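For comparison, the column-wise frequencies can also be tallied directly from the aligned sequences. This is a rough sketch, not TREE-PUZZLE's implementation or output format; it assumes equal-length aligned sequences and, as before, skips gaps and ambiguity codes. The function name is hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (same order as the pi(...) lines in the .iqtree report).
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"


def site_frequencies(aligned_seqs):
    """Return one {amino acid: frequency} dict per alignment column.

    Only the 20 standard amino acids are counted; gaps and ambiguity codes are
    skipped (an assumption). A column with no standard residues gets all zeros.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("expected a non-empty set of equal-length aligned sequences")
    columns = []
    for column in zip(*(s.upper() for s in aligned_seqs)):
        counts = Counter(c for c in column if c in AA_ORDER)
        total = sum(counts.values())
        columns.append({aa: (counts[aa] / total if total else 0.0) for aa in AA_ORDER})
    return columns
```

Since it works on plain strings, sequence names play no role here, which sidesteps the 10-character name limit mentioned above.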

Best,
Heiko
-----------------------------------------------------------------------------
Heiko Schmidt
Center for Integrative Bioinformatics Vienna (CIBIV)
University of Vienna / Max F. Perutz Laboratories (MFPL)
Campus Vienna Biocenter 5 (VBC5)
A-1030 Vienna, Austria
http://www.cibiv.at/
-----------------------------------------------------------------------------

Joran Martijn

Nov 8, 2017, 4:19:58 AM
to IQ-TREE
Hi Heiko,

IQ-TREE does output the overall observed amino acid frequencies, which is what I need. It just doesn't seem to output them when I specify a mixture model. It did output them when using LG+F, as Minh said.
And yes, the frequencies are meant for SiteSpecific.seq-gen. Huaichun recently updated the tool so that you can input your own amino acid frequencies.

So, for me the problem is solved: I just run IQ-TREE with LG+F to get the observed frequencies of a dataset, and then use those as input for SiteSpecific.seq-gen. I was just pointing out that it seems a bit strange that IQ-TREE does not output the observed overall amino acid frequencies when you specify a mixture model in addition to +F.

Cheers,

Joran

Bui Quang Minh

Nov 8, 2017, 4:42:59 AM
to iqt...@googlegroups.com, Joran Martijn
Hi Joran again, I've added this to the TODO list.

M