Comparing best scores between runs


Jacob Steenwyk

Mar 17, 2018, 10:51:28 AM
to IQ-TREE
Hi,

In RAxML, log-likelihood values are not comparable between runs. Are log-likelihood / best score values comparable between runs using IQ-TREE?


Heiko Schmidt

Mar 17, 2018, 5:19:45 PM
to IQ-TREE Forum
Dear Jacob,

If the underlying dataset and evolutionary model are identical, the log-likelihoods of trees from different runs of the same program should be comparable.

Do you know any reason why this should not be the case in RAxML?

I seem to remember that the RAxML manual mentions that trees obtained from different programs have to be re-evaluated with the same program to obtain comparable log-likelihood values.

Best regards,
Heiko
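A minimal Python sketch of the rule stated above, assuming each run is summarized by a hypothetical `Run` record holding the alignment name, the model string and the final log-likelihood. Scores are only ranked when every run shares the same alignment and model; otherwise the helper refuses to compare them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical summary of one tree search: its alignment, model and final log-likelihood."""
    alignment: str
    model: str
    log_likelihood: float


def best_run(runs: list[Run]) -> Run:
    """Return the run with the highest log-likelihood.

    Raises ValueError if the runs do not all use the same alignment and model,
    because their log-likelihoods would then not be on the same scale.
    """
    if not runs:
        raise ValueError("no runs to compare")
    combos = {(r.alignment, r.model) for r in runs}
    if len(combos) > 1:
        raise ValueError(f"runs use different data/model combinations: {sorted(combos)}")
    # Log-likelihoods are negative; the largest (closest to zero) is the best fit.
    return max(runs, key=lambda r: r.log_likelihood)
```

With made-up placeholder values, `best_run([Run("aln.fasta", "GTR+F+R8", -8085.2), Run("aln.fasta", "GTR+F+R8", -8081.6)])` returns the second run, while mixing `GTR+F+R8` and `GTR+I+G` runs raises an error instead of silently ranking them.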


> --
> You received this message because you are subscribed to the Google Groups "IQ-TREE" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to iqtree+un...@googlegroups.com.
> To post to this group, send email to iqt...@googlegroups.com.
> Visit this group at https://groups.google.com/group/iqtree.
> For more options, visit https://groups.google.com/d/optout.


Brian Foley

Mar 18, 2018, 7:49:59 PM
to IQ-TREE
I have not run RAxML on this data set yet, but when I run PhyML and IQ-TREE on
the data, I get essentially identical scores from both, using the same data and similar
program settings (output below).

A neighboring region of the same genomes gives quite different results. The tree
topology is quite similar, but many parameters differ; for example, the relative rate of
C-T vs A-G mutations observed differs more than 4-fold between the two regions. The
two regions of the genome I tested are only a few hundred bases apart, and each region
is about 800 bases long. AT:GC content is quite similar between the two regions;
the entire genomes of these viruses are consistently A-rich and C-poor.

I would suspect that the huge difference in the ratio of G-A:C-T mutations observed
in the two neighboring regions is mostly or all due to selection pressure differences
and not due to underlying base substitution rate differences. 

Anyway, I cannot think of any reason why a single program (RAxML) used the same
way on the same data would not give the same results (or very very close) on each run.
How different are your results from run to run?


=========

IQ-TREE: A fast and effective stochastic algorithm for estimating
maximum likelihood phylogenies. Mol. Biol. Evol., 32:268-274.
http://dx.doi.org/10.1093/molbev/msu300

SEQUENCE ALIGNMENT
------------------

Input data: 63 sequences with 694 nucleotide sites
Number of constant sites: 364 (= 52.4496% of all sites)
Number of invariant (constant or ambiguous constant) sites: 364 (= 52.4496% of all sites)
Number of distinct site patterns: 333

SUBSTITUTION PROCESS
--------------------

Model of substitution: GTR+F+R8

Rate parameter R:

  A-C: 1.9776
  A-G: 3.2692
  A-T: 0.6819
  C-G: 0.4257
  C-T: 7.3702
  G-T: 1.0000

State frequencies: (empirical counts from alignment)

  pi(A) = 0.3874
  pi(C) = 0.1492
  pi(G) = 0.2432
  pi(T) = 0.2202

Rate matrix Q:

  A   -0.7834    0.1864    0.5022   0.09483
  C    0.4838    -1.574   0.06539     1.025
  G    0.7998   0.04012    -0.979    0.1391
  T    0.1668    0.6946    0.1536    -1.015

Model of rate heterogeneity: FreeRate with 8 categories
Site proportion and rates:  (0.2415,0.0001269) (0.1498,0.002529) (0.1293,0.2965) (0.109,0.3132) (0.05505,0.542) (0.1842,1.697) (0.09313,3.416) (0.03803,7.006)

 Category  Relative_rate  Proportion
  1         0.0001269      0.2415
  2         0.002529       0.1498
  3         0.2965         0.1293
  4         0.3132         0.109
  5         0.542          0.05505
  6         1.697          0.1842
  7         3.416          0.09313
  8         7.006          0.03803

MAXIMUM LIKELIHOOD TREE
-----------------------

Log-likelihood of the tree: -8081.6459 (s.e. 367.3025)
Unconstrained log-likelihood (without tree): -2921.4689
Number of free parameters (#branches + #model parameters): 145
Akaike information criterion (AIC) score: 16453.2918
Corrected Akaike information criterion (AICc) score: 16530.5545
Bayesian information criterion (BIC) score: 17111.9502

Total tree length (sum of branch lengths): 2.4551
Sum of internal branch lengths: 0.6526 (26.5832% of tree length)

=======
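As a sanity check on the IQ-TREE report above, the information criteria can be recomputed from the printed log-likelihood, the 145 free parameters and the 694 sites, using the standard formulas AIC = -2 lnL + 2k, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = -2 lnL + k ln(n). The same sketch also checks that the eight FreeRate categories have proportions summing to 1 and a proportion-weighted mean rate of 1, which is how the rates appear to be normalized in this report. Assumption: n is the number of alignment sites (694), not the number of distinct patterns; with that choice the results agree with the report to within rounding of the printed log-likelihood.

```python
import math


def information_criteria(log_lik: float, k: int, n: int) -> dict[str, float]:
    """AIC, AICc and BIC from a log-likelihood, k free parameters and n alignment sites."""
    aic = -2.0 * log_lik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2.0 * log_lik + k * math.log(n)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


def freerate_summary(categories: list[tuple[float, float]]) -> tuple[float, float]:
    """Return (sum of proportions, proportion-weighted mean rate).

    `categories` is a list of (proportion, rate) pairs as printed in the report.
    Both numbers should be close to 1 if the rates are normalized to mean 1.
    """
    total_prop = sum(p for p, _ in categories)
    mean_rate = sum(p * r for p, r in categories)
    return total_prop, mean_rate


# Values copied from the IQ-TREE report above.
crit = information_criteria(log_lik=-8081.6459, k=145, n=694)
for name, value in crit.items():
    print(f"{name}: {value:.4f}")

r8 = [(0.2415, 0.0001269), (0.1498, 0.002529), (0.1293, 0.2965), (0.109, 0.3132),
      (0.05505, 0.542), (0.1842, 1.697), (0.09313, 3.416), (0.03803, 7.006)]
print("FreeRate (sum of proportions, mean rate):", freerate_summary(r8))
```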
=======
. Sequence filename: 			A1-A6_PlusRefs-4791-5478region.FASTA
. Data set: 				#1
. Tree topology search : 		NNIs
. Initial tree: 			BioNJ
. Model of nucleotides substitution: 	GTR
. Number of taxa: 			63
. Log-likelihood: 			-8091.88275
. Unconstrained likelihood: 		-2921.46891
. Parsimony: 				1447
. Tree size: 				2.60881
. Discrete gamma model: 		Yes
  - Number of categories: 		8
  - Gamma shape parameter: 		0.688
. Proportion of invariant: 		0.345
. Nucleotides frequencies:
  - f(A)= 0.38735
  - f(C)= 0.14923
  - f(G)= 0.24324
  - f(T)= 0.22018
. GTR relative rate parameters : 
  A <-> C    2.01652
  A <-> G    3.23357
  A <-> T    0.68237
  C <-> G    0.42950
  C <-> T    7.59203
  G <-> T    1.00000

. Instantaneous rate matrix : 
  [A---------C---------G---------T------]
  -0.77548   0.18854   0.49280   0.09414  
   0.48940  -1.60221   0.06546   1.04736  
   0.78476   0.04016  -0.96288   0.13795  
   0.16561   0.70984   0.15240  -1.02785  



. Run ID:				none
. Random seed:				1521408702
. Subtree patterns aliasing:		no
. Version:				20120412
. Time used:				0h2m29s (149 seconds)

=======
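The two instantaneous rate matrices above can be rebuilt from the printed exchangeabilities and base frequencies. A minimal sketch, assuming the usual GTR convention that both outputs appear to follow here: off-diagonal entries are Q[i][j] = r_ij * pi_j, each row sums to zero, and Q is scaled so that the mean substitution rate -sum_i pi_i * Q[i][i] equals 1. Fed the values from each output, it reproduces the printed matrices to about three decimal places, which makes it easy to see that the two programs' estimates are close.

```python
BASES = "ACGT"


def gtr_rate_matrix(rates: dict[str, float], freqs: list[float]) -> list[list[float]]:
    """Build the normalized GTR instantaneous rate matrix Q.

    rates: exchangeabilities keyed by base pair, e.g. {"AC": 1.9776, ...}
    freqs: equilibrium frequencies pi(A), pi(C), pi(G), pi(T)
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                r = rates.get(a + b, rates.get(b + a))  # exchangeabilities are symmetric
                q[i][j] = r * freqs[j]
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums the off-diagonals
    # Scale so that the expected number of substitutions per unit time is 1.
    scale = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[x / scale for x in row] for row in q]


iqtree_q = gtr_rate_matrix(
    {"AC": 1.9776, "AG": 3.2692, "AT": 0.6819, "CG": 0.4257, "CT": 7.3702, "GT": 1.0},
    [0.3874, 0.1492, 0.2432, 0.2202],
)
phyml_q = gtr_rate_matrix(
    {"AC": 2.01652, "AG": 3.23357, "AT": 0.68237, "CG": 0.42950, "CT": 7.59203, "GT": 1.0},
    [0.38735, 0.14923, 0.24324, 0.22018],
)
for label, q in (("IQ-TREE", iqtree_q), ("PhyML", phyml_q)):
    print(label)
    for base, row in zip(BASES, q):
        print(base, " ".join(f"{x:9.5f}" for x in row))
```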

Brian Foley

Mar 19, 2018, 7:37:01 PM
to IQ-TREE
I used RAxML on the CIPRES server today, and it gives essentially the same results as I got from
PhyML and IQ-TREE.


=========
Final ML Optimization Likelihood: -8094.993189
Thorough ML search on Process 5: Time 3.737479 seconds

processID = 5, bestLH = -8095.983564

Model Information:

Model Parameters of Partition 0, Name: No Name Provided, Type of Data: DNA
alpha: 0.871991
invar: 0.411325
Tree-Length: 2.750827
rate A <-> C: 2.000321
rate A <-> G: 3.213965
rate A <-> T: 0.683432
rate C <-> G: 0.425764
rate C <-> T: 7.604281
rate G <-> T: 1.000000

freq pi(A): 0.387350
freq pi(C): 0.149229
freq pi(G): 0.243239
freq pi(T): 0.220183


ML search took 15.577952 secs or 0.004327 hours

Combined Bootstrap and ML search took 55.508033 secs or 0.015419 hours
===========
https://www.phylo.org/portal2/home.action
 

If you use the resources available from the CIPRES Science Gateway to complete published work, please cite us as follows: Miller, M.A., Pfeiffer, W., and Schwartz, T. (2010) "Creating the CIPRES Science Gateway for inference of large phylogenetic trees" in Proceedings of the Gateway Computing Environments Workshop (GCE), 14 Nov. 2010, New Orleans, LA pp 1 - 8.
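To put "essentially the same results" into numbers, this sketch lines up the GTR exchangeabilities printed by IQ-TREE, PhyML and RAxML for this region (G-T fixed to 1 in all three) and reports, for each rate, the spread (max - min) relative to the mean, plus the C-T / A-G ratio mentioned earlier in the thread. Note that the runs did not use identical rate-heterogeneity models (GTR+F+R8 in IQ-TREE, gamma plus invariant sites in PhyML and RAxML), so small differences are expected; on these numbers the spread stays within a few percent and the C-T / A-G ratio is about 2.3 in all three. The neighboring region's estimates were not posted, so the 4-fold comparison cannot be reproduced here.

```python
# GTR exchangeabilities (G <-> T fixed to 1) as printed in the three outputs above.
ESTIMATES = {
    "IQ-TREE": {"AC": 1.9776, "AG": 3.2692, "AT": 0.6819, "CG": 0.4257, "CT": 7.3702},
    "PhyML": {"AC": 2.01652, "AG": 3.23357, "AT": 0.68237, "CG": 0.42950, "CT": 7.59203},
    "RAxML": {"AC": 2.000321, "AG": 3.213965, "AT": 0.683432, "CG": 0.425764, "CT": 7.604281},
}


def max_relative_spread(estimates: dict[str, dict[str, float]]) -> dict[str, float]:
    """For each rate, (max - min) / mean across programs."""
    spread = {}
    for pair in next(iter(estimates.values())):
        values = [e[pair] for e in estimates.values()]
        spread[pair] = (max(values) - min(values)) / (sum(values) / len(values))
    return spread


for program, r in ESTIMATES.items():
    print(f"{program:8s} C-T / A-G = {r['CT'] / r['AG']:.3f}")
for pair, s in max_relative_spread(ESTIMATES).items():
    print(f"{pair}: {100 * s:.1f}% spread across programs")
```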





Jacob Steenwyk

Mar 19, 2018, 8:08:37 PM
to iqt...@googlegroups.com
Hi, 

Thank you very much for all of your replies. 

Regarding the differences between RAxML runs, I think they largely come down to the random seed, and I think the same goes for IQ-TREE, with only small, negligible differences between runs, much like RAxML. Again, thank you for all your answers!

best,

Jacob
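A toy illustration of the seed point, not the actual search algorithm of IQ-TREE or RAxML: a stochastic hill climber on a made-up bumpy scoring function (`toy_score` and `hill_climb` are hypothetical). Runs with the same seed are exactly reproducible; runs with different seeds may stop on different local optima, but because every score is a value of the same function, the results can still be ranked directly. That is the sense in which log-likelihoods from different runs on the same data and model are comparable.

```python
import math
import random


def toy_score(x: float) -> float:
    """A hypothetical bumpy 'log-likelihood' surface with several local optima."""
    return -((x - 3.0) ** 2) / 10.0 + math.sin(3.0 * x)


def hill_climb(seed: int, steps: int = 2000, step_size: float = 0.05) -> tuple[float, float]:
    """Stochastic hill climbing from a random start; returns (best_x, best_score).

    All randomness comes from one seeded generator, so the same seed always
    reproduces the same path and the same final score.
    """
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    best = toy_score(x)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, step_size)
        score = toy_score(candidate)
        if score > best:  # accept only improvements
            x, best = candidate, score
    return x, best


runs = {seed: hill_climb(seed) for seed in range(1, 6)}
for seed, (x, score) in runs.items():
    print(f"seed {seed}: x = {x:.3f}, score = {score:.4f}")
# Every score is a value of the same function, so the runs can be ranked directly.
best_seed = max(runs, key=lambda s: runs[s][1])
print("best run: seed", best_seed)
```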


Minh Bui

Mar 20, 2018, 8:04:29 AM
to IQ-TREE, Jacob Steenwyk
Ah, I see what you mean now. IQ-TREE can also produce a different tree with a different log-likelihood in different runs, due to stochasticity. However, the log-likelihoods are still comparable. That was what was confusing in the first place. I hope things are clear now.

Cheers
Minh
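In practice, this means the best of several independent runs can be picked by its log-likelihood, provided all runs used the same alignment and model. A minimal sketch that extracts the "Log-likelihood of the tree:" line (format as in the IQ-TREE report posted earlier in this thread) from each report's text and ranks the runs; the run names and the second score in the example are hypothetical.

```python
import re

# Matches the report line, e.g. "Log-likelihood of the tree: -8081.6459 (s.e. 367.3025)"
LNL_LINE = re.compile(r"^[ \t]*Log-likelihood of the tree:\s*(-?\d+(?:\.\d+)?)", re.MULTILINE)


def report_log_likelihood(report_text: str) -> float:
    """Extract the final tree log-likelihood from the text of one report."""
    match = LNL_LINE.search(report_text)
    if match is None:
        raise ValueError("no 'Log-likelihood of the tree' line found")
    return float(match.group(1))


def rank_runs(reports: dict[str, str]) -> list[tuple[str, float]]:
    """Sort runs from best (highest log-likelihood) to worst."""
    scores = {name: report_log_likelihood(text) for name, text in reports.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


example = {
    "run1": "Log-likelihood of the tree: -8081.6459 (s.e. 367.3025)\n",
    "run2": "Log-likelihood of the tree: -8083.1000 (s.e. 366.9000)\n",  # hypothetical run
}
print(rank_runs(example))
```

On real output, the report texts could be read from each run's `.iqtree` file with `pathlib.Path(...).read_text()`.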


Jacob Steenwyk

Mar 20, 2018, 9:29:56 AM
to Minh Bui, IQ-TREE
Sorry about the confusion and thank you again for everyone's helpful and prompt responses!
