I am trying to reconstruct a tree within a bacterial CC group (expected to be very closely related). We removed the predicted recombinant sites and worked with a core alignment.
We chose a standard substitution model, GTR+F+G4 (we have not done model testing yet; we are still learning).
We are not too surprised to see that G4 may be overfitting: if I understand correctly, there is very little variation between the rate heterogeneity categories, which is what we would expect when working within a single lineage/CC group.
But we were wondering why the rate matrix Q is not symmetric.
Is it because of rounding, model approximation, or scaling? I also see that all the diagonal values are negative. Could you explain?
Please find the extract of the output relevant to this question below.
Best regards
Eve
```
SEQUENCE ALIGNMENT
------------------
Input data: 25 sequences with 1441 nucleotide sites
Number of constant sites: 0 (= 0% of all sites)
Number of invariant (constant or ambiguous constant) sites: 0 (= 0% of all sites)
Number of parsimony informative sites: 481
Number of distinct site patterns: 226
SUBSTITUTION PROCESS
--------------------
Model of substitution: GTR+F+G4
Rate parameter R:
A-C: 0.7237
A-G: 5.0504
A-T: 1.3350
C-G: 0.2022
C-T: 5.0167
G-T: 1.0000
State frequencies: (empirical counts from alignment)
pi(A) = 0.2086
pi(C) = 0.2908
pi(G) = 0.2811
pi(T) = 0.2195
Rate matrix Q:
A -1.203 0.1316 0.8877 0.1832
C 0.0944 -0.8185 0.03554 0.6886
G 0.6588 0.03676 -0.8328 0.1373
T 0.1741 0.9121 0.1758 -1.262
Model of rate heterogeneity: Gamma with 4 categories
Gamma shape alpha: 998.4-
Category Relative_rate Proportion
1 0.9601 0.25
2 0.9894 0.25
3 1.01 0.25
4 1.041 0.25
Relative rates are computed as MEAN of the portion of the Gamma distribution falling in the category.
```
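
As a sanity check on the Q matrix in this output, here is a minimal sketch that rebuilds it from the reported rate parameters R and state frequencies. It assumes the usual GTR parameterisation, in which each off-diagonal entry is Q[i][j] = R[i][j] * pi[j], each diagonal entry is minus the sum of the rest of its row, and the whole matrix is rescaled so that the expected rate is one substitution per site. The variable names are mine, not the program's.

```python
# Values copied from the report above.
states = ["A", "C", "G", "T"]
pi = [0.2086, 0.2908, 0.2811, 0.2195]
R = {
    ("A", "C"): 0.7237, ("A", "G"): 5.0504, ("A", "T"): 1.3350,
    ("C", "G"): 0.2022, ("C", "T"): 5.0167, ("G", "T"): 1.0000,
}

def exchangeability(i, j):
    """Symmetric lookup: R(i, j) == R(j, i)."""
    key = (states[i], states[j])
    return R[key] if key in R else R[(states[j], states[i])]

n = len(states)
raw = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            # Assumed GTR form: exchangeability times frequency of the TARGET state.
            raw[i][j] = exchangeability(i, j) * pi[j]
    # Each row sums to zero, so the diagonal is necessarily negative.
    raw[i][i] = -sum(raw[i][j] for j in range(n) if j != i)

# Rescale so that the expected number of substitutions per unit time is 1.
mu = -sum(pi[i] * raw[i][i] for i in range(n))
Q = [[x / mu for x in row] for row in raw]

for name, row in zip(states, Q):
    print(name, "  ".join(f"{x:8.4f}" for x in row))

# Q itself is not symmetric, but pi[i] * Q[i][j] is (time reversibility).
max_imbalance = max(
    abs(pi[i] * Q[i][j] - pi[j] * Q[j][i])
    for i in range(n) for j in range(n)
)
print("max |pi_i Q_ij - pi_j Q_ji| =", max_imbalance)
```

If that assumption holds, the printed rows should agree with the reported Q to within rounding. That would point to the unequal base frequencies, not rounding, as the source of the asymmetry, with the negative diagonal following from each row summing to zero.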