increasing from h=1 to h=2 does not add a second hybrid node?

57 views
Skip to first unread message

Andrea Berardi

unread,
Apr 23, 2024, 4:26:27 PMApr 23
to PhyloNetworks users
Hello all,
I apologize if this is a completely naive question, but I am running SNaQ on a dataset with six taxa (in multi-individual mode with a mapping file), and 40 genes. When I increase from h=1 to h=2, the hybrid node position changes but I do not see an additional hybrid node (was expecting 2). The difference between the loglik of net1 and net2 is minimal, but why might I be seeing no additional hybrid nodes in net2? Is it because I have too few taxa, or am I perhaps running this incorrectly? Or, is it that the tree topology changes between net1 and net2? I tried to look at the log and network files but I'm not sure I see anything in there to help me understand.

net0 = snaq!(astraltree,raxmlCF, hmax=0, filename="net0", seed=1234)
net1 = snaq!(net0, raxmlCF, hmax=1, filename="net1", seed=2345)
net2 = snaq!(net1,raxmlCF, hmax=2, filename="net2", seed=6456)

scores = [net0.loglik, net1.loglik, net2.loglik]
[4.482532584339262, 1.759535880241021, 1.7093122725040242]
Here are net1 (top) and net2 (bottom) plotted after re-rooting, and showing the major trees and default tree plot.
net1_vs_net2.png

Attached are the logs and networks if that's helpful - I'm running phylonetworks in a jupyter notebook which unhelpfully truncates the species names (each "S_carol..." is a different subspecies).

Thanks for any insights you might have,
Andrea
net0.out
net0.log
net2.log
net2.networks
net1.log
net1.networks

Cécile Ané

unread,
Apr 28, 2024, 12:26:57 PMApr 28
to PhyloNetworks users
Hi Andrea,

1. about the output with hmax=2 having only 1 reticulation: most likely, this is because there was no network with 2 reticulations that don't "overlap" each other and a better score. SNaQ assumes that the network is of "level 1", that is, each reticulation creates a cycle that does not share any node with the cycle of another reticulation. This assumption restricts the placement of 2 reticulations, when 2 are allowed. So, sometimes, a network with 1 well-placed reticulation could have a better score than networks with 2 reticulations constrained to be "far" from (not overlap with) one another. This might be the case for your data.

2. The SNaQ search with a maximum allowed hmax=2 seems to have found a network with h=1 and a slightly better score (1.709) than the first search that allowed for a maximum h=1 only (score: 1.759). The difference between these 2 scores is very small. These 2 networks (both with h=1) display quite different historical scenarios, but both have a very low score, which is good (0 means perfect fit). So I would not be surprised if both networks are within a "high-confidence set", that is, considered to fit well based on a goodness-of-fit test. A bootstrap analysis might also find uncertainty in the estimated network.

I hope this helps a bit!
Cecile.

On Tuesday, April 23, 2024 at 3:26:27 PM UTC-5 Andrea Berardi wrote:
Hello all,
I apologize if this is a completely naive question, but I am running SNaQ on a dataset with six taxa (in multi-individual mode with a mapping file), and 40 genes. When I increase from h=1 to h=2, the hybrid node position changes but I do not see an additional hybrid node (was expecting 2). The difference between the loglik of net1 and net2 is minimal, but why might I be seeing no additional hybrid nodes in net2? Is it because I have too few taxa, or am I perhaps running this incorrectly? Or, is it that the tree topology changes between net1 and net2? I tried to look at the log and network files but I'm not sure I see anything in there to help me understand.

net0 = snaq!(astraltree,raxmlCF, hmax=0, filename="net0", seed=1234)
net1 = snaq!(net0, raxmlCF, hmax=1, filename="net1", seed=2345)
net2 = snaq!(net1,raxmlCF, hmax=2, filename="net2", seed=6456)

scores = [net0.loglik, net1.loglik, net2.loglik]
[4.482532584339262, 1.759535880241021, 1.7093122725040242]
Here are net1 (top) and net2 (bottom) plotted after re-rooting, and showing the major trees and default tree plot.

Andrea Berardi

unread,
Apr 29, 2024, 4:59:53 PMApr 29
to PhyloNetworks users
Hi Cecile,
Thanks so much for your reply - that does help.
Is there any way to run the bootstrap analysis for datasets with multiple alleles per species?
Thanks very much!
Andrea

Cécile Ané

unread,
Apr 29, 2024, 5:18:18 PMApr 29
to PhyloNetworks users
I am not aware of anyone who might have taken the time to code this feature.
Solutions from this thread might be useful.
If your input is a list of bootstrap gene trees (as opposed to a table of concordance factors with credibility intervals), you could repeat this many times (e.g. 100 replicates):
- sample bootstrap trees, by sampling one tree per gene
- combine them into an array, or list of bootstrap gene trees (1 per gene)
- create a table of quartet concordance factors from this list of gene trees
- run snaq! with this table as input, with distinct file names for distinct replicates (e.g. include the replicate number in the file name).

When all replicates are done:
- read the best networks from each replicate
- combine all bootstrap networks into an array (100 of them if you did 100 replicates)
- summarize as explained in the documentation.

It would be nice to have example code to do this... if someone has the time to contribute!
Cecile.

Andrea Berardi

unread,
Apr 30, 2024, 10:55:24 AMApr 30
to PhyloNetworks users
Thank you again, Cecile! I can give this a try.
Reply all
Reply to author
Forward
0 new messages