Problems with quarnetGoFtest

Kevin I. Sánchez

Dec 21, 2022, 1:51:08 PM

to PhyloNetworks users

Dear Claudia and Cecile,

I was trying to run a goodness-of-fit test on my own estimated networks, following the tutorial for the quarnetGoFtest! function, but I keep getting the following warning:

Warning: The simulated z values are far from 0 and they shouldn't:
│ with a mean of 0.379 and a standard deviation of 1.9784.
│ The network might be in a form that causes a bug in the hybrid-Lambda simulator

I have tried the following:

Updating all the packages to the latest versions
Running the test with networks without node and bootstrap support values
Renaming the nodes with letters, e.g. I1, I2...

And the commands to reproduce the error:

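using CSV, DataFrames, PhyloNetworks, QuartetNetworkGoodnessFit

# concordance factor (CF) table generated with SNPs2CF() in R
SNPs2CF_DF = DataFrame(CSV.File("CF_test.csv"))

# estimated network in extended Newick format
net1 = readTopology("(F,#H1,(((C,B),((E,D),(H,(A)#H1))),G));")

res1 = quarnetGoFtest!(net1, SNPs2CF_DF, true, seed = 35354, nsim = 1000);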

Curiously, with nsim = 200 the analysis runs without the warning, but from nsim = 300 onward the message appears.

Below I attach the concordance table generated with the SNPs2CF() function in R.

Thanks in advance for any help or comments.

CF_test.csv

Cécile Ané

Dec 22, 2022, 1:01:03 PM

to PhyloNetworks users

Hi Kevin,

If you keep getting this error, it means that you are not using the latest version of QuartetNetworkGoodnessFit (aka QGoF for quarnet goodness-of-fit!).

In version 0.4.0, the error is different (see here), and in fact it should be mostly eliminated: v0.4.0 no longer relies on Hybrid-Lambda, precisely because Hybrid-Lambda was not quite reliable.

The error message you got would be in version 0.3.4 (see here) or earlier.

If you did update all packages, it means that one of the packages you require is holding back other packages, for compatibility. So you would need to find out which package is holding back some other packages. It could be the version of Julia itself.

To find out, go in package mode (type ']') and type 'status' to get the list of the packages you require, and which version you are using. What do you see?

You would get information on all installed packages (those required and their dependencies) with 'status --manifest'.

With the latest version of Julia, the output shows which of these packages have an update available that is not installed yet.

There is also a way to see which package is holding back another, but I don't remember exactly.
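For reference, in Julia 1.8 and later the package manager has a `status --outdated` option that lists the packages not at their latest version and, when it can, what is restricting them. The same checks can be run from the Julia prompt through the Pkg API. This is a minimal sketch, assuming Julia 1.8 or later (older versions do not accept the `outdated` keyword):

```julia
import Pkg

# Packages required directly by the active environment, with their versions
# (same as `status` in package mode).
Pkg.status()

# All installed packages, including dependencies (same as `status --manifest`).
Pkg.status(mode = Pkg.PKGMODE_MANIFEST)

# Packages not at their latest version, with what restricts them when known
# (same as `status --outdated`; requires Julia 1.8 or later).
Pkg.status(outdated = true)
```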

When I run across this issue, I usually remove the package that I don't need, and re-run 'update' to see if that works to update the packages I really want to use.

To remove a package named 'ThisPackage' for example, type 'rm ThisPackage'.

You could also force the installation of QGoF v0.4.0 by typing 'add QuartetNetwo...@0.4.0'

This command may fail, but if it does it may tell you why: that is, it would tell you which package is preventing the update to v0.4.0 (at least with the latest version of Julia).
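The same pinned installation can be attempted through the Pkg API. The sketch below uses the full package name given earlier in this thread; `try_add` is a hypothetical helper written for illustration, not part of Pkg. The call is left commented out because it needs network access and modifies the active environment.

```julia
import Pkg

# Full package name as given earlier in this thread, pinned to version 0.4.0.
spec = Pkg.PackageSpec(name = "QuartetNetworkGoodnessFit", version = "0.4.0")

# Hypothetical helper: attempt the installation and, if it fails,
# print the error so the conflicting package can be identified.
function try_add(spec)
    try
        Pkg.add(spec)
        return true
    catch err
        showerror(stderr, err)   # resolver errors usually list the conflicting restrictions
        println(stderr)
        return false
    end
end

# try_add(spec)   # uncomment to run
```

If the installation fails, the printed message is the place to look for the package whose compatibility bounds block v0.4.0.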

Good luck!

Cecile.

Kevin I. Sánchez

Dec 22, 2022, 3:04:49 PM

to PhyloNetworks users

Dear Cecile,

Thanks for your response.
With your help, I was able to update the QGoF package, but the error still shows, even with nsim set to 1 million.
I believe that the problem is related to the low number of genes (SNPs) available for each quartet, between 10 and 38.

When I run the analysis with a CF table in which I multiply the number of genes in each row by 10, the problem disappears. I believe this is not a valid way to overcome the problem, but it is interesting for testing purposes.
Why does a low number of genes/SNPs cause this error during the simulations?

kevin

Cécile Ané

Dec 22, 2022, 3:24:48 PM

to PhyloNetworks users

Oh yes, you are right: great diagnosis, Kevin. With few genes, each outlier test (one for each four-taxon set) relies on a poor approximation. This outlier test uses the likelihood ratio statistic (by default) and compares it to a chi-square distribution. While this gives an accurate p-value with many genes, the approximation can be too coarse with few genes. The simulation strategy offers an alternative though, which should be explained in the error message. With the updated version of the package, the error message should suggest computing an empirical p-value and provide example code for it. This empirical p-value would be valid. The disadvantage is that, if the overall p-value is small, many more simulations are needed to estimate it accurately.
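To make the outlier test concrete, here is a minimal sketch of a likelihood ratio statistic for a single four-taxon set, compared to a chi-square distribution. It is an illustration only, not the package's code: it assumes a plain multinomial model for the counts of the 3 quartet topologies and 2 degrees of freedom, for which the upper tail has the closed form exp(-g/2). The function names and the numbers are made up.

```julia
# Likelihood ratio (G) statistic for one four-taxon set: observed counts of the
# 3 quartet topologies versus the concordance factors expected under the network.
function lrt_statistic(counts::AbstractVector{<:Real}, expected_cf::AbstractVector{<:Real})
    n = sum(counts)
    g = 0.0
    for (x, p) in zip(counts, expected_cf)
        x > 0 && (g += 2 * x * log(x / (n * p)))   # a zero count contributes nothing
    end
    return g
end

# Upper tail of a chi-square distribution with 2 degrees of freedom:
# P(X >= g) = exp(-g/2).
chisq2_pvalue(g::Real) = exp(-g / 2)

expected = [0.5, 0.25, 0.25]   # made-up expected concordance factors
few  = [8, 1, 1]               # 10 genes
many = [80, 10, 10]            # same proportions, 100 genes

p_few  = chisq2_pvalue(lrt_statistic(few, expected))
p_many = chisq2_pvalue(lrt_statistic(many, expected))
```

With the same observed proportions, the statistic grows in proportion to the number of genes. With only about 10 genes the counts can take few distinct values, so the statistic is far from continuous and the chi-square p-value is only a rough approximation.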

Kevin I. Sánchez

Dec 23, 2022, 7:59:16 AM

to PhyloNetworks users

Ok,
Indeed, the error message includes step-by-step instructions for calculating an empirical p-value.

To make sure that I'm following: if I still get the error message even with a very high number of simulations, would the empirical p-value still be valid?
(I tried between 1000 and 10 million simulations, and the error message still shows.)

kevin

Cécile Ané

Dec 23, 2022, 2:39:36 PM

to PhyloNetworks users

Yes it would be valid: because the simulation replicates exactly what's done on the original data, and the empirical p-value does not assume that the z-value has a particular distribution (normal, centered at 0) under the null hypothesis. We can "fix" the warning by avoiding the p-value obtained under the assumption that z is normal centered at 0 (under the null), and by using the empirical p-value instead. I'm sorry that the code doesn't get this empirical p-value automatically. Perhaps in the next version!
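The empirical p-value described here can be sketched in a few lines. This is an illustration under assumptions: it treats the test as one-sided (large z values count as evidence against the network), and the numbers are made up. With the package, the observed z value and the simulated z values would come from the output of quarnetGoFtest!; this thread does not say where they sit in that output, so check the function's documentation or the warning message for the exact code.

```julia
using Statistics

# One-sided empirical p-value: proportion of simulated z values that are
# at least as large as the z value observed on the real data.
empirical_pvalue(z_obs::Real, z_sim::AbstractVector{<:Real}) = mean(z_sim .>= z_obs)

# Made-up numbers for illustration only (not from the data in this thread).
z_sim = [-0.8, 0.1, 0.4, 0.9, 1.3, 1.7, 2.2, 2.6, 3.1, 4.0]
z_obs = 2.5

p_emp = empirical_pvalue(z_obs, z_sim)   # 3 of the 10 simulated values are >= 2.5

# The estimate is a multiple of 1/nsim, so a small p-value needs a large nsim.
resolution = 1 / length(z_sim)
```

No normal approximation enters this calculation, which is why it stays valid when the simulated z values are not centered at 0.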

Kevin I. Sánchez

Dec 31, 2022, 12:12:35 PM

to PhyloNetworks users

Dear Cécile

Thanks for the help!

Best
