SweeD does not report correct CLR for folded spectra

Alexander Klassmann

Aug 21, 2020, 10:29:55 AM
to OmegaPlus
Hello,
SweeD does not output a correct CLR (and probably alpha) for folded spectra as in the following example of a simulated strong sweep:

Output SweeD:
24954877.3877    0.000000e+00    2.660754e-02    24953940.0000    24955327.0000
24964879.3125    0.000000e+00    1.318682e-01    24964668.0000    24964971.0000
24974881.2373    0.000000e+00    1.008404e-01    24974129.0000    24974999.0000
24984883.1621    0.000000e+00    1.188119e-01    24984783.0000    24985169.0000
24994885.0869    0.000000e+00    6.417114e-02    24994602.0000    24995071.0000
25004887.0117    0.000000e+00    2.400000e+00    25004883.0000    25005130.0000
25014888.9365    0.000000e+00    1.008404e-01    25014769.0000    25015216.0000
25024890.8613    0.000000e+00    3.000000e+00    25024888.0000    25025193.0000
25034892.7861    0.000000e+00    8.695655e-02    25034568.0000    25035030.0000
25044894.7109    0.000000e+00    1.176471e-01    25044794.0000    25045055.0000

Output Sweepfinder:
24954877.725339    1.182454    3.270155e-04
24964879.650005    0.093961    7.456625e-03
24974881.574671    0.829999    8.739903e-04
24984883.499337    46.348489    3.772824e-05
24994885.424003    57.058797    2.168375e-05
25004887.348669    0.259692    2.800138e-02
25014889.273335    7.959813    2.066296e-04
25024891.198001    0.343199    3.594658e-02
25034893.122667    0.624442    8.493904e-04
25044895.047334    0.187886    9.855205e-03

The input file was in the Sweepfinder format, but the 'ms' format yields the same problem. I tried versions 3.2.1 and 4.0.0 of SweeD.

If the same input data is marked as unfolded (a zero instead of a 1 in the last column), both tools yield very similar results. Using the command line option "-folded" instead of marking the status in the file makes no difference.
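For anyone who wants to repeat this folded versus unfolded comparison, here is a minimal Python sketch that rewrites the flag in the last column. It assumes a whitespace-separated SweepFinder-style table with a header row and columns in the order position, x, n, folded; only the role of the last column is stated in this thread, and the function name `set_folded_flag` is made up for illustration.

```python
def set_folded_flag(sf_text, folded):
    """Return the table text with the last column set to 1 (folded) or 0 (unfolded)."""
    flag = "1" if folded else "0"
    out = []
    for line in sf_text.splitlines():
        fields = line.split()
        # Keep blank lines and the header row (non-numeric first field) unchanged.
        if not fields or not fields[0].replace(".", "", 1).isdigit():
            out.append(line)
            continue
        fields[-1] = flag  # assumption: the folded flag is the last column
        out.append("\t".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "position\tx\tn\tfolded\n100\t3\t20\t1\n250\t7\t20\t1\n"
    print(set_folded_flag(example, folded=False), end="")
```

Running both programs on the two variants of the same file should then show whether the discrepancy is tied to the folded flag alone.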

I guess this is a bug and not a feature...

Alex

pavlos

Sep 14, 2020, 5:04:30 AM
to OmegaPlus
Dear Alexander,
sorry for the delay.

I have tested SweeD and SweepFinder extensively in the past. I think that the 0 values are actually correct: they mean that the data do not support the sweep model at all, but rather the neutral model. SweepFinder, I think, reports positive values because of some numerical instabilities.
In any case, I can test the data; please send me an input file.

best
pavlos

Alexander Klassmann

Sep 15, 2020, 6:24:52 AM
to omeg...@googlegroups.com
Dear Pavlos,
Probably a plot is more convincing than a file excerpt. I'm sending you one (from a different simulation than in the previous message) comparing the results of the two programs. Which looks random: the SweepFinder results on folded data, or those of SweeD?
Could you provide a data set to show that SweeD actually provides correct data on folded spectra? This link from your home page is protected and I cannot open it: https://cme.h-its.org/exelixis/resource/download/data/SweeD_DATA_SCRIPTS.tar.bz2
Best, Alex

Comparison_SweepFinder_SweeD.pdf

pavlos

Sep 18, 2020, 6:17:45 AM
to OmegaPlus
Dear Alexander,
thanks for the info. It seems that it is indeed a bug. I'll come back to you ASAP. Please send me the input so that I can test the code myself.

Pavlos

Alexander Klassmann

Sep 18, 2020, 4:08:00 PM
to omeg...@googlegroups.com
Dear Pavlos,
here is the file which I used for the scans depicted in the figure. It's in SweepFinder table format marked as folded (the last column is always '1'). I hope it helps!
Best, Alex

simulated_sweep_sweepfinder_format.txt.gz

Jordan Bemmels

Oct 20, 2020, 4:37:31 PM
to OmegaPlus
Hi Alex and Pavlos,

Did you reach any resolution on this issue? In the meantime, I tested a workaround that appears to be working for me. Using the files Alex provided, I was able to reproduce the error he noted. I was then able to "fix" it, obtaining results that look more like those expected, by providing SweeD with an empirical SFS (-isfs) that I folded manually.

First I created the empirical SFS for the dataset (empirical_SFS.txt). 
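How the empirical SFS was generated is not stated here. As one possible way to do it, the following Python sketch tabulates the site frequency spectrum directly from the SweepFinder-format table. It assumes columns in the order position, x, n, folded, a single sample size n for all sites, and that only polymorphic classes (0 < x < n) are counted. The function name is hypothetical, and the exact layout SweeD expects for the -isfs file is deliberately not reproduced, so check the SweeD manual before writing the result to disk.

```python
from collections import Counter


def empirical_sfs(sf_text):
    """Return a list sfs where sfs[k] is the proportion of SNPs with allele count k."""
    counts = Counter()
    n = None
    for line in sf_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (non-numeric count column).
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        x, site_n = int(fields[1]), int(fields[2])
        if n is None:
            n = site_n
        elif site_n != n:
            raise ValueError("this sketch assumes one sample size for all sites")
        if 0 < x < n:  # count polymorphic classes only
            counts[x] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no polymorphic sites found")
    return [counts[k] / total for k in range(n + 1)]


if __name__ == "__main__":
    example = "position\tx\tn\tfolded\n10\t1\t4\t0\n20\t1\t4\t0\n30\t2\t4\t0\n40\t3\t4\t0\n"
    print(empirical_sfs(example))
```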

Next, to recreate the error, I ran the following:

./SweeD -name test02_basicRun -input simulated_sweep_sweepfinder_format.SF -grid 1000 -folded -isfs empirical_SFS.txt

Indeed, most of the likelihood values are equal to zero, as Alex noted. 

Then, I tried folding the empirical SFS myself (empirical_SFS_folded.txt) so that all frequencies where the derived allele is >50% were manually set to zero, and ran the following:

./SweeD -name test03_foldedSFS -input simulated_sweep_sweepfinder_format.SF -grid 1000 -folded -isfs empirical_SFS_folded.txt

It seems like this solved the problem, because now the likelihood values have a pattern resembling those that Alex found with SweepFinder, with evidence of a sweep near 500,000 bp.
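The manual folding step described above can be sketched in Python as follows. The post only says that classes with a derived allele frequency above 50% were set to zero; it does not say whether their mass was added to the complementary minor-allele class or whether the spectrum was renormalized afterwards. The sketch therefore offers both behaviours through a hypothetical `transfer` parameter, and represents the SFS as a plain list indexed by allele count 0..n, which is an assumption rather than SweeD's file layout.

```python
def fold_sfs(sfs, transfer=True):
    """Zero every class k > n/2; optionally add its mass to class n - k first."""
    n = len(sfs) - 1
    folded = list(sfs)
    for k in range(n // 2 + 1, n + 1):  # all classes with k > n/2
        if transfer:
            folded[n - k] += folded[k]  # conventional folding onto the minor class
        folded[k] = 0.0
    return folded


if __name__ == "__main__":
    unfolded = [0.0, 0.5, 0.25, 0.25, 0.0]
    print(fold_sfs(unfolded))                  # mass moved to minor classes
    print(fold_sfs(unfolded, transfer=False))  # upper half simply zeroed
```

With `transfer=False` the result no longer sums to one, so it may need renormalizing depending on what the downstream tool expects.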

Here are plots of the two tests for comparison, and the empirical SFSs that I used:

https://drive.google.com/file/d/17ZJlhBQWPkiR4g1i8cZsSXGDVLYseWZb/view?usp=sharing

Is this the correct way to use a folded SFS? If every SNP is folded, providing a folded empirical SFS via -isfs together with the -folded flag seems to work fine. However, I suppose this workaround would not work if you want to mark some individual SNPs as folded and others as unfolded (using the “folded” column inside the .SF file).

Thanks,
Jordan

P.S. I used v. 3.1. I know it's an old version, but I could not get a new version to install on my system.


Jordan Bemmels

Oct 21, 2020, 5:11:05 PM
to OmegaPlus
Hi again,

An update: I have now tested the tentative solution I mentioned on several empirical datasets in which all SNPs are folded, and it did not work. Like Alex, I find that almost all sites have a Likelihood of zero. In one extreme case, every single site in the entire genome had a Likelihood of zero. I got similar results whether I provided an SFS with the -isfs option or left out the -isfs flag altogether.
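To put a number on "almost all sites", a short Python sketch like the one below could compute the fraction of grid points whose likelihood is exactly zero. It assumes the report rows are whitespace-separated with the position in the first column and the likelihood in the second, as in the SweeD output excerpt at the top of this thread; any line whose first two fields are not numeric is treated as a header and skipped. The function name is made up for illustration.

```python
def zero_likelihood_fraction(report_text):
    """Return the fraction of report rows whose second column (likelihood) is 0."""
    total = 0
    zeros = 0
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            float(fields[0])               # position
            likelihood = float(fields[1])  # assumed likelihood column
        except ValueError:
            continue  # header or comment line
        total += 1
        if likelihood == 0.0:
            zeros += 1
    if total == 0:
        raise ValueError("no data rows found")
    return zeros / total


if __name__ == "__main__":
    import sys
    # Usage: python zero_fraction.py <path to a SweeD report file>
    with open(sys.argv[1]) as handle:
        print(zero_likelihood_fraction(handle.read()))
```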

I don't understand why the solution seemed to behave better when I tried it on Alex's test data, but not on an empirical dataset. Anyway, it seems my idea is not reliably working.

Let me know if you've discovered any solution or updates since your posts earlier this fall. Thanks!!

Jordan

Gabriele Nocchi

Sep 7, 2022, 10:59:11 AM
to OmegaPlus
Dear Pavlos,

Has this bug been fixed?

Gabriele
