SweeD: different grid sizes give different results

Gab Diesel

Sep 21, 2020, 12:07:08 PM

to OmegaPlus

Hi,

I am working with a non-model organism and am trying to work out an appropriate window size (number of grid points) for running SweeD.

Personally, I would use 2 criteria to determine the correct grid size:

1. Linkage disequilibrium decay in the genome under analysis

2. A grid that produces windows with enough SNPs, without leaving many windows that contain very few SNPs.

From reading this group and other sources online, it seems that people run SweeD with a number of grid points that gives them windows of roughly 5,000-10,000 bp across their genome; some go even smaller.

In my organism LD decays very quickly, so I think a grid number that gives me windows of roughly 5 kb is a good option. I would not go lower because I might end up with many windows containing few SNPs.

I have now done three separate SweeD runs with different grid sizes, so that my genome is partitioned into windows of 5 kb, 10 kb, and 20 kb.
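
A rough sketch of the arithmetic relating a target spacing to a grid-point count. It assumes the grid positions are spread evenly across the scanned region, which is an assumption here and not something confirmed in this thread. The function name `grid_points_for_spacing` and the 1 Mb scaffold length are hypothetical.

```python
import math


def grid_points_for_spacing(region_length_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced positions needed to cover a region
    so that neighbouring positions are at most spacing_bp apart.

    Assumption: positions are equidistant, including both ends of the region.
    """
    if region_length_bp <= 0 or spacing_bp <= 0:
        raise ValueError("region length and spacing must be positive")
    return math.ceil(region_length_bp / spacing_bp) + 1


# Hypothetical 1 Mb scaffold, with the three spacings mentioned above.
for spacing in (5_000, 10_000, 20_000):
    print(spacing, grid_points_for_spacing(1_000_000, spacing))
```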

I extracted the outliers (above the 0.99 quantile) from each run without doing simulations (just taking the extreme values). Comparing the three runs, the overlap is very small, which surprises and confuses me.
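
The outlier extraction described here could look roughly like the sketch below. It assumes the report has already been parsed into `(position, score)` pairs; the function name `top_quantile_outliers` is hypothetical, and the "inclusive" interpolation method is just one of several reasonable quantile definitions.

```python
import statistics


def top_quantile_outliers(records, percentile=99):
    """Return (threshold, outliers) for a list of (position, score) pairs.

    Outliers are the records whose score is strictly above the given
    percentile of all scores. Needs at least two records.
    """
    scores = [score for _, score in records]
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    cut_points = statistics.quantiles(scores, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    outliers = [(pos, score) for pos, score in records if score > threshold]
    return threshold, outliers
```

No simulations are involved, so the threshold is purely empirical: by construction about 1% of the grid positions are always reported, whether or not any sweep is present.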

For example, with 5 kb windows I get around 1400 outlier windows at the 0.99 quantile threshold. However, only 46 of these 1400 windows are included in the 0.99 outliers of the 10 kb SweeD run, which had 700 outlier 10 kb windows above the 0.99 threshold.
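
A minimal sketch of one way such an overlap count could be computed, assuming each outlier window is a `(chrom, start, end)` tuple with closed coordinates. The name `count_overlapping` is hypothetical, and "included" is interpreted here as "overlaps by at least one base", which may differ from the criterion actually used.

```python
def count_overlapping(query, reference):
    """Count the intervals in `query` that overlap at least one interval
    in `reference`. Intervals are (chrom, start, end) with closed ends.

    Simple quadratic scan; fine for a few thousand windows per run.
    """
    count = 0
    for q_chrom, q_start, q_end in query:
        if any(
            q_chrom == r_chrom and r_start <= q_end and q_start <= r_end
            for r_chrom, r_start, r_end in reference
        ):
            count += 1
    return count
```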

So, essentially, using a different grid size gives substantially different results. For example, if I look at the genes intersecting the outlier windows, I get 400 genes for the 5 kb scan and 300 genes for the 10 kb scan. The overlap between these two sets is only 40 genes.

I am not sure which run to rely on, or whether I should take the overlap between the runs at different grid sizes. Basically, each grid size gives me very different results.

Maybe I am making a mistake somewhere; I am not sure. To transform the SweeD output into windows, I simply take each outlier position and set start = position - window size and end = position + window size.
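
Written out as code, the conversion described above might look like the sketch below. The name `position_to_window` is hypothetical, and clamping the start at the first base is an added assumption. Note that under this convention each interval spans twice the window size.

```python
def position_to_window(position, window_size, min_start=1):
    """Turn a reported grid position into a (start, end) interval.

    start = position - window_size, clamped so it never falls before
    min_start (the clamping is an assumption, not part of the description).
    end = position + window_size.
    """
    start = max(min_start, position - window_size)
    end = position + window_size
    return start, end


# A position at 12,000 bp with a 5 kb window size spans 10 kb in total.
print(position_to_window(12_000, 5_000))
```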

I hope it is clear enough.

Thank you.

N.Alachiotis

Oct 12, 2020, 1:07:48 PM

to OmegaPlus

Changing the grid size should not affect the results. It is mostly a matter of how you process the generated report afterwards.

The grid size exists to prevent very long processing times. It simply provides a trade-off between runtime and how exhaustively the dataset is scanned.

If you use a very large grid size, you will see many neighboring locations with the same score. These are typically treated as a region because the exact same SNPs were used for calculating the score. Essentially, the exact same computation was repeated multiple times for neighboring positions. SweeD does not implement a fixed-window algorithm.
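
As an illustration of treating neighboring positions with the same score as one region, here is a minimal post-processing sketch. It assumes the report has been parsed into `(position, score)` pairs sorted by position and that repeated scores are exactly equal as parsed; `merge_equal_score_runs` is a hypothetical helper, not part of SweeD.

```python
from itertools import groupby


def merge_equal_score_runs(records):
    """Collapse consecutive grid positions that share the same score.

    records: list of (position, score), sorted by position.
    Returns a list of (first_position, last_position, score, n_positions).
    """
    regions = []
    # groupby only groups adjacent items, so non-neighboring positions
    # with the same score stay in separate regions.
    for score, run in groupby(records, key=lambda rec: rec[1]):
        positions = [pos for pos, _ in run]
        regions.append((positions[0], positions[-1], score, len(positions)))
    return regions
```

Extracting outliers from the merged regions, instead of from individual grid positions, avoids counting the same underlying computation several times.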
