Re: HWE and linkage thresholds


Christopher Chang

Mar 29, 2019, 1:12:03 PM
to plink2-users
1a. There isn't much of a practical difference between --indep and --indep-pairwise for your dataset size.  The main thing --indep does that --indep-pairwise doesn't is detect groups of 3 or more SNPs where e.g. one SNP is approximately the "sum" of the other two; up to you whether you want to remove a SNP in that situation.
(For very large datasets, --indep-pairwise is normally used because it's so much easier to compute.)
1b. Yes, a size-50 sliding window still works when the scaffold is smaller than that.  I'd just use size 450; the main reason to choose a smaller value is speed, which just isn't an issue here.
2. plink only performs Bonferroni correction when you explicitly request it with --adjust (which doesn't apply to HWE).
As for what HWE threshold is reasonable for QC purposes, this depends on how many samples you have.  I generally prefer 1e-10 or smaller for >1000 samples, but if you have fewer samples, you may need to choose a less stringent threshold.  Also note that HWE violations are *expected*, in the too-few-hets direction, when population stratification is relevant; plink 2.0's 'keep-fewhet' --hwe modifier provides a workaround for this.
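Since plink does not apply a Bonferroni correction to HWE p-values, the corrected cutoff has to be worked out by hand. A minimal sketch of that arithmetic follows; the family-wise alpha of 0.05 is an assumption, and 1200 is simply the SNP count from the question (in practice the count of tests actually run, e.g. SNPs remaining after pruning times populations tested, would be the relevant number). The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that holds the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed inputs: alpha = 0.05, and 1200 tests (the SNP count in the question).
threshold = bonferroni_threshold(0.05, 1200)
print(f"{threshold:.3g}")  # 4.17e-05
```

The number passed to --hwe is a filter cutoff on the HWE test p-value (variants below it are removed), so a value computed this way is one candidate for that argument; whether it is appropriate still depends on sample size, as noted above.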

On Friday, March 29, 2019 at 9:45:54 AM UTC-7, PR wrote:
Hi All,

I have a set of 1200 SNPs. I need to remove those that are linked, to create a final SNP set, and then test the SNPs in the populations for HWE with Bonferroni correction.

Linkage
I can see two ways in plink to run this.

1. Indep
2. Pairwise

Which would be the better option? It seems that both of these options adopt this 50-SNP-per-chromosome setting. However, my SNPs were aligned to scaffolds, with the number of SNPs on each scaffold ranging from 1 to 450; would these 50-SNP sliding windows still work? Also, if setting r2 and VIF, which values are best to do a reasonable job of removing linked SNPs without removing potentially useful SNPs?

HWE
For HWE analysis, plink gives the example: plink --file mydata --hwe 0.001

Could someone advise what the 0.001 number is, and how best to set it? Is HWE calculated with Bonferroni correction?

Thanks for all the help



Christopher Chang

Mar 29, 2019, 2:56:57 PM
to plink2-users
It's entirely possible for two lower-MAF adjacent SNPs to not be strongly correlated in your dataset.  Take a look at the actual genotypes at those SNPs.
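To illustrate the point, here is a small sketch that computes the squared correlation between allele-count vectors (coded 0/1/2) for two low-MAF SNPs. The genotypes are made up for illustration, and the function is a plain Pearson r² written for this example, not plink's own code.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two allele-count vectors (0/1/2)."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) * (x - mean_a) for x in a)
    var_b = sum((y - mean_b) * (y - mean_b) for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


# Hypothetical data: 20 samples, each SNP has two heterozygous carriers.
snp1 = [1, 1] + [0] * 18          # minor allele in samples 0 and 1
snp2 = [0, 0, 1, 1] + [0] * 16    # minor allele in samples 2 and 3
snp3 = [1, 1] + [0] * 18          # same carriers as snp1

print(genotype_r2(snp1, snp2))  # low: the carriers do not overlap
print(genotype_r2(snp1, snp3))  # close to 1: identical carriers
```

In this toy case snp1 and snp2 could sit 200 bp apart and still have an r² far below any usual pruning threshold, because the rare alleles are carried by different samples.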

On Friday, March 29, 2019 at 10:25:52 AM UTC-7, PR wrote:
Thanks for the reply, I'll read through all of this, and again thanks for the reply.

One thing I forgot to ask, which I've just noticed: no matter how stringent I set the r2 value, some of the SNPs on the pruned-in list are only within a couple of hundred bases of each other. The presence of such physically close SNPs is understandable, as they may come from loci where we used read 1 and read 2 as separate reads in our Stacks analysis, with a plan to treat them for linkage later. What I don't quite get is why two SNPs 200 bp apart, and thus clearly linked, would both get into the pruned-in list. This seems to happen a lot in the prune.in list.

Christopher Chang

Mar 29, 2019, 4:15:13 PM
to plink2-users
Try --bp-space.
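As a rough picture of what distance-based thinning does, the sketch below walks each scaffold in position order and keeps a SNP only if it is at least a minimum number of bases from the last SNP kept. This is a greedy illustration under assumed inputs; the function name, the tuple layout, and the SNP IDs are hypothetical, and plink's own choice of which SNP in a close pair to drop may differ.

```python
def thin_by_spacing(variants, min_bp):
    """Greedy thinning: keep a SNP only if it lies at least min_bp bases
    from the previously kept SNP on the same scaffold.

    variants: iterable of (scaffold, position, snp_id) tuples.
    Returns the list of kept snp_ids, ordered by scaffold and position.
    """
    kept = []
    last_kept_pos = {}  # scaffold -> position of the last SNP kept
    for scaffold, pos, snp_id in sorted(variants):
        prev = last_kept_pos.get(scaffold)
        if prev is None or pos - prev >= min_bp:
            kept.append(snp_id)
            last_kept_pos[scaffold] = pos
    return kept


# Hypothetical SNPs: two on scaffold1 are only 150 bp apart.
variants = [
    ("scaffold1", 100, "s1_100"),
    ("scaffold1", 250, "s1_250"),
    ("scaffold1", 900, "s1_900"),
    ("scaffold2", 50, "s2_50"),
]
print(thin_by_spacing(variants, 200))  # ['s1_100', 's1_900', 's2_50']
```

With a 200-base minimum, only one SNP of the read 1 / read 2 style pair on scaffold1 survives, while SNPs on other scaffolds are unaffected.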

On Friday, March 29, 2019 at 12:42:35 PM UTC-7, PR wrote:
Yes, you are right, and thank you, Chris. As a workaround, is there a route in plink to remove a SNP if it is within, say, 200 bases of another SNP (i.e. keep one and reject the second)? This would get around my having included two SNPs that effectively come from the same locus at the point of original sequencing: one in read 1 and one in read 2.