LD Based Pruning


Jessica May

Nov 27, 2014, 11:21:04 PM
to plink2...@googlegroups.com
Hi All,
I am using the following command to do LD-based pruning.

 plink --bed plink.bed --fam plink.fam --bim plink.bim  --indep-pairwise  2 2 .9

Can anybody please help me understand:
What is the window size: [window size]<kb>
What is the step size: [step size (variant ct)]

Christopher Chang

Nov 28, 2014, 10:44:29 AM
to plink2...@googlegroups.com
Suppose plink.bed/.bim/.fam contains 3000000 variants, all on one chromosome.  If you run

"plink --bfile plink --indep-pairwise 20000 2000 .9"

then plink will first look at variants #1-20000, and prune them until no pair has r^2 > 0.9.  Then it will do the same for variants #2001-22000, then #4001-24000, and so on until #2980001-3000000.
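To make the window arithmetic above concrete, here is a minimal Python sketch that lists the variant ranges for a given window and step size. The function name `pruning_windows` is made up for illustration, and clamping the final window to the last variant is an assumption; this only reproduces the index ranges described above and is not PLINK's own code.

```python
def pruning_windows(n_variants, window_size, step_size):
    """Yield 1-based inclusive (start, end) variant indices for each window.

    Illustrative helper only: the last window is clamped to n_variants,
    which is an assumption about how the end of a chromosome is handled.
    """
    start = 1
    while True:
        end = min(start + window_size - 1, n_variants)
        yield start, end
        if end == n_variants:
            break
        start += step_size


# The example from the post: 3000000 variants, window 20000, step 2000.
windows = list(pruning_windows(3_000_000, 20_000, 2_000))
print(windows[:3])   # [(1, 20000), (2001, 22000), (4001, 24000)]
print(windows[-1])   # (2980001, 3000000)
print(len(windows))  # 1491
```

Each variant away from the chromosome ends falls into window_size / step_size = 10 overlapping windows here, which is why the step size mostly affects how often the window is re-examined rather than which variants are ever considered together.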

The "window size" is the number of variants plink considers at a time.  Bigger is better, but also slower.  For the 1000 Genomes phase 1 variant set (~40 million total), I think 20000 is a reasonable window size if you have some time to spare; with lower-density data, or a tight time constraint, you can scale down the window size accordingly.

After it finishes one window, plink moves on to another window with start and end position shifted by the step size.  It is usually reasonable to set this to 10% of the window size.
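As a rough illustration of why a bigger window can catch more correlated pairs, here is a toy Python sketch of windowed pairwise pruning on made-up genotype data. Everything in it is an assumption for illustration: the function names `r_squared` and `prune`, the toy genotypes, and the rule of always dropping the later variant of a correlated pair. It is a simplified greedy scheme, not PLINK's actual implementation.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic variant has no defined correlation
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def prune(genotypes, window_size, step_size, threshold):
    """Return 0-based indices of variants kept after windowed pairwise pruning.

    Toy rule (an assumption): when a pair in the current window has
    r^2 > threshold, the later variant of the pair is dropped.
    """
    n = len(genotypes)
    kept = [True] * n
    start = 0
    while start < n:
        end = min(start + window_size, n)
        for i in range(start, end):
            if not kept[i]:
                continue
            for j in range(i + 1, end):
                if kept[j] and r_squared(genotypes[i], genotypes[j]) > threshold:
                    kept[j] = False
        if end == n:
            break
        start += step_size
    return [i for i in range(n) if kept[i]]


# Made-up data: 5 variants x 6 samples. Variants 1 and 4 are copies of variant 0.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [2, 0, 1, 1, 0, 2],
    [0, 0, 1, 2, 2, 1],
    [0, 1, 2, 0, 1, 2],
]
print(prune(toy, window_size=2, step_size=1, threshold=0.9))  # [0, 2, 3, 4]
print(prune(toy, window_size=5, step_size=1, threshold=0.9))  # [0, 2, 3]
```

With a window of 2, variants 0 and 4 never share a window, so the distant duplicate survives; with a window of 5 it is removed. The price is more pairwise comparisons per window.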

Christopher Chang

Nov 28, 2014, 10:47:50 AM
to plink2...@googlegroups.com
Correction: my 20000 window size recommendation was for an r^2 threshold of 0.5.  With a threshold of 0.9, 4000 should be fine (there shouldn't be too many pairs of more distant variants which are that tightly correlated).

Jessica May

Nov 28, 2014, 1:04:34 PM
to Christopher Chang, plink2...@googlegroups.com

Thank you sooo much Christopher.

I am now clear about what is happening.

One thing is still unclear to me, though. You mentioned that a bigger window is better, but in my crop (maize) LD decays so fast that I wanted to use a very small window size, say 50 SNPs.

I have also seen a few papers that mention using an r^2 threshold of 0.5.

This leaves me confused: if I use an r^2 threshold of 0.5, I think I will discard a great many markers.

Please advise.

Regards,
May


--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Christopher Chang

Nov 28, 2014, 1:39:50 PM
to plink2...@googlegroups.com, chrch...@gmail.com
This depends on how many markers you're starting with and what kind of analyses you want to perform on the LD-pruned data.  "--indep-pairwise 50 5 0.9" could be reasonable if you have fewer than 50000 markers per chromosome.

