Plink 1.9b - Estimation of LD blocks from 1000g ph1_v3 data, a couple of questions


trypdal

Oct 24, 2014, 7:33:47 AM
to plink2...@googlegroups.com
Dear Plink developers,

I am a new Plink user and I am posting here to make sure I'm utilising your excellent tool correctly.

I would like to estimate LD blocks for the CEU and YRI panels in the phase1_release_v3.20101123 1000g data. I am aware that common tools like SNAP and HaploReg have done this; however, they only used the old 1000g pilot data, hence my interest in estimating the blocks myself.

I have sliced the original VCF by chromosome and population, retaining only chr1-22 and X and the CEU/YRI populations, and I have obtained plink bed, bim and fam files for these VCFs. Entries in the VCF that did not satisfy the plink filters --maf 0.002 --hwe 0.001 were discarded.
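For reference, the conversion step looked roughly like the sketch below. The VCF file name pattern and ceu_samples.txt (a two-column FID/IID sample list) are placeholders, not the exact files I used:

  # Sketch of the per-chromosome, per-population conversion; file names
  # are placeholders. --double-id sets FID = IID = the 1000g sample ID,
  # so ceu_samples.txt lists each sample ID twice per line.
  for chr in {1..22} X; do
    plink --vcf ALL.chr${chr}.phase1_release_v3.20101123.vcf.gz \
          --double-id \
          --keep ceu_samples.txt \
          --maf 0.002 --hwe 0.001 \
          --make-bed \
          --out CEU_chr${chr}
  done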

For comparison with the SNAP LD blocks, I am interested in segments no longer than 500 kb. Based on the SNAP documentation, I understand this is a meaningful block threshold because LD significantly decreases past 500 kb. I have used the following plink 1.9b command line:

plink --blocks no-pheno-req --blocks-max-kb 500 --blocks-strong-lowci 0.7005

This should use the default "strong LD" confidence interval thresholds (0.7005 and 0.98), as originally recommended in the Gabriel et al. 2002 paper.

I ran this, and computation completed within an hour or so on my hardware (which is really amazing, so thank you so much for developing v1.9b!). I have then converted the *.blocks.det output to BED files.
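The conversion itself was a one-liner along these lines (a sketch: plink's .blocks.det output has a header line followed by CHR, BP1, BP2, KB, NSNPS and SNPS columns, and BED wants 0-based, half-open intervals):

  # Convert plink's .blocks.det output to a minimal 3-column BED file.
  awk 'NR > 1 { print $1 "\t" $2 - 1 "\t" $3 }' plink.blocks.det > blocks.bed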

My questions are:
1 - is this a sensible pipeline for obtaining LD blocks, or am I missing something obvious?
2 - would you be so kind as to clarify what the <no-small-max-span> option actually does? When do you recommend using it?
3 - chrX: would you recommend any different processing, given the known recombination differences compared to the autosomes?

Best wishes



Christopher Chang

Oct 24, 2014, 1:44:23 PM
to plink2...@googlegroups.com
1. Yes, this is reasonable.
2. Haploview does not call two-variant and three-variant blocks unless the variants are within 20k or 30k bp of each other, respectively.  If you don't have many variants in your dataset, you may want to lift this restriction (see the sketch after this list), but with 1000g there's no need.
3. I'm not an expert on this.  If others reach a consensus on how the X chromosome should be handled differently, though, I will try to implement it.
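For reference, lifting the restriction is just a matter of adding the modifier ('mydata' below is a placeholder fileset prefix):

  # Lift the 20kb/30kb span restriction on 2- and 3-variant blocks.
  plink --bfile mydata --blocks no-pheno-req no-small-max-span --blocks-max-kb 500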

trypdal

Nov 26, 2014, 6:45:21 AM
to plink2...@googlegroups.com
Hi Christopher,

Thanks for your replies. I have a further question: how do I estimate the LD blocks, using the method above, but based on an r^2 definition of an LD block? I.e. I now wish to use a less stringent LD block definition, for say r^2 > 0.5. Is this doable with plink --blocks?

Christopher Chang

Nov 26, 2014, 1:05:26 PM
to plink2...@googlegroups.com
No, --blocks is always based on D' confidence intervals.  However, you can write a custom script that postprocesses the output of --r2.
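For the filtering step, something along these lines should work as a starting point (a sketch; 'mydata' is a placeholder prefix, and this only extracts high-r^2 pairs, it does not define blocks by itself):

  # Report within-chromosome pairs with r^2 >= 0.5 (--ld-window-r2 does
  # the filtering) and emit them as BED-style intervals; the merge/block
  # logic is left to your script.
  plink --bfile mydata --r2 --ld-window-kb 500 --ld-window-r2 0.5 --out highld
  awk 'NR > 1 && $1 == $4 { print $1 "\t" $2 - 1 "\t" $5 }' highld.ld > highld_pairs.bed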

Giuseppe G

Nov 26, 2014, 1:52:17 PM
to Christopher Chang, plink2...@googlegroups.com
Hi Christopher, thanks a lot. Could you recommend the correct way of doing this? I have generated the following output with plink --bfile <prefix> --r2 inter-chr gz dprime yes-really

 CHR_A         BP_A                 SNP_A  CHR_B         BP_B                 SNP_B           R2           DP
    22     16050408           rs149201999     22     16050612           rs146752890     0.618693     0.851012
    22     16050408           rs149201999     22     16050678           rs139377059     0.904168     0.974616
    22     16050408           rs149201999     22     16051107             rs6518357      0.88394     0.951471
    22     16050408           rs149201999     22     16051480            rs79725552      0.88394     0.951471
    22     16050408           rs149201999     22     16051882             rs2843213     0.859901     0.927308
    22     16050408           rs149201999     22     16052250             rs4965019     0.358184     0.762941
    22     16050408           rs149201999     22     16052656             rs2843217     0.835935     0.925512
    22     16050408           rs149201999     22     16052838             rs2105538      0.88394     0.951471
...

Let's say I'm interested in an r^2 > 0.5 threshold; would it be correct to just convert the above file to BED and run "bedtools merge" on all the intervals passing the R2 > 0.5 threshold?


Christopher Chang

Nov 26, 2014, 2:07:52 PM
to plink2...@googlegroups.com
* Inter-chromosome correlations aren't of interest when generating LD blocks, so you should be able to drop the 'inter-chr'.

* You'd need to do substantially more than "bedtools merge".  Gabriel et al.'s method defines *three* categories ("strong LD", "strong evidence for recombination", and "uninformative"), and uses a greedy algorithm to construct locally maximal intervals where "strong LD" variant pairs outnumber "strong evidence for recombination" variant pairs by at least 19 to 1.  You'd have to decide how this translates over to r^2 thresholds.

Honestly, I think it's a better idea to stick to --blocks.  If you really want to "roll your own", the simplest solution I can think of is a combination of --indep-pairwise (each retained variant can define the center of its own block) and --show-tags (to fill out the blocks).
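That combination would look something like this (a sketch; the window/step/threshold values are placeholders, not recommendations):

  # Prune to a set of roughly independent variants, then report each
  # retained variant's tags within 500 kb at r^2 >= 0.5.
  plink --bfile mydata --indep-pairwise 50 5 0.5 --out pruned
  plink --bfile mydata --show-tags pruned.prune.in --tag-kb 500 --tag-r2 0.5 --list-all --out tags

With --list-all, the .tags.list output includes each variant's leftmost and rightmost tag positions, which you could take as the block boundaries.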



Giuseppe G

Nov 27, 2014, 10:02:32 AM
to Christopher Chang, plink2...@googlegroups.com
Hi Christopher, thanks for this. How would you suggest using --blocks with relaxed D' confidence intervals, in order to approximate a minimum r^2 on the order of 0.5?

The Gabriel et al paper defines a pair of SNPs to be in "strong LD" when the one-sided upper 95% confidence bound on D' is > 0.98. Strong evidence for recombination is defined for the same confidence bound at < 0.9. Would --blocks-strong-highci 0.92 result in larger blocks, containing SNPs in "weak" LD?


Christopher Chang

Nov 27, 2014, 3:16:09 PM
to plink2...@googlegroups.com, chrch...@gmail.com
I haven't looked in detail at how r^2 and D' covary, but yes, you should be able to achieve what you want by changing the --blocks-strong-lowci and --blocks-strong-highci values.
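For instance (values purely illustrative, not calibrated to any particular r^2 cutoff):

  # Relax the Gabriel D' confidence-interval thresholds; keep
  # --blocks-recomb-highci below --blocks-strong-highci.
  plink --bfile mydata --blocks no-pheno-req \
        --blocks-strong-lowci 0.55 --blocks-strong-highci 0.92 \
        --blocks-recomb-highci 0.85 --blocks-max-kb 500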



JMR JMR

Oct 8, 2018, 4:38:10 AM
to plink2-users
Hi trypdal,

Could you point me to a paper/documentation that says that LD significantly decreases past 500 kb?

Thanks,

Javier