Plink2, SNPs in inter-chromosomal LD and low-complexity regions


hamidi

Nov 1, 2021, 2:04:13 PM
to plink2-users
Hi Chris

--r2 inter-chr is deprecated in plink2. I was wondering which flags I should use in plink 2 to filter out SNPs that lie in low-complexity regions and SNPs that are involved in inter-chromosomal LD.

Thanks


Christopher Chang

Nov 2, 2021, 9:11:36 PM
to plink2-users
"--r2 inter-chr" isn't "deprecated in plink2".  Instead, --r2 hasn't been implemented yet, so you should use plink 1.9 for this operation for now.

hamidi

Nov 4, 2021, 5:23:21 PM
to plink2-users
Thanks Chris, 
Following your recommendation, I converted the UK Biobank bgen file to a VCF file with plink 2. Now I am using plink 1.9 to compute LD-pruned SNPs from the VCF files with this command:

 plink  --vcf CHR1.vcf.gz --indep-pairwise 1000 100 0.9 --r2 inter-chr --out CHR1_LDSNPs

However, plink 1.9 finishes the --indep-pairwise step successfully but then gets stuck at the --r2 inter-chr stage, as you can see here. Any suggestions?

Pruned 359265 variants from chromosome 1, leaving 98344.
Pruning complete.  359265 of 457609 variants removed.
Marker lists written to and CHR1_LDSNPs.prune.out .
--r2 inter-chr to CHR1_LDSNPs.ld ... 0% [processing]


Thanks

Christopher Chang

Nov 4, 2021, 8:50:59 PM
to plink2-users
--r2 inter-chr is expected to take a long time.

With that said, you are not using --indep-pairwise correctly.  While it generates an LD-pruned SNP list, it does *not* exclude the rest of the SNPs from the *current* run.  Instead, you are supposed to use --extract on that list in a subsequent plink run.  E.g.

plink2 --vcf CHR1.vcf.gz --indep-pairwise 1000 100 0.9 --make-bed --out CHR1_plink
plink --bfile CHR1_plink --extract chr1_plink.prune.in --r2 inter-chr --out CHR1_LDSNPs

Christopher Chang

Nov 4, 2021, 8:51:31 PM
to plink2-users
(oops, replace "chr1_plink.prune.in" with "CHR1_plink.prune.in")

hamidi

Nov 8, 2021, 3:26:42 PM
to plink2-users
Hi Chris.
1- I added --ld-window-r2 0.8 to your LD pruning commands, ran them on chromosome 22, and got the CHR22_plink.prune.ld file. However, as you can see here, this file only has R2 information for pairs of SNPs within chromosome 22, not between SNPs in chromosome 22 and SNPs in the other 21 chromosomes. Is this ld file correct?
2- To get data with LD-pruned SNPs, I think I need to run another command to extract the SNPs in chr1_plink.prune.in and exclude the SNPs in the third column of CHR22_plink.prune.ld.

Thanks 

Screen Shot 2021-11-08 at 2.16.12 PM.png

hamidi

Nov 8, 2021, 3:29:36 PM
to plink2-users
(my apologies, please replace "CHR22_plink.prune.ld" with "CHR22_LDSNPs.ld" in my email)

Christopher Chang

Nov 9, 2021, 10:03:37 PM
to plink2-users
Why would you expect a --r2 run with only chr22 as input to contain interchromosomal chr21-chr22 correlations?  Where on earth is that run supposed to get the chr21 information from?

hamidi

Nov 10, 2021, 12:33:04 AM
to plink2-users
Thanks Chris for your reply. I am a new user, and your point is what confused me in the first place. In the previous related questions and the documentation, I could not find any flag or other way to feed all 22 chromosomes as input files for calculating inter-chromosomal correlations. That is why I thought plink would expect to find the other 21 chromosomes in the same directory and would calculate inter-chromosomal correlations between the selected chromosome and the other 21. Anyway, how can I calculate inter-chromosomal correlations between one chromosome and all other 21 chromosomes? I would appreciate your input.

Thanks

Christopher Chang

Nov 10, 2021, 1:31:30 AM
to plink2-users
"--r2 inter-chr" (as well as quite a few other plink commands) is supposed to be used with a merged dataset.

As a practical matter, you should prune with --indep-pairwise/--extract first so that the computation isn't hopelessly long.
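
A rough sketch of that order of operations with plink 1.9, assuming per-chromosome filesets named CHR1_plink through CHR22_plink with matching .prune.in lists (the naming follows the example earlier in this thread). The names CHR*_pruned, merge_list.txt, merged_pruned and merged_LDSNPs are hypothetical, and the run helper only prints each command unless RUN=1 is set, so nothing expensive starts by accident:

```shell
# Dry run by default: print commands instead of executing them.
# Set RUN=1 to invoke plink 1.9 (assumed to be on PATH as "plink").
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Step 1: keep only the LD-pruned variants of each chromosome, and list
# chromosomes 2..22 in a merge list (one fileset prefix per line).
: > merge_list.txt
c=1
while [ "$c" -le 22 ]; do
  run plink --bfile "CHR${c}_plink" --extract "CHR${c}_plink.prune.in" \
      --make-bed --out "CHR${c}_pruned"
  if [ "$c" -gt 1 ]; then
    echo "CHR${c}_pruned" >> merge_list.txt
  fi
  c=$((c + 1))
done

# Step 2: merge chromosome 1 with the other 21 pruned filesets.
run plink --bfile CHR1_pruned --merge-list merge_list.txt \
    --make-bed --out merged_pruned

# Step 3: inter-chromosomal r^2 on the merged, pruned dataset.
# The 0.8 reporting threshold is only an example value.
run plink --bfile merged_pruned --r2 inter-chr --ld-window-r2 0.8 \
    --out merged_LDSNPs
```

The point of the ordering is that --r2 inter-chr only ever sees the merged fileset, so every chromosome's variants are available to it, and it only sees variants that survived pruning.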

hamidi

Nov 13, 2021, 7:08:05 PM
to plink2-users
Thanks Chris. I followed your practical advice. I am using 36 CPUs, 36 threads and 1TB RAM, and my merged dataset across 22 chromosomes has 400k subjects and 1.2 million LD-pruned SNPs. However, the last step is to run --r2 inter-chr, and it has made zero progress in 2 days. I was wondering whether plink 1.9 supports GPUs and/or processing on multiple CPU nodes. What do you recommend?

Command:
plink --threads 36 --memory 1440000 --bfile merged_ukb --r2 inter-chr --ld-window-r2 0.9 --out merged_ukb_LDSNPs


Output:
401967 people (184246 males, 217721 females) loaded from .fam.
Using up to 36 threads (change this with --threads).
Before main variant filters, 401967 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.996398.
1283748 variants and 401967 people pass filters and QC.
Note: No phenotypes present.
--r2 inter-chr to merged_ukb_LDSNPs.ld ... 0% [processing]


Thanks   


Christopher Chang

Nov 13, 2021, 7:32:49 PM
to plink2-users
1. If I were in your shoes, I would have been MUCH more aggressive with LD-pruning.  Do you really need ~1.3 million variants for what you're doing?
2. --r2 computations can be subdivided across multiple nodes with --parallel.
3. As a last resort, you can also randomly prune samples with e.g. --thin-indiv; this effectively reduces the precision of the computed correlations.  However, the computational cost is linear in the number of samples and quadratic in the number of variants, so I'd first try to figure out the smallest number of variants I could get away with.
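
To put rough numbers on points 1 and 2, here is a small sketch. It counts unordered variant pairs as m * (m - 1) / 2 and then prints one --parallel command per piece. The target of 200000 variants and the choice of 4 jobs are hypothetical, and the run helper only prints commands unless RUN=1 is set:

```shell
# Dry run by default: print commands instead of executing them.
# Set RUN=1 to invoke plink 1.9 (assumed to be on PATH as "plink").
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Number of unordered variant pairs among m variants: m * (m - 1) / 2.
n_pairs() {
  echo $(( $1 * ($1 - 1) / 2 ))
}

M_NOW=1283748     # variants in the run above
M_TARGET=200000   # hypothetical count after more aggressive pruning
echo "pairs now:    $(n_pairs "$M_NOW")"
echo "pairs target: $(n_pairs "$M_TARGET")"

# Split the computation into N_JOBS pieces with --parallel <k> <n>.
# In practice each command would be submitted to a different node.
N_JOBS=4          # hypothetical number of jobs
k=1
while [ "$k" -le "$N_JOBS" ]; do
  run plink --bfile merged_ukb --r2 inter-chr --ld-window-r2 0.9 \
      --parallel "$k" "$N_JOBS" --out merged_ukb_LDSNPs
  k=$((k + 1))
done
# Each piece is written to its own output file; check the plink 1.9
# --parallel documentation for the exact names before concatenating
# the pieces in order.
```

Halving the variant count cuts the number of pairs roughly fourfold, while halving the sample count only halves the cost per pair, which is why reducing variants comes before thinning samples.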