How to deal with multiallelic variants while performing --r2-unphased on 1KG data

Hammad Farooq

Jun 27, 2024, 10:11:33 AM
to plink2-users
I have a GRCh38 GWAS dataset and I want to identify SNPs that are in tight genetic linkage with GWAS lead SNPs. To achieve this, I downloaded the 1000 Genomes variant call files from "2022-08-04 Byrska-Bishop et al. (build 38, 3202 samples, contigs unphased)" available at this link.

I ran the following command to compute linkage disequilibrium (LD) using PLINK 2:

plink2 --pfile all_hg38 --r2-unphased --ld-window-r2 0 --ld-window 999999 --ld-window-kb 2000 --out all_hg38_ld
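
A minimal Python sketch of how the three window flags in this command jointly restrict which variant pairs are reported. The boundary conventions (index distance strictly below --ld-window, base-pair distance at most --ld-window-kb × 1000, r² at or above --ld-window-r2) are assumptions modeled on PLINK 1.9's documented behavior and have not been verified against plink2. `Variant`, `window_pairs` and the toy r² values are hypothetical. With the values used here, the 2000 kb limit, not the 999999-variant count, is in practice what bounds the search.

```python
from typing import Dict, List, NamedTuple, Tuple


class Variant(NamedTuple):
    vid: str  # variant ID (hypothetical)
    pos: int  # base-pair position; all variants assumed to be on one chromosome


def window_pairs(
    variants: List[Variant],
    r2: Dict[Tuple[str, str], float],
    ld_window: int,
    ld_window_kb: float,
    ld_window_r2: float,
) -> List[Tuple[str, str, float]]:
    """Keep (id_a, id_b, r2) pairs that pass all three window filters.

    Assumed conventions (modeled on PLINK 1.9 docs, unverified for plink2):
    index distance < ld_window, bp distance <= ld_window_kb * 1000,
    r2 >= ld_window_r2.
    """
    ordered = sorted(variants, key=lambda v: v.pos)
    kept = []
    for i, a in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            b = ordered[j]
            if j - i >= ld_window or b.pos - a.pos > ld_window_kb * 1000:
                break  # variants are sorted, so every later b is even farther away
            value = r2[(a.vid, b.vid)]
            if value >= ld_window_r2:
                kept.append((a.vid, b.vid, value))
    return kept


# Toy data (hypothetical); window values copied from the command above.
variants = [Variant("v1", 100), Variant("v2", 1_500_000), Variant("v3", 2_500_000)]
r2 = {("v1", "v2"): 0.9, ("v1", "v3"): 0.05, ("v2", "v3"): 0.1}
print(window_pairs(variants, r2, ld_window=999999, ld_window_kb=2000, ld_window_r2=0))
# [('v1', 'v2', 0.9), ('v2', 'v3', 0.1)]
```

The v1/v3 pair is dropped only because it lies about 2.5 Mb apart; with --ld-window-r2 0, no pair inside the window is filtered on r².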

However, I encountered the following error:

Error: --r2-unphased column-set doesn't include allele columns which clarify
which calculation is being performed at multiallelic variants. Either filter
out multiallelic variants, revise the column-set (with e.g. "cols=+maj"), or
use the 'allow-ambiguous-allele' modifier to override this error.

Do I need to filter out multiallelic variants?

Any other suggestions about my strategy for identifying SNPs in tight linkage with the GWAS lead SNPs are also welcome.

Thanks
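
On the broader strategy question: once an LD table exists, pulling out the variants in tight LD with each lead SNP is a simple filter. This is a hypothetical Python sketch: the column names (`ID_A`, `ID_B`, `UNPHASED_R2`) are assumptions about plink2's tab-separated output header and should be checked against the real file, and the r² ≥ 0.8 cutoff is only an example threshold, not a recommendation from this thread.

```python
import csv
import io
from typing import Dict, Iterable, Set


def tight_ld_partners(
    lines: Iterable[str],
    lead_ids: Set[str],
    min_r2: float = 0.8,  # example cutoff, not a recommendation
    id_a: str = "ID_A",  # assumed column names; check the real header
    id_b: str = "ID_B",
    r2_col: str = "UNPHASED_R2",
) -> Dict[str, Set[str]]:
    """Map each lead SNP ID to the IDs of variants paired with it at r2 >= min_r2."""
    partners: Dict[str, Set[str]] = {lead: set() for lead in lead_ids}
    for row in csv.DictReader(lines, delimiter="\t"):
        a, b, r2 = row[id_a], row[id_b], float(row[r2_col])
        if r2 < min_r2:
            continue
        # A lead SNP may appear in either column of a pair.
        if a in partners:
            partners[a].add(b)
        if b in partners:
            partners[b].add(a)
    return partners


# Hypothetical table in the assumed tab-separated layout.
table = (
    "#CHROM_A\tPOS_A\tID_A\tCHROM_B\tPOS_B\tID_B\tUNPHASED_R2\n"
    "1\t100\trsLead\t1\t200\trsX\t0.95\n"
    "1\t100\trsLead\t1\t300\trsY\t0.40\n"
    "1\t200\trsX\t1\t300\trsY\t0.90\n"
)
result = tight_ld_partners(io.StringIO(table), {"rsLead"})
print(result)  # {'rsLead': {'rsX'}}
```

For a real run, `lines` would be an open file handle for the LD output (assumed here to be named after the `--out all_hg38_ld` prefix). The rsX/rsY pair is ignored because neither variant is a lead SNP.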


Chris Chang

Jun 27, 2024, 10:28:48 AM
to Hammad Farooq, plink2-users
“Revise the column-set” is a solution that lets you keep these variants.  See 
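
Why allele columns matter here: at a multiallelic variant there is more than one allele whose dosage can be correlated with the other variant, and the resulting r² depends on that choice. A minimal Python sketch with hypothetical toy genotypes, using the squared Pearson correlation of per-sample allele counts as the unphased r². It does not reproduce plink2's own rule for choosing the allele; presumably that choice is what the allele columns requested with e.g. `cols=+maj` report.

```python
from typing import List, Sequence, Tuple


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def allele_dosage(genotypes: List[Tuple[int, int]], allele: int) -> List[int]:
    """Per-sample count (0, 1 or 2) of one allele code at a multiallelic variant."""
    return [g.count(allele) for g in genotypes]


# Hypothetical toy data for 6 samples.
lead_alt_dosage = [0, 0, 1, 1, 2, 2]  # biallelic lead SNP, ALT allele counts
# Unphased genotypes at a triallelic variant: 0 = REF, 1 = ALT1, 2 = ALT2.
tri_genotypes = [(0, 0), (0, 2), (0, 1), (1, 2), (1, 1), (1, 1)]

r2_by_allele = {
    k: pearson_r2(lead_alt_dosage, allele_dosage(tri_genotypes, k)) for k in (0, 1, 2)
}
for k, value in r2_by_allele.items():
    print(f"allele {k}: r2 = {value:.4f}")
# allele 0: r2 = 0.6750
# allele 1: r2 = 1.0000
# allele 2: r2 = 0.1875
```

In this toy example ALT1 is the most common allele (6 of 12 copies), so an ALT1-based calculation reports 1.0 while a REF-based one reports 0.675, for the same pair of variants.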

Hammad Farooq

Jun 27, 2024, 1:21:52 PM
to plink2-users

Thank you so much for your quick response.

A naive question: in this context, does it make more sense to use --r-phased or --r-unphased?

Chris Chang

Jun 27, 2024, 9:38:23 PM
to Hammad Farooq, plink2-users
Not a naive question.

In this context, you have phased genotype data, and are restricting the computation to a limited range where phase-switch errors won’t make the phasing useless, so I would lean toward --r-phased unless you need to compare with other unphased stats.
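
A toy illustration of what phase information adds. This minimal Python sketch assumes biallelic SNPs coded 0/1. It computes haplotype-level r² from phased data and, for contrast, the squared correlation of per-sample allele counts with phase discarded. The latter is one common unphased estimator; plink2's exact estimators are not reproduced here. The hypothetical data include a repulsion-phase double heterozygote (sample 2) that unphased counts cannot distinguish from the coupling-phase one (sample 1).

```python
from typing import List, Sequence, Tuple


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def phased_r2(haps: List[Tuple[int, int]]) -> float:
    """Haplotype-level r2: correlate allele indicators across all 2N haplotypes."""
    return pearson_r2([h[0] for h in haps], [h[1] for h in haps])


def unphased_r2(haps: List[Tuple[int, int]]) -> float:
    """Discard phase: sum each sample's two haplotypes into 0/1/2 allele counts."""
    x = [haps[i][0] + haps[i + 1][0] for i in range(0, len(haps), 2)]
    y = [haps[i][1] + haps[i + 1][1] for i in range(0, len(haps), 2)]
    return pearson_r2(x, y)


# Hypothetical data: 4 samples x 2 haplotypes, each (allele at SNP A, allele at SNP B).
haps = [
    (1, 1), (0, 0),  # sample 1: double heterozygote, coupling phase
    (1, 0), (0, 1),  # sample 2: double heterozygote, repulsion phase
    (1, 1), (1, 1),  # sample 3
    (0, 0), (0, 0),  # sample 4
]
print(phased_r2(haps), unphased_r2(haps))  # 0.25 1.0
```

Both double heterozygotes collapse to the same (1, 1) allele counts, so the unphased value overstates the haplotype-level association in this toy case; with reliable phase over a short range, the phased statistic keeps that distinction.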
