association tests with multi-allelic variants from VCF

Eric Karlins

Jun 20, 2023, 9:41:58 AM
to plink2-users
Good morning!

I'm wondering how PLINK 1.9 handles association tests for multi-allelic variants loaded from a VCF file.

I have a variant where ALT="CG,*,CCCG,G"

I ran a command that looks like this (using plink version 1.90-x86_64-beta):

"plink --vcf {input.vcf} --double-id --logistic --pheno {output.pheno} --adjust --allow-no-sex --memory 40000 --out {params.out_str} --threads {threads} --covar {output.pheno} --covar-name {covariates}"

I get an association result for this variant, but it's just one result, not one for each ALT allele. How is it handling this? Is it best to split multi-allelic variants before running logistic regression?

Thanks!
Eric

Christopher Chang

Jun 20, 2023, 12:15:55 PM
to plink2-users
Plink 1.9 does not have proper support for multiallelic variants. It keeps only the most frequent ALT allele, and genotypes involving the less common ALT alleles are set to missing.
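
To make the effect on the data concrete, here is a minimal Python sketch of the behavior described above. It is an illustration only, not PLINK's code: the function name `collapse_to_top_alt`, the genotype encoding (pairs of allele indices, 0 = REF, 1..n = ALT, `None` = missing), and the tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

def collapse_to_top_alt(genotypes):
    """Keep REF and the most frequent ALT; set all other genotypes to missing.

    genotypes: list of (a1, a2) allele-index pairs (0 = REF, 1..n = ALT),
    or None for a call that is already missing.
    Returns (kept_alt_index, recoded_genotypes), with alleles recoded to 0/1.
    """
    # Count every ALT allele observed across non-missing calls.
    alt_counts = Counter(
        a for g in genotypes if g is not None for a in g if a != 0
    )
    if not alt_counts:
        return None, list(genotypes)

    # Assumption for this sketch: ties go to the lowest ALT index.
    kept = max(sorted(alt_counts), key=lambda a: alt_counts[a])

    recoded = []
    for g in genotypes:
        if g is None or any(a not in (0, kept) for a in g):
            recoded.append(None)  # involves a less common ALT: missing
        else:
            recoded.append(tuple(1 if a == kept else 0 for a in g))
    return kept, recoded

# Hypothetical calls for a site with ALT="CG,*,CCCG,G" (indices 1..4).
calls = [(0, 0), (0, 1), (1, 1), (0, 4), (1, 3), (0, 1), None]
kept, recoded = collapse_to_top_alt(calls)
print(kept, recoded)
```

With these hypothetical calls, ALT index 1 is kept, and the two samples carrying ALT 4 or ALT 3 end up missing, so they drop out of the single regression that is reported.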

You should use a recent Plink 2.0 build if you want to perform association analysis on multiallelic variants; this is better than splitting such variants.

Eric Karlins

Jun 20, 2023, 12:29:48 PM
to plink2-users
Thanks! I can try it with a recent Plink 2.0 build. How does Plink 2.0 handle multiallelic variants? Does it perform one test for each ALT? Why is it better than splitting them?

Christopher Chang

Jun 20, 2023, 12:42:48 PM
to plink2-users
From the --glm documentation:

(G is the predictor matrix in the regression)
"for multiallelic variants, G normally contains one column for each nonmajor allele. 'omit-ref' changes this to one column for each ALT allele.
If some but not all of these allele columns are constant, the constant columns are omitted. (Before 20 Mar 2020, the entire variant was skipped in this case.)
For each such variant, the main report normally contains one line for each nonmajor allele"

In other words, with Plink 2.0, additional minor alleles are effectively included as additional covariates in each regression. If you split the variants first, the other minor alleles are just treated like copies of the major allele.
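
A small Python sketch may help show the difference. It is not PLINK's implementation; the function names, the `omit` parameter, and the genotype encoding (pairs of allele indices, 0 = REF, 1..n = ALT) are assumptions for illustration. It builds the allele-dosage columns of G in the two ways the documentation describes, and the single column you would get for one ALT after splitting.

```python
from collections import Counter

def allele_dosage_columns(genotypes, n_alleles, omit="major"):
    """One dosage column per allele, leaving out a single baseline allele.

    omit="major": leave out the most frequent allele (one column per
                  nonmajor allele).
    omit="ref":   leave out allele 0 (one column per ALT allele, in the
                  spirit of the 'omit-ref' modifier).
    Returns {allele_index: [dosage for each sample]}.
    """
    counts = Counter(a for g in genotypes for a in g)
    if omit == "major":
        # Assumption for this sketch: ties go to the lowest allele index.
        dropped = max(range(n_alleles), key=lambda a: counts[a])
    else:
        dropped = 0
    return {
        a: [g.count(a) for g in genotypes]
        for a in range(n_alleles)
        if a != dropped
    }

def split_biallelic_column(genotypes, alt):
    """Dosage of one ALT after splitting: every other allele counts as 0."""
    return [g.count(alt) for g in genotypes]

# Hypothetical triallelic site: REF plus two ALT alleles.
calls = [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (0, 0)]
joint = allele_dosage_columns(calls, n_alleles=3)   # columns for ALT 1 and ALT 2
alone = split_biallelic_column(calls, alt=1)        # ALT 1 by itself
print(joint)
print(alone)
```

In the joint form, the fourth sample (genotype 0/2) has a 1 in the ALT 2 column, so the regression can account for it. In the split form for ALT 1, that same sample is coded 0 and is indistinguishable from a 0/0 sample.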

Eric Karlins

Jun 20, 2023, 1:17:48 PM
to plink2-users
Thanks for this further explanation!
In some cases variants are grouped as a single multiallelic variant even though their minimal representations would place them at separate genomic positions. Is treating these as covariates really what we want here, or would it make more sense to run these as separate tests?

Christopher Chang

Jun 22, 2023, 3:10:54 AM
to plink2-users
If it's only possible to have one variant or the other, not both at the same time, then the multiallelic variant representation should be slightly more effective here.