Question about multiallelic variants in plink 1.9 and 2

Pietro della Briotta Parolo

Jan 15, 2020, 5:35:22 AM
to plink2-users
Hi.

I'm currently writing a simple VCF-to-BED conversion pipeline on Cromwell. I need to use a mix of plink 1.9 and plink2: plink2's multithreading comes in handy, but plink2 doesn't support --merge-list.

At the moment I'm testing with the 1kg (1000 Genomes) data, and there's something that isn't quite clear to me.

In the conversion step I specify --max-alleles 2:


```
PLINK v2.00a2LM AVX2 Intel (13 Dec 2019)       www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to 21.log.
Options in effect:
  --allow-extra-chr
  --make-bed
  --max-alleles 2
  --memory 30000
  --out 21
  --vcf /cromwell_root/thousand_genome/vcf/chrom/ALL.chr21_GRCh38.genotypes.20170504.vcf.gz
  --vcf-half-call h

Start time: Wed Jan 15 12:12:00 2020
32167 MiB RAM detected; reserving 30000 MiB for main workspace.
Using up to 32 threads (change this with --threads).
--vcf: 1104028 variants scanned.
--vcf: 21-temporary.pgen + 21-temporary.pvar + 21-temporary.psam written.
2504 samples (0 females, 0 males, 2504 ambiguous; 2504 founders) loaded from
21-temporary.psam.
1097776 out of 1104028 variants loaded from 21-temporary.pvar.
Note: No phenotype data present.
1097776 variants remaining after main filters.
Writing 21.fam ... done.
Writing 21.bim ... done.
Writing 21.bed ... done.
End time: Wed Jan 15 12:12:11 2020
```

Same for chromosome 20:
```
1802302 out of 1811146 variants loaded from 20-temporary.pvar.
Note: No phenotype data present.
1802302 variants remaining after main filters.
```


Yet in the merging step, plink 1.9 complains about multiallelic variants:
```
PLINK v1.90b6.13 64-bit (30 Nov 2019)          www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to 1k.log.
Options in effect:
  --allow-extra-chr
  --make-bed
  --memory 30000
  --merge-list merge_list.txt
  --out 1k

32170 MB RAM detected; reserving 30000 MB for main workspace.
Error: 147 variants with 3+ alleles present.
* If you believe this is due to strand inconsistency, try --flip with
  1k-merge.missnp.
  (Warning: if this seems to work, strand errors involving SNPs with A/T or C/G
  alleles probably remain in your data.  If LD between nearby SNPs is high,
  --flip-scan should detect them.)
* If you are dealing with genuine multiallelic variants, we recommend exporting
  that subset of the data to VCF (via e.g. '--recode vcf'), merging with
  another tool/script, and then importing the result; PLINK is not yet suited
  to handling them.
```



However, even if I add the `--biallelic-only strict` flag, I still get the same error:
```
PLINK v1.90b6.13 64-bit (30 Nov 2019)          www.cog-genomics.org/plink/1.9/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to 1k.log.
Options in effect:
  --allow-extra-chr
  --biallelic-only strict
  --make-bed
  --memory 30000
  --merge-list merge_list.txt
  --out 1k

32170 MB RAM detected; reserving 30000 MB for main workspace.
Error: 147 variants with 3+ alleles present.
```

What am I missing and what is the best way to proceed?

Thanks

Christopher Chang

Jan 15, 2020, 11:17:29 AM
to plink2-users
I'm guessing this is caused by multiple variants having the same ID.  plink2's --set-all-var-ids flag and --rm-dup command should help you address this.
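A quick way to test the duplicate-ID guess is to pool the allele codes of every record that shares an ID across all the .bim files being merged, and flag IDs that end up with three or more. This is a hypothetical helper, not part of plink; it assumes standard six-column .bim lines (chromosome, ID, cM, position, A1, A2) and treats `0` as a missing allele code.

```python
from collections import defaultdict


def alleles_by_id(bim_lines):
    """Pool every non-missing allele code seen for each variant ID."""
    seen = defaultdict(set)
    for line in bim_lines:
        _chrom, var_id, _cm, _bp, a1, a2 = line.split()
        for allele in (a1, a2):
            if allele != "0":  # '0' is the missing-allele code in .bim files
                seen[var_id].add(allele)
    return seen


def ids_with_3plus_alleles(bim_lines):
    """IDs that a by-ID merge would see with 3+ alleles (sorted for stable output)."""
    return sorted(v for v, alleles in alleles_by_id(bim_lines).items() if len(alleles) >= 3)


# Hypothetical .bim lines: rs2 appears twice at one position with different A1 alleles.
bim_lines = [
    "21\trs2\t0\t200\tC\tT",
    "21\trs2\t0\t200\tG\tT",
    "21\trs3\t0\t300\tA\tG",
]
print(ids_with_3plus_alleles(bim_lines))  # ['rs2']
```

Fed with the lines of every .bim file named in merge_list.txt, the IDs it reports are the ones to rename (e.g. with --set-all-var-ids) or drop (e.g. with --rm-dup) in plink2 before merging.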

Alternatively, if you're only using --merge-list for concatenation purposes, you can use "bcftools concat" for that part; plink2 can then import the concatenated VCF.
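A minimal sketch of that concatenate-then-import route. It only builds the command lines (it runs nothing) and assumes `bcftools` and `plink2` are on the PATH; the file names `chr20.vcf.gz`, `chr21.vcf.gz` and `all.vcf.gz` are hypothetical, and the plink2 flags are the ones from the conversion run at the top of the thread.

```python
def concat_then_import(vcfs, combined="all.vcf.gz", out="1k",
                       bcftools="bcftools", plink2="plink2"):
    """Return argv lists: concatenate per-chromosome VCFs, then convert once with plink2.

    Nothing is executed here.
    """
    # bcftools concat expects all inputs to share the same samples in the same order.
    concat = [bcftools, "concat", "-O", "z", "-o", combined, *vcfs]
    # Conversion flags copied from the per-chromosome plink2 run earlier in the thread.
    to_bed = [plink2, "--vcf", combined, "--max-alleles", "2",
              "--vcf-half-call", "h", "--allow-extra-chr",
              "--make-bed", "--out", out]
    return [concat, to_bed]


for argv in concat_then_import(["chr20.vcf.gz", "chr21.vcf.gz"]):
    print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)`. Because this route never goes through plink 1.9's by-ID merge, duplicate IDs should not trigger the 3+ allele error here, though they may still matter in later steps.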

Pietro della Briotta Parolo

Jan 15, 2020, 1:18:25 PM
to plink2-users
Yeah. I forgot 1kg uses rsids and that causes issues! Thanks!
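Why rsIDs can still cause trouble after --max-alleles 2 can be illustrated with a simplified model (an assumption, not plink2's actual code): the filter judges each VCF record on its own, so two biallelic records that share one rsID but carry different ALT alleles both pass. The records below are made up for illustration.

```python
def allele_count(vcf_line):
    """Alleles on one VCF data line: REF plus each comma-separated ALT ('.' = none)."""
    alt = vcf_line.rstrip("\n").split("\t")[4]
    return 1 + (0 if alt == "." else len(alt.split(",")))


def keep_max_alleles(vcf_lines, max_alleles=2):
    """Simplified per-record filter: each line is judged on its own; IDs are ignored."""
    return [line for line in vcf_lines if allele_count(line) <= max_alleles]


# Hypothetical records: rs1 is multiallelic on one line; rs2 is split over two lines.
records = [
    "21\t100\trs1\tA\tC,G\t.\tPASS\t.",
    "21\t200\trs2\tT\tC\t.\tPASS\t.",
    "21\t200\trs2\tT\tG\t.\tPASS\t.",
]
kept = keep_max_alleles(records)
print([line.split("\t")[2] for line in kept])  # ['rs2', 'rs2']
```

Both rs2 records survive the filter, so a later merge that matches variants by ID sees T, C and G under a single name.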

RGS

Apr 21, 2020, 5:17:47 AM
to plink2-users
Hi Pietro,
I'm facing the same issue while combining binary files with rsIDs. Could you please elaborate on why rsIDs cause this issue? I have 560 variants with 3+ alleles. How do I sort it out?

Thanks,
Ravi

Graeme Ford

Oct 9, 2020, 4:10:34 AM
to plink2-users
To add to this: in my work I have used plink2 to rename all my variants with a non-rsID naming scheme, and it still kicks up the error. Surely `--biallelic-only strict` should just filter out any triallelic (or higher) sites indiscriminately during a merge, or am I misunderstanding how it works?

``` bash
PLINK v1.90b4.9 64-bit (13 Oct 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to Intermediates/COLLATE/ALL.log.

Options in effect:
  --allow-extra-chr
  --biallelic-only strict
  --chr 1-22
  --keep-allele-order
  --make-bed
  --merge-list <Binary Merge List text File>
  --out final
```
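One possible explanation (an assumption; it isn't confirmed in this thread) is that the new IDs still collide, e.g. if they are built only from chromosome and position, so two records at the same position share an ID. The sketch compares a position-only scheme with one that also appends the alleles, roughly what a plink2 `--set-all-var-ids` template such as `@:#:$r:$a` produces. The helper names and records are hypothetical.

```python
from collections import Counter


def rebuilt_ids(bim_lines, with_alleles):
    """Rebuild IDs from .bim columns as chrom:pos, optionally followed by A2:A1."""
    ids = []
    for line in bim_lines:
        chrom, _old_id, _cm, bp, a1, a2 = line.split()
        base = f"{chrom}:{bp}"
        # Assumption: in a plink2-written .bim, A2 is normally REF and A1 is ALT.
        ids.append(f"{base}:{a2}:{a1}" if with_alleles else base)
    return ids


def duplicated(ids):
    """IDs that occur more than once (sorted)."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


# Hypothetical records at one position with different ALT alleles.
bim_lines = [
    "21\tvarA\t0\t200\tC\tT",
    "21\tvarB\t0\t200\tG\tT",
]
print(duplicated(rebuilt_ids(bim_lines, with_alleles=False)))  # ['21:200']
print(duplicated(rebuilt_ids(bim_lines, with_alleles=True)))   # []
```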

Christopher Chang

Oct 9, 2020, 11:11:51 AM
to plink2-users
No, the --biallelic-only flag only applies to VCF import.  You need to follow the more convoluted procedure described at https://www.cog-genomics.org/plink/1.9/data#merge3 ; sorry about the inconvenience.
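Roughly, the exclusion route on that page is: take the .missnp file written by the failed merge, --exclude those variants from every input fileset, and merge again (the --flip step described there targets strand problems, not genuine multiallelic or duplicate-ID cases). A sketch that only builds the plink 1.9 command lines for this; the `_nomulti` suffix, the merge-list file name and the `plink` executable name are hypothetical, while `1k-merge.missnp` is the file named in the error earlier in the thread.

```python
def exclude_and_remerge(prefixes, missnp="1k-merge.missnp", out="1k",
                        merge_list="merge_list_nomulti.txt", plink="plink"):
    """Build plink 1.9 argv lists: drop .missnp variants from each fileset, then re-merge.

    Returns (commands, merge_list_text). Nothing is executed here; write
    merge_list_text to `merge_list` before running the final command.
    """
    commands, cleaned = [], []
    for prefix in prefixes:
        tmp = f"{prefix}_nomulti"  # hypothetical name for the filtered copy
        commands.append([plink, "--bfile", prefix, "--allow-extra-chr",
                         "--exclude", missnp, "--make-bed", "--out", tmp])
        cleaned.append(tmp)
    # --merge-list accepts one binary fileset prefix per line
    merge_list_text = "\n".join(cleaned) + "\n"
    commands.append([plink, "--allow-extra-chr", "--merge-list", merge_list,
                     "--make-bed", "--out", out])
    return commands, merge_list_text


cmds, text = exclude_and_remerge(["20", "21"])
for argv in cmds:
    print(" ".join(argv))
```

Note that --exclude works by variant ID, so every record carrying an offending ID is dropped, including all copies of a duplicated rsID.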