merging bgen file UK Biobank

Qida He

Mar 20, 2022, 9:20:28 AM
to plink2-users
Hi Christopher! To merge the .bgen files from UK Biobank, I started by converting .bgen to .pgen. I then used --pmerge-list to merge chr21 and chr22, but it failed. I tried many approaches. Please tell me what's wrong and how I should merge the UK Biobank files.
Thank you.
Qida He

Christopher Chang

Mar 20, 2022, 9:30:51 AM
to plink2-users
How am I supposed to tell what you did wrong if you don't provide full .log files of what you did?

Qida He

Mar 21, 2022, 8:30:04 AM
to plink2-users
Sorry, here is the log:
PLINK v2.00a3 64-bit (17 Mar 2022)
Options in effect:
  --make-pgen
  --merge-max-allele-ct 2
  --out 21_22
  --pmerge-list list1.txt

Hostname: DESKTOP-5344V79
Working directory: E:\ukbgwas\impbgen
Start time: Sun Mar 20 21:09:18 2022

Random number seed: 1647781758
130719 MiB RAM detected; reserving 65359 MiB for main workspace.
Using up to 48 threads (change this with --threads).
--pmerge-list: 2 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to 21_22-merge.psam .
--pmerge-list: 2 .pvar files scanned.
Concatenation job detected.
Concatenating...
Error: Conflicting REF alleles for variant 'rs566699515' at 21:10212396.


End time: Sun Mar 20 21:09:30 2022

Christopher Chang

Mar 21, 2022, 11:07:46 AM
to plink2-users
If multiple variants have the same ID, --pmerge[-list] requires their REF alleles to match; "--merge-max-allele-ct 2" does not get around this.

You'll either need to rename your variants (with e.g. --set-all-var-ids) to address this issue, or export to BCF and fall back on "bcftools concat" to perform this merge.

Christopher Chang

Mar 21, 2022, 11:18:49 AM
to plink2-users
Correction, "same ID" should read "same ID and position".
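
To see which variants would trigger this check, a rough sketch is to scan a .pvar file for records that share chromosome, position, and ID but carry different REF alleles. The file toy.pvar and its alleles below are made up for illustration (only the ID and position come from the error message above); a real .pvar may also have "##" header lines, which the script skips.

```shell
# Hypothetical toy .pvar standing in for a real chr21 file (alleles are invented).
printf '#CHROM\tPOS\tID\tREF\tALT\n21\t10212396\trs566699515\tC\tT\n21\t10212396\trs566699515\tCT\tC\n21\t10212500\trs1\tA\tG\n' > toy.pvar

# Print each (CHROM:POS ID) key once if it appears with more than one REF allele.
conflicts=$(awk -F'\t' '
  /^#/ { next }                                  # skip all header lines
  {
    key = $1 ":" $2 " " $3                       # chromosome:position plus variant ID
    if ((key in ref) && ref[key] != $4 && !(key in seen)) {
      print key                                  # same ID and position, different REF
      seen[key] = 1
    } else if (!(key in ref)) {
      ref[key] = $4                              # remember the first REF seen for this key
    }
  }' toy.pvar)
echo "$conflicts"
```

Any key printed here is a variant that needs a new, allele-aware ID before the merge.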

Qida He

Mar 22, 2022, 10:26:06 AM
to plink2-users
I tried --set-all-var-ids, but I still got the same error.
Here is the log:
 $ plink2 --set-all-var-ids @:# --pmerge-list list1.txt --make-pgen --out 21_22 --merge-max-allele-ct 2
PLINK v2.00a3 64-bit (17 Mar 2022)             www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to 21_22.log.

Options in effect:
  --make-pgen
  --merge-max-allele-ct 2
  --out 21_22
  --pmerge-list list1.txt
  --set-all-var-ids @:#

Start time: Tue Mar 22 22:20:09 2022

130719 MiB RAM detected; reserving 65359 MiB for main workspace.
Using up to 48 threads (change this with --threads).
--pmerge-list: 2 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to 21_22-merge.psam .
--pmerge-list: 2 .pvar files scanned.
Concatenation job detected.
Concatenating... 0/2500126 variants complete.

End time: Tue Mar 22 22:20:24 2022

Error: Conflicting REF alleles for variant 'rs566699515' at 21:10212396.

Christopher Chang

Mar 22, 2022, 8:55:13 PM
to plink2-users
There are two problems here.

1. The --set-all-var-ids template you chose doesn't solve the problem: you'll STILL have two same-ID variants at chr21:10212396 with different REF alleles.  You have to add at least the REF allele to the --set-all-var-ids template.
2. --set-all-var-ids happens after --pmerge-list in the order of operations.  So you need to perform --set-all-var-ids + --make-pgen runs first, and then run --pmerge-list on the output files from that.
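
The two points above can be sketched as a two-stage run: rename first, merge second. This is only an illustration under assumptions: the per-chromosome filesets are taken to be .pgen/.pvar/.psam sets with the hypothetical prefixes chr21 and chr22, the template '@:#:$r:$a' is one possible choice that includes the REF (and ALT) allele, and the --new-id-max-allele-len setting is borrowed from a command posted later in this thread. The script only prints the commands unless PLINK2 is set to the real executable.

```shell
# Dry run by default: set PLINK2=plink2 in the environment to actually execute.
PLINK2="${PLINK2:-echo plink2}"

# Stage 1: give every variant an allele-aware ID, one --make-pgen run per chromosome.
: > renamed_list.txt
for chr in 21 22; do
  $PLINK2 --pfile "chr${chr}" \
    --set-all-var-ids '@:#:$r:$a' \
    --new-id-max-allele-len 100 missing \
    --make-pgen --out "chr${chr}_renamed"
  # --pmerge-list expects one fileset prefix per line.
  echo "chr${chr}_renamed" >> renamed_list.txt
done

# Stage 2: merge the renamed filesets in a separate run.
$PLINK2 --pmerge-list renamed_list.txt --make-pgen --out 21_22
```

Keeping the rename and the merge in separate invocations is the important part, since --set-all-var-ids on the merge command line would only be applied after the merge has already failed.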

Qida He

Mar 24, 2022, 8:46:18 AM
to plink2-users
Very effective, thank you for your answer! You're so nice.

Jalil Sharif

Mar 24, 2022, 10:02:41 AM
to plink2-users
What was the final command pipeline you used, Qida He?

Qida He

Mar 25, 2022, 6:55:36 AM
to plink2-users
First, I used the following options to rename the same-ID variants:
  --bgen ukb22828_c1_b0_v3.bgen ref-first
  --make-bpgen
  --new-id-max-allele-len 100 missing
  --out c1
  --sample ukb22828_c1_b0_v3_s487257.sample
  --set-all-var-ids @:#[ukb]\$r
Then I used --pmerge-list to merge all the files.

Miao Cai

May 23, 2022, 4:08:16 AM
to plink2-users

Is ref-first the right modifier for reading the UK Biobank .bgen files? It seems to produce reference alleles that do not match what the R package bigsnpr reads. bigsnpr reads SNP reference alleles directly from the .bgi index files, so I am assuming that bigsnpr reads the correct reference alleles, and therefore that the plink2 "ref-first" modifier is probably not right?

See the GitHub issue for my plink2 and R code, as well as the outputs: https://github.com/privefl/bigsnpr/issues/339

Christopher Chang

May 23, 2022, 10:47:44 AM
to plink2-users
Reread the bigsnpr output: it only refers to "allele1" and "allele2", not "REF"/"ALT".

The original convention for plink .bim files was to store minor alleles in "allele1" and major alleles in "allele2".  Since the reference allele is usually major, the plink2 convention is to store REF=allele2.
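
As a small illustration of that convention, the sketch below labels the two allele columns of a .bim line. The file toy.bim and its contents are invented, and the sketch assumes the usual six-column .bim layout (chromosome, ID, cM position, bp position, allele1, allele2).

```shell
# Hypothetical one-line .bim: chrom, ID, cM, bp, allele1, allele2 (values invented).
printf '21\trs1\t0\t10212500\tG\tA\n' > toy.bim

# Column 5 is allele1 (read as ALT), column 6 is allele2 (read as REF).
labeled=$(awk -F'\t' '{ print $2, "ALT(allele1)=" $5, "REF(allele2)=" $6 }' toy.bim)
echo "$labeled"
```

So a tool that reports only "allele1" and "allele2" is not necessarily disagreeing about which allele is REF.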

Miao Cai

May 23, 2022, 11:20:37 PM
to plink2-users
OK, thanks Chris. Just to confirm my understanding: allele2 is the reference allele (major allele) in plink2, is this correct?

Ahmed salih

Jul 25, 2022, 12:32:06 PM
to plink2-users
Hi Christopher,

I am still facing the same issue as described above.
I first applied
  --set-all-var-ids
Then, when I ran
  --pmerge-list
I got the error below:

Error: The biallelic variants with ID 'rs151120166' at position 1:715142 in
/workspace/home/asalih/extracted/ch1setvar.pvar appear to be the components of
a 'split' multiallelic variant; if so, it must be 'joined' (with e.g. "bcftools
norm -m") before a correct merge can occur. If you are SURE that your data does
not contain any same-position same-ID variant groups that should be joined, you
can suppress this error with --multiallelics-already-joined.

I tried adding --multiallelics-already-joined, but then I got this error:
--pmerge[-list] multiallelic-variant dosage support is under development.

Any idea how to fix it, please?

Best

Christopher Chang

Jul 25, 2022, 12:40:40 PM
to plink2-users
Why did you add --multiallelics-already-joined??!  You're LUCKY that --pmerge-list is unfinished.  Please reread the first error message.
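
For reference, the first error message points at "bcftools norm -m". A minimal sketch of that join step follows, assuming the data has already been exported to BCF and that the flagged records really are the halves of a split multiallelic variant. The file names chr1_split.bcf and chr1_joined.bcf are hypothetical, and the script only prints the command unless BCFTOOLS is set to the real executable.

```shell
# Dry run by default: set BCFTOOLS=bcftools in the environment to actually execute.
BCFTOOLS="${BCFTOOLS:-echo bcftools}"

# -m +any joins same-position biallelic records (SNPs and indels) into multiallelic ones;
# -Ob writes compressed BCF to the file given by -o.
join_output=$($BCFTOOLS norm -m +any -Ob -o chr1_joined.bcf chr1_split.bcf)
echo "$join_output"
```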

Ahmed salih

Jul 25, 2022, 12:54:23 PM
to plink2-users
Thanks so much.

Yes, I followed your reply and applied
  --set-all-var-ids
Then, when I ran
  --pmerge-list
I got the error below:

Error: The biallelic variants with ID 'rs151120166' at position 1:715142 in
/workspace/home/asalih/extracted/ch1setvar.pvar appear to be the components of
a 'split' multiallelic variant; if so, it must be 'joined' (with e.g. "bcftools
norm -m") before a correct merge can occur. If you are SURE that your data does
not contain any same-position same-ID variant groups that should be joined, you
can suppress this error with --multiallelics-already-joined.


Nik Tz

Sep 11, 2023, 9:05:37 AM
to plink2-users
Hello, 

Thanks for the above discussion. Unfortunately, I am still not clear on which set of commands to run for PLINK to merge a set of .bgen files. Could someone please write this out for me?

Much appreciated,
Nik

Christopher Chang

Sep 11, 2023, 9:30:53 AM
to plink2-users
For generic .bgen merge, you need to use a .bgen-based tool like QCTOOL v2.  PLINK is only relevant if your downstream analyses only require dosages and not genotype probabilities; if this is the case, you should (i) explicitly say so in your question and (ii) explain why you still need a .bgen rather than a .pgen file on the other end.

Nik Tz

Sep 13, 2023, 5:52:13 PM
to plink2-users
Thanks for the fast clarification, Chris. I will look into qctool for this purpose.

Huy Nguyen

Sep 15, 2023, 1:08:22 AM
to plink2-users
Qida He, could you please confirm whether the second line of your code should be --make-bgen or --make-pgen? I got confused.

On Friday, March 25, 2022 at 20:55:36 UTC+10, heqis...@gmail.com wrote: