running GWAS with Plink


Ana Marija

Jul 30, 2019, 1:34:21 PM
to plink2-users
Hello,

I got my .bgen files from UK Biobank, and I have my phenotype file (with a subject ID column and a pheno column where case=2 and control=1).

I would like to use Plink to run a GWAS.
As I understand it, the first step would be to convert my bgen and sample files to vcf with something like:
 plink --bgen [bgen_file] --sample [sample_file]

My question is: what should my .sample file look like?

Thanks
Ana

Christopher Chang

Jul 30, 2019, 1:56:34 PM
to plink2-users
1. The UK Biobank should provide .sample files.
2. If you try running that command, it'll tell you that the .bgen format version is not supported by plink 1.9.  You need to use plink 2.0 here.  "plink2 --bgen <bgen_file> ref-first --sample <sample_file>" should work.  (Note the "ref-first": the .bgen specification does not say whether REF alleles are first or second, so you need to tell plink2 that they appear first in UK Biobank .bgen files.)

Ana Marija

Jul 30, 2019, 2:03:23 PM
to Christopher Chang, plink2-users
Yes I will use Plink2

I downloaded .fam files for each chromosome, and they look like this:

2743359 2743359 0 0 1 Batch_b001
3055474 3055474 0 0 2 Batch_b001
1804099 1804099 0 0 2 Batch_b001

How can I use those to create a .sample file, and what should it look like in the end?
I know that I have to use 0 for control and 1 for case in the pheno column.




--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/plink2-users/c7b02b04-81c7-4c53-8a7b-1c0712dd1891%40googlegroups.com.

Christopher Chang

Jul 30, 2019, 2:18:56 PM
to plink2-users
Again, you should be able to download .sample files directly from the UK Biobank servers.

Richard Yanicky

Jul 30, 2019, 2:25:27 PM
to Christopher Chang, plink2-users
I am working with Ukb data and plink2. Thank you to this group for all your help.

This document, which is part of the UKB process and of the user manual, lists details about the data, with examples of how to download it. I would have been lost without it.


plink2 is the core engine of our process.

Regards,

Richard


Ana Marija

Jul 30, 2019, 3:00:56 PM
to Richard Yanicky, Christopher Chang, plink2-users
Thank you so much. I downloaded my .sample files via:
#!/bin/bash
for i in {1..22}; do
  ukbgene imp -c${i} -m
done

and they look like this:

ID_1 ID_2 missing sex
0 0 0 D
2743359 2743359 0 1
3055474 3055474 0 2

I assume this is ready to run, and I don't need to add a pheno column at the end of the .sample files?



Christopher Chang

Jul 30, 2019, 3:36:24 PM
to plink2-users
Correct, no phenotype information needs to be present for the initial import.
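
The import step can be repeated for every autosome. The following is only a minimal sketch, assuming the file naming that appears in this thread (ukb_imp_chrN_v3.bgen and ukb44316_imp_chrN_v3_s487317.sample; the application number and sample count will differ for other projects). It prints the commands to a hypothetical file, import_cmds.txt, instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Dry run: write one plink2 import command per autosome to import_cmds.txt.
# File names follow the pattern quoted in this thread; adjust to your own download.
out=import_cmds.txt   # hypothetical output name
: > "$out"
chr=1
while [ "$chr" -le 22 ]; do
  # ref-first: treat the first allele in each UK Biobank .bgen record as REF.
  echo "plink2 --bgen ukb_imp_chr${chr}_v3.bgen ref-first --sample ukb44316_imp_chr${chr}_v3_s487317.sample --make-pgen --out chr${chr}" >> "$out"
  chr=$((chr + 1))
done
# Report how many commands were written.
wc -l < "$out"
```

After checking the file, the commands could be run with `sh import_cmds.txt`, or submitted one per cluster job.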



Ana Marija

Jul 31, 2019, 9:14:53 AM
to Christopher Chang, plink2-users
Hi Chris,

I was running this:
plink2 --bgen ukb_imp_chr22_v3.bgen ref-first --sample ukb44316_imp_chr22_v3_s487317.sample

and I am not getting any .vcf file. Can you please tell me what my command should look like?

Thanks
Ana


Ana Marija

Jul 31, 2019, 9:35:37 AM
to Christopher Chang, plink2-users
I should add that I want the output to be in vcf format.

Would this be OK?

plink2 --bgen ukb_imp_chr22_v3.bgen --sample ukb44316_imp_chr22_v3_s487317.sample --out chr22
plink2 --pgen chr22.pgen --pvar chr22.pvar --psam chr22.psam --export vcf vcf-dosage=DS

Ana Marija

Jul 31, 2019, 10:01:48 AM
to Christopher Chang, plink2-users
When I ran this line:
plink2 --pgen chr22.pgen --pvar chr22.pvar --psam chr22.psam --export vcf vcf-dosage=DS

I got an error:
Random number seed: 1564580134
128952 MiB RAM detected; reserving 64476 MiB for main workspace.
Using up to 28 threads (change this with --threads).
487409 samples (264314 females, 223003 males, 92 ambiguous; 487409 founders)
loaded from chr22.psam.
1255683 variants loaded from chr22.pvar.
Note: No phenotype data present.
--export vcf to plink2.vcf ...
Error: File write failure.

Christopher Chang

Jul 31, 2019, 10:25:03 AM
to plink2-users
Do you have enough disk space/quota?

Ana Marija

Jul 31, 2019, 10:57:14 AM
to Christopher Chang, plink2-users
I do, and I deleted a lot of files, so that is probably not it.
What else could be the issue?

Thanks


Christopher Chang

Jul 31, 2019, 11:10:28 AM
to plink2-users
You might be running too old of a plink2 build. I can’t tell because you keep removing that information from the .log info you post; this just wastes your time and mine.

Ana Marija

Jul 31, 2019, 11:18:09 AM
to Christopher Chang, plink2-users
Sorry about that. These are my log files after running the commands below.

I was using: plink/2.0

plink2 --bgen ukb_imp_chr22_v3.bgen --sample ukb44316_imp_chr22_v3_s487317.sample --out chr22
plink2 --pgen chr22.pgen --pvar chr22.pvar --psam chr22.psam --export vcf vcf-dosage=DS

chr22.log
PLINK v2.00a2 AVX2 (25 Jun 2018)
Options in effect:
  --bgen ukb_imp_chr22_v3.bgen
  --out chr22
  --sample ukb44316_imp_chr22_v3_s487317.sample

Hostname: cri16in001
Working directory: /gpfs/data/stranger-lab/anamaria/biobank
Start time: Wed Jul 31 08:28:10 2019

Random number seed: 1564579690

128952 MiB RAM detected; reserving 64476 MiB for main workspace.
Using up to 28 threads (change this with --threads).
--bgen: 1255683 variants detected, format v1.2.
487409 samples imported from .sample file to chr22.psam .
--bgen: chr22.pgen + chr22.pvar written.

plink2.log
PLINK v2.00a2 AVX2 (25 Jun 2018)
Options in effect:
  --export vcf vcf-dosage=DS
  --pgen chr22.pgen
  --psam chr22.psam
  --pvar chr22.pvar

Hostname: cri16in001
Working directory: /gpfs/data/stranger-lab/anamaria/biobank
Start time: Wed Jul 31 09:31:19 2019

Random number seed: 1564583479

128952 MiB RAM detected; reserving 64476 MiB for main workspace.
Using up to 28 threads (change this with --threads).
487409 samples (264314 females, 223003 males, 92 ambiguous; 487409 founders)
loaded from chr22.psam.
1255683 variants loaded from chr22.pvar.
Note: No phenotype data present.
--export vcf to plink2.vcf ...
Error: File write failure.

End time: Wed Jul 31 09:55:15 2019


Christopher Chang

Jul 31, 2019, 11:43:42 AM
to plink2-users
1. That's more than a year old, so I'd switch to a newer plink2 build.
2. Are you sure you need an uncompressed VCF?  That'll require more than 4 terabytes.  "--export vcf bgz" generates a compressed VCF.
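
A rough back-of-the-envelope check of the size estimate, using the sample and variant counts from the chr22 log above. The figure of 7 bytes per genotype entry is purely an illustrative assumption (a separator, a hard call and a short dosage value), not a number taken from plink2.

```shell
#!/bin/sh
# Counts taken from the chr22 log earlier in this thread.
samples=487409
variants=1255683
# Assumption for illustration only: about 7 bytes of text per genotype entry.
bytes_per_entry=7
entries=$((samples * variants))
total_bytes=$((entries * bytes_per_entry))
echo "genotype entries: $entries"
echo "approx. uncompressed size in GB: $((total_bytes / 1000000000))"
```

Under that assumption a single chromosome already lands in the terabyte range, which is why a compressed export is the more practical choice here.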

Ana Marija

Jul 31, 2019, 11:51:52 AM
to Christopher Chang, plink2-users
Thank you so much, yes, I will get a newer version. Also, can I include the dosage info like this?

plink2 --pgen chr22.pgen --pvar chr22.pvar --psam chr22.psam  --export vcf bgz vcf-dosage=DS


Christopher Chang

Jul 31, 2019, 11:55:15 AM
to plink2-users
Yes, that command line looks fine, though you can replace "--pgen ... --pvar ... --psam ..." with just "--pfile chr22".



Ana Marija

Jul 31, 2019, 4:27:36 PM
to Christopher Chang, plink2-users
Thank you so much!

Is there any chance that I can compress the output of this command:

plink2 --bgen ukb_imp_chr22_v3.bgen --sample ukb44316_imp_chr22_v3_s487317.sample --out chr22

and use the compressed files in this command?

plink2 --pfile chr22  --export vcf bgz vcf-dosage=DS --out VCFchr22


Christopher Chang

Jul 31, 2019, 4:51:56 PM
to plink2-users
Technically yes, but it's not worth the trouble in your case since you can only do this for the .pvar and .psam files, and your .pgen is much, much larger.


Christopher Chang

Jul 31, 2019, 4:53:09 PM
to plink2-users
(In the meantime, don't forget to add "ref-first" to the --bgen part of your command line if you want accurate REF/ALT alleles in your VCF.)

Ana Marija

Jul 31, 2019, 6:08:32 PM
to Christopher Chang, plink2-users
Thanks!
I must admit I don't quite understand ref-first. Do I need to put a specific file after it, or just add ref-first like this?

plink2 --bgen ukb_imp_chr17_v3.bgen  ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --out chr17




Christopher Chang

Jul 31, 2019, 6:13:07 PM
to plink2-users
This command-line should be fine.  The point is just to avoid getting REF/ALT backwards, since the .bgen specification does not say which allele is REF and which allele is ALT.

Ana Marija

Jul 31, 2019, 6:24:22 PM
to Christopher Chang, plink2-users
Thank you so much!!!


Ana Marija

Aug 1, 2019, 2:55:10 PM
to Christopher Chang, plink2-users
Hi Christopher,

I am wondering whether I need to do any QC steps on the imputed .bgen files I downloaded from UK Biobank before running GWAS with plink2.
This is what I was thinking of doing:

-remove related individuals
-remove non-Europeans
-remove SNPs with minor allele frequency < 0.001
-model using ancestry info

Do you think I have to do that, and can you please tell me how I would do this in plink2?

Thanks
Ana

Ana Marija

Aug 1, 2019, 6:30:24 PM
to Christopher Chang, plink2-users
One more question:

To remove SNPs with minor allele frequency < 0.001, would this command be OK?

plink2 --bgen ukb_imp_chr17_v3.bgen  ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --maf 0.001 --make-bpgen --out chr17

Or do I need --make-pgen there instead?

My next step is still the same, converting to VCF format:

plink2 --pgen chr17.pgen --pvar chr17.pvar --psam chr17.psam  --export vcf bgz vcf-dosage=DS --out VCFchr17

Ana Marija

Aug 6, 2019, 9:45:10 AM
to Christopher Chang, plink2-users
Hi Christopher,

I am still in the process of preparing my files for GWAS analysis.
This is what I did so far:

plink2 --bgen ukb_imp_chr17_v3.bgen  ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --extract SNPsToExtract --make-bed --out ex17
plink2 --threads 8 --bim ex17.bim --fam ex17.fam --bed ex17.bed --maf 0.01 --geno 0.05 --hwe 0.000001  --make-bpgen --out chr17
plink2 --threads 8 --bim chr17.bim --fam chr17.fam --pgen chr17.pgen  --export vcf bgz vcf-dosage=DS --out VCFchr17

but after running the last step I got this warning:

No dosage data present.  DS field will not be exported.

Can you please advise me on this.

Thanks
Ana

Chris Chang

Aug 6, 2019, 10:47:21 AM
to sokovic....@gmail.com, plink2-users
The problem is with "--make-bed" instead of --make-pgen/--make-bpgen in your first command.  .bed files cannot store dosages.

Ana Marija

Aug 6, 2019, 10:58:07 AM
to Chris Chang, plink2-users
Thank you so much!

Also, in plink1.9 I used --qual-threshold 0.3 for imputation quality.
Would the equivalent flag in plink2 be --mach-r2-filter 0.3?

Ana Marija

Aug 8, 2019, 5:39:37 PM
to Chris Chang, plink2-users
Hi Chris,

after I created vcf files via:

plink2 --threads 8 --bgen ukb_imp_chr17_v3.bgen ref-first --sample ukb44316_imp_chr17_v3_s487317.sample --extract extractTheseSNPs --make-pgen --out ex17
plink2 --threads 8 --pgen ex17.pgen --psam ex17.psam --pvar ex17.pvar --maf 0.01 --geno 0.05 --hwe 0.000001 --make-bpgen --out chr17
plink2 --threads 8 --bim chr17.bim --fam chr17.fam --pgen chr17.pgen --export vcf bgz vcf-dosage=DS --out VCFchr17

Can I run GWAS on each CHR with:

plink --threads 8 --vcf VCFchr17.vcf.gz --pheno pheno-file.txt --pheno-name my_pheno --out CHR17

Also in pheno-file.txt is it sufficient to have just these columns:

id my_pheno
1 1000017 1
2 1000025 -9
3 1000038 -9
4 1000042 -9
5 1000056 -9
6 1000074 -9
7 1000038 -9
8 1000127 2
9 1000690 2
10 1000711 2
11 1001431 2
12 1001710 -9

Thanks
Ana
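
Note that the phenotype listing above has two names in its header but three fields per row; the leading field looks like a row index added on export. One possible cleanup is sketched below, assuming that layout and assuming FID equals IID, as in the .sample file shown earlier in this thread. The file names pheno-example.txt and pheno-clean.txt are hypothetical, and the header form `#FID IID <name>` is my understanding of what plink2 accepts rather than something confirmed in this thread.

```shell
#!/bin/sh
# Example input in the layout posted above: row index, sample ID, phenotype.
cat > pheno-example.txt <<'EOF'
id my_pheno
1 1000017 1
2 1000025 -9
8 1000127 2
EOF

# Drop the row-index column and write FID, IID, phenotype.
# Assumption: FID equals IID, as in the UK Biobank .sample file shown earlier.
awk 'NR == 1 { print "#FID", "IID", $2; next }
     NF == 3 { print $2, $2, $3 }' pheno-example.txt > pheno-clean.txt

cat pheno-clean.txt
```

The phenotype values themselves are passed through unchanged; only the column layout is altered.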

Ana Marija

Aug 9, 2019, 2:41:49 PM
to Chris Chang, plink2-users
Hi Chris,

sorry to bother you again, I would just like to confirm if running
GWAS like this is ok.

I created my vcf files via:

plink2 --threads 8 --bgen ukb_imp_chr6_v3.bgen ref-first --sample ukb44316_imp_chr6_v3_s487317.sample --extract extractTheseSNPs --make-pgen --out ex6
plink2 --threads 8 --pgen ex6.pgen --psam ex6.psam --pvar ex6.pvar --maf 0.01 --geno 0.05 --hwe 0.000001 --make-bpgen --out chr6
plink2 --threads 8 --bim chr6.bim --fam chr6.fam --pgen chr6.pgen --export vcf bgz vcf-dosage=DS --out VCFchr6

then to run GWAS I do:

plink2 --threads 8 --vcf VCFchr6.vcf.gz --glm --pheno pheno_M.txt --pheno-name pheno --out FINchr6

pheno_M.txt looks like this:

id pheno
1 1000017 1
2 1000025 -9
3 1000038 -9
4 1000042 -9
5 1000056 -9
6 1000074 -9
7 1000038 -9
8 1000127 2

Thanks
Ana

Andres Diaz-Pinto

Sep 21, 2019, 10:52:49 PM
to plink2-users
Hi Ana and Chris,


I just got the genetic data from the UK Biobank and have started trying to understand it. I'd like to ask you a couple of questions about it.
My plan is to carry out a GWAS using a blood phenotype, but first I should understand some more basic things.
I know that, among other data, there are three main parts: the genotype calls (bed/bim/fam), the phased haplotypes (bgen, sample, bgi) and the imputed genotype calls (bgen, sample, bgi).
Do you know which data the imputed data were derived from? Were they derived from the genotype calls?
The same question applies to the phased haplotypes: were they obtained from the genotype calls? Why does the number of variants in each chromosome differ between the phased haplotypes and the genotype calls?
Do these questions make sense at all? :D

If I understand correctly, the data I should use for carrying out a GWAS are the imputed genotype calls, right?

I'll really appreciate a reply from you,


Thanks in advance!

Andres Diaz-Pinto

Sep 22, 2019, 7:42:23 PM
to plink2-users
Hi Chris,


I'm trying to get a new bgen file with only a specific number of individuals. For that I executed this command:

./plink2 --bgen ukb_imp_chr22_v3.bgen --sample ukb11350_imp_chr22_v3_s487314.sample --keep IDs.txt --make-pgen --out new_c22

This produced the following error:

--keep: 0 samples remaining.
Error: No samples remaining after main filters.

The file IDs.txt is a column with the participant IDs I want to extract from the original bgen file

Could you please help me with understanding what I'm doing wrong?


Thanks!

Andres Diaz-Pinto

Sep 23, 2019, 9:25:08 AM
to plink2-users
Hi,


I just realised that the file IDs.txt must contain both the family ID (FID) and the individual ID (IID).
Before, it only contained the IID.


Thanks!
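
For readers hitting the same "0 samples remaining" error, a one-column ID list can be turned into a two-column FID/IID list with a short awk call. This is only a sketch, assuming FID equals IID as in the UK Biobank .sample file shown earlier in this thread; the file names IDs_example.txt and IDs_fid_iid.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical input: one participant ID per line.
printf '%s\n' 2743359 3055474 1804099 > IDs_example.txt

# Write FID and IID on each line, skipping blank lines.
# Assumption: FID equals IID, as in the .sample file shown earlier.
awk 'NF { print $1, $1 }' IDs_example.txt > IDs_fid_iid.txt

cat IDs_fid_iid.txt
```

The two-column file can then be passed to --keep in place of the original list.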

Christopher Chang

Sep 23, 2019, 1:07:56 PM
to plink2-users
Incidentally, if you have unique IIDs and want to get rid of the FIDs, you can run "plink2 --pfile <old prefix> --make-pgen psam-cols=-fid --out <new prefix>".  After you've done this, --keep on an ID-only file should work.

Monica Isgut

Sep 23, 2019, 2:10:20 PM
to plink2-users
Hi Chris,

Another question on running GWAS using BGEN files. I am trying to bypass the need to convert to PGEN format if possible. When I run GWAS using PGEN and a phenotype file with 5 different phenotypes coded as 1 and 0, the following command works:

~/plink2 --pfile pgen_chr22 --glm --pheno labels_pheno.txt --1 --out chr_22

However, when I do the same with --bgen and --sample instead of PGEN, with the following command:

~/plink2 --bgen UKB_chr22_AFTER_QC.bgen ref-first --sample ~/UKB_chr22_AFTER_QC.sample --glm --pheno labels_pheno.txt --out from_bgen_chr22 --1


I get this error:

Error: --data/--sample cannot be used with --1.


Any suggestions on what to do?

Thanks!

Christopher Chang

Sep 23, 2019, 3:36:32 PM
to plink2-users
This is counterproductive.  plink2 automatically converts any other format to PGEN before ANY operation.

Andres Diaz-Pinto

Sep 23, 2019, 3:59:01 PM
to plink2-users
Hi Chris,


Many thanks for taking the time to answer this.
I saw the IIDs are unique. I also saw that the FID is just a copy of the IID. I think I should filter the IIDs that have some family connection. Do you know which command I can use for that?
Thanks again for the answer,


Andres

Andres Diaz-Pinto

Sep 23, 2019, 4:00:43 PM
to plink2-users
I suppose plink can also convert from pgen to bgen, right? Which command is best for that?

Christopher Chang

Sep 23, 2019, 4:24:13 PM
to plink2-users
1. I believe the UK Biobank provides a precomputed set of IDs which have been filtered for close relations.  However, if you want to customize that filter,
"plink2 --pfile <input fileset prefix> --king-cutoff <threshold> --make-pgen --out <output fileset prefix>"
should work.  Note that the threshold is in KING's (http://people.virginia.edu/~wc9c/KING/manual.html ) units, where 0.177 corresponds to removing first-degree relations (this is very, very accurate), 0.088 corresponds to removing second-degree as well (slightly less accurate), etc.

2. "plink2 --pfile <input fileset prefix> --export bgen-1.2 --out <output fileset prefix>"
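
The two cutoffs quoted in point 1 are consistent with powers of two: 2^-2.5 is about 0.177 and 2^-3.5 is about 0.088. The sketch below just reproduces that arithmetic; extending the same pattern to a third-degree cutoff (2^-4.5) is an extrapolation on my part, and the output file name king_cutoffs.txt is hypothetical.

```shell
#!/bin/sh
# Kinship cutoff for relationship degree d, computed as 2^-(d + 1.5).
# exp/log are used instead of the ^ operator for portability across awk variants.
awk 'BEGIN {
  for (d = 1; d <= 3; d++)
    printf "degree %d: %.4f\n", d, exp(-(d + 1.5) * log(2))
}' > king_cutoffs.txt

cat king_cutoffs.txt
```

Whichever value is chosen would be passed as the threshold to --king-cutoff, as in the command quoted above.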

Andres Diaz-Pinto

Sep 23, 2019, 9:39:50 PM
to plink2-users
1. You are right. There is a file for Relatedness called ukbA_rel_sP.txt
2. Great!
3. I still have some doubts regarding the calls files (bed/bim/fam) in the UK Biobank. How would you describe that data? Am I right in saying that the calls files are the results of genotyping prior to imputation?

Many thanks again Chris!

Andres Diaz-Pinto

Sep 29, 2019, 8:38:46 PM
to plink2-users
Hi Chris,


Just a quick question regarding QC. How do you remove non Europeans in the UKBiobank genetic data using plink?

Thanks again!

Christopher Chang

Sep 30, 2019, 12:28:12 AM
to plink2-users
You can use --keep on a file with just European sample IDs; or if you have a table with sample IDs in the first column and continent in another column, you can use --keep-fcol + --keep-fcol-name/--keep-fcol-num (--filter + --mfilter in plink 1.9) on that table.
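
The first option (a plain list of European sample IDs for --keep) can be built from such a table with awk. This is a minimal sketch under stated assumptions: the table layout, the label "Europe" and the file names ancestry_example.txt and europeans.keep are all hypothetical, and FID is assumed equal to IID as in the .sample file shown earlier in this thread.

```shell
#!/bin/sh
# Hypothetical table: sample ID in column 1, continent label in column 2.
cat > ancestry_example.txt <<'EOF'
IID continent
2743359 Europe
3055474 Asia
1804099 Europe
EOF

# Skip the header, keep rows labelled "Europe", and write FID and IID.
# Assumption: FID equals IID.
awk 'NR > 1 && $2 == "Europe" { print $1, $1 }' ancestry_example.txt > europeans.keep

cat europeans.keep
```

The --keep-fcol route mentioned above avoids the intermediate file by filtering on the table directly.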

Dheeraj Bobbili

Apr 14, 2020, 9:03:19 AM
to plink2-users
Hi,

Why should we convert the bgen files to pgen files instead of directly converting them to vcf files? From what I understand, pgen files are generated temporarily by default. Please correct me if I'm wrong.

Regards,
Dheeraj.

Christopher Chang

Apr 14, 2020, 10:50:28 AM
to plink2-users
If you don't convert to .pgen, every plink2 operation you perform will waste time converting to .pgen.

Dheeraj Bobbili

Apr 14, 2020, 12:35:12 PM4/14/20
to plink2-users
Yes, that's what I thought. Thank you for clarifying it. Much appreciated.

Hossein Mohammadian

Jun 15, 2020, 1:22:09 AM6/15/20
to plink2-users
Hi Christopher,

I am trying to use plink2 on UK Biobank genomics data to get a vcf file. Based on other people's conversation here, I ran
plink2 --threads 8 --bgen ukb_imp_chr22_v3.bgen ref-first --sample imp_chr17_v3_s487296.sample --extract SNP_List.txt --make-pgen --out RESULTS/exm22

and I get this:
Start time: Sun Jun 14 23:17:21 2020
258208 MiB RAM detected; reserving 129104 MiB for main workspace.
Allocated 7269 MiB successfully, after larger attempt(s) failed.
Using up to 8 compute threads.

--bgen: 1255683 variants detected, format v1.2.
487409 samples imported from .sample file to RESULTS/exm22-temporary.psam .
--bgen: 450k variants converted.Killed

Then I run this:
plink2 --pgen exm22-temporary.pgen --pvar exm22-temporary.pvar --psam exm22-temporary.psam --export vcf bgz vcf-dosage=DS

and I get this:

Start time: Sun Jun 14 23:34:01 2020
258208 MiB RAM detected; reserving 129104 MiB for main workspace.
Allocated 7269 MiB successfully, after larger attempt(s) failed.
Using up to 16 threads (change this with --threads).
487409 samples (264302 females, 222994 males, 113 ambiguous; 487409 founders)
loaded from exm22-temporary.psam.
Error: Line 1189421 of exm22-temporary.pvar has fewer tokens than expected.
End time: Sun Jun 14 23:35:00 2020


Could you please tell me what the problem is and how to solve it?

Thanks!

Jingqin Luo

Feb 19, 2021, 6:07:32 PM2/19/21
to plink2-users
I encountered the same issue as Hossein. Is there a solution to this?

Christopher Chang

Feb 19, 2021, 6:14:35 PM2/19/21
to plink2-users
The "Killed" message is usually due to a memory limit imposed by your compute cluster, and should be solvable by specifying a lower plink2 workspace size with --memory.

(The second error was self-inflicted: if your previous import crashed in the middle, the incomplete result won't be usable.)
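
A sketch of what the retry could look like for the command posted above, assuming a 4000 MiB job limit (a made-up figure; use whatever your cluster actually grants). The leftover temporary files are removed first so the incomplete import cannot be picked up by mistake.

```shell
# Hypothetical values: adjust to your own quota and output prefix.
quota_mib=4000            # memory the cluster grants the job, in MiB
out_prefix=RESULTS/exm22

# Files left behind by an import that was killed midway are incomplete.
rm -f "${out_prefix}-temporary.pgen" "${out_prefix}-temporary.pvar" "${out_prefix}-temporary.psam"

# Same import as before, with the workspace capped via --memory.
cmd="plink2 --memory $quota_mib --threads 8 --bgen ukb_imp_chr22_v3.bgen ref-first --sample imp_chr17_v3_s487296.sample --extract SNP_List.txt --make-pgen --out $out_prefix"

# Dry run: print the command for review before executing it.
echo "$cmd"
```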

Jingqin Luo

Feb 19, 2021, 6:26:12 PM2/19/21
to plink2-users
Thanks, Chris, for your speedy reply. I suspected that. I will request more memory from the server to see whether that solves the issue.

I only want to extract some SNPs, and my plink2 command reads:

plink2 --bgen ukb_imp_chr1_v3.bgen 'ref-first' --sample  ukb_imp_chr1_v3_s487280.sample  --extract  snp.txt  --make-bed

and the snp.txt only has the rs ID of one SNP:   rs71658797

###############below is the run log #######
PLINK v2.00a2LM AVX2 Intel (10 Sep 2019)       www.cog-genomics.org/plink/2.0/
(C) 2005-2019 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --bgen /storage1/fs1/uk-biobank/Active/UKB_Genetics/imputed/ukb_imp_chr1_v3.bgen ref-first
  --extract /storage1/fs1/yin.cao/Active/rosyluo/UKbiobank/Lung_PRS/Data/genotype/Chr1_LungPRS_19SNP.txt
  --make-bed
  --sample /storage1/fs1/yin.cao/Active/rosyluo/UKbiobank/ukb55288_imp_chr1_v3_s487280.sample

Start time: Fri Feb 19 23:02:00 2021
385353 MiB RAM detected; reserving 192676 MiB for main workspace.
Using up to 32 threads (change this with --threads).
--bgen: 7402791 variants detected, format v1.2.
487409 samples imported from .sample file to plink2-temporary.psam .
--bgen: 7402k variants scanned.Killed
###############
Although the run did not complete because of the memory issue, three files were written: plink2-temporary.pgen, plink2-temporary.psam, and plink2-temporary.pvar.

When I check the .pvar file, I still see the 720k-ish variants but not the one variant I want to extract. It seems the --extract option is not being applied? Or are these three temporary files produced no matter how many SNPs I want to extract?

Christopher Chang

Feb 19, 2021, 6:27:40 PM2/19/21
to plink2-users
No, that's exactly what you should NOT do.  Just tell plink2 to use less memory.

Jingqin Luo

Feb 19, 2021, 7:26:33 PM2/19/21
to plink2-users
I thought more memory would speed up plink2. So, if I keep plink2's memory option unchanged and request more memory from my computing server, will plink2 run faster?

Christopher Chang

Feb 19, 2021, 7:28:53 PM2/19/21
to plink2-users
A little bit, but it's a nice-to-have; it's not really needed.  plink2 should be plenty fast if you stay within your compute cluster's memory quota.

Jingqin Luo

Feb 19, 2021, 7:55:30 PM2/19/21
to plink2-users
I think the default memory limit on the compute server is 4 GB. The plink2 run log says: 385353 MiB RAM detected; reserving 192676 MiB for main workspace.
So plink2 is using <2 GB. What limit should I set? plink2 --memory 2000?


Christopher Chang

Feb 19, 2021, 8:00:11 PM2/19/21
to plink2-users
No, plink2 is reserving 192676 MiB (about 188 GiB), which is way, way more than it actually needs, because you didn't tell it about your account's memory quota.

You can try "--memory 4000" if your standard quota is 4 GiB.
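
To make the units concrete, here is a small sketch of the arithmetic. plink2 logs its workspace in MiB and by default reserves half of the RAM it detects; the helper name mib_to_gib is made up for illustration.

```shell
# Hypothetical helper: convert MiB to GiB with one decimal place.
mib_to_gib() {
  awk -v m="$1" 'BEGIN { printf "%.1f\n", m / 1024 }'
}

detected_mib=385353                    # "385353 MiB RAM detected" in the log
reserved_mib=$((detected_mib / 2))     # default workspace: half of detected RAM
echo "$reserved_mib MiB reserved = $(mib_to_gib "$reserved_mib") GiB"

quota_mib=$((4 * 1024))                # a 4 GiB quota expressed in MiB
echo "4 GiB quota = $quota_mib MiB"
```

Since --memory takes MiB, a value of 4000 stays just under a 4 GiB (4096 MiB) quota.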

Jingqin Luo

Feb 19, 2021, 10:05:11 PM2/19/21
to plink2-users
Thanks, Chris. I got it to work with 4 GB of memory.  Have a good weekend!