Convert bgen file to plink file for X-chromosome


Liaoyi Xu

Jul 31, 2024, 12:07:47 AM
to plink2-users
Hello,

I want to convert .bgen data for imputed SNPs on the X chromosome to a plink binary fileset, and then use the converted files to run a GWAS.

After a lot of searching on Google, my understanding is that, unlike for the autosomes, I need to add --split-par hg19 to handle the X chromosome.

I first used the following command to convert the bgen file to a plink file:

mkdir -p ./sub_maf0.001_biallel_bbf

time ../liaoyi_UKB/EXECUTABLES/plink2 \
   --bgen ./sub_chrom_maf0.001_bgen_samp/sub_chromX_maf0.001.bgen ref-last \
   --sample ./sub_chrom_maf0.001_bgen_samp/sub_chromX_maf0.001.sample \
   --keep ./geno_qc_eids_400k_white_british_2023-11-29.txt \
   --maf 0.001 \
   --rm-dup 'exclude-all' \
   --snps-only 'just-acgt' \
   --max-alleles 2 \
   --make-bed \
   --split-par hg19 \
   --out ./sub_maf0.001_biallel_bbf/biallel_sub_chrom${i}_maf0.001;

But I got a warning message: Warning: --split-par had no effect (no X variants were in the PARs).

Next, I used the generated plink file to run the GWAS, but got another warning message: Warning: Skipping --glm regression on phenotype 'pelvic_height', and other(s)
with identical missingness patterns, since covariate correlation matrix could
not be inverted (VIF_INFINITE). You may want to remove redundant covariates and
try again.

I think I didn't convert the bgen file to a plink file correctly. Could you tell me where I went wrong?

Thank you!
Louis


Christopher Chang

Jul 31, 2024, 12:17:40 AM
to plink2-users
Please post full .log file(s) when asking for troubleshooting help.

Liaoyi Xu

Jul 31, 2024, 12:22:31 AM
to plink2-users
Sorry for missing the .log files. I’ve now attached both log files for converting BGEN to PLINK and for the GWAS.

Thank you!
Louis

convert-biallel_sub_chrom_maf0.001.log
gwas-hip_both_x.log

Christopher Chang

Jul 31, 2024, 12:31:01 AM
to plink2-users
- You probably want to use --make-pgen / --pfile rather than --make-bed / --bfile when starting from .bgen data.  Otherwise all the dosages are converted to integers or missing values.


chrX is special in two ways:

  • First, sex (as defined in the .fam/.psam input file) is normally included as an additional covariate. If you don't want this, add the 'no-x-sex' modifier. Or you can add the 'sex' modifier to include .fam/.psam sex as a covariate everywhere. Whatever you do, don't include sex from the .fam/.psam file and the --covar file at the same time; otherwise the duplicated column will cause the regression to fail.
That's my first guess as to why your regression failed.
In the meantime, there have been a few --glm bugfixes since 2020, so I recommend running with a newer plink2 build.
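
As an illustration of the duplicated-sex-column point (not part of the original reply), here is a minimal sketch that checks whether a --covar file already has a sex-like column before running --glm on chrX. The helper name find_sex_columns, the toy file, and the column names it matches (sex, gender) are assumptions; adjust the pattern to whatever the real covariate file uses.

```shell
# Hypothetical helper: print any header columns of a covariate file whose
# name looks like sex. Assumes a whitespace-delimited file with a header line.
find_sex_columns() {
  head -n 1 "$1" | tr -s ' \t' '\n' | grep -i -E '^#?(sex|gender)$'
}

# Toy covariate file standing in for the real --covar input.
printf '#FID\tIID\tage\tSex\tPC1\n1\t1\t50\t1\t0.01\n' > toy_covar.txt

if cols=$(find_sex_columns toy_covar.txt); then
  echo "sex-like column(s) in --covar: $cols (use 'no-x-sex' or drop the column)"
else
  echo "no sex-like column in --covar (the .psam/.fam sex covariate will not be duplicated)"
fi
```

If the check reports a match, either add 'no-x-sex' to --glm or remove that column from the covariate file, so that sex enters the chrX regression only once.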

Liaoyi Xu

Jul 31, 2024, 4:59:48 PM
to plink2-users
Thank you so much for your help!

I have now tried --make-pgen, but I still get the same warning message: Warning: --split-par had no effect (no X variants were in the PARs). I want to check with you whether I did it correctly, so I attached the log file below.

Then I ran the GWAS with the generated pgen file and added the 'no-x-sex' modifier. However, when I checked the GWAS output file, it had two columns I had never seen before: PROVISIONAL_REF? and ERRCODE. May I ask whether I ran the GWAS correctly?

I also tried --clump with plink2, but I found that --clump-range from plink 1.9 is not available in plink 2. Is there any substitute? Since --pfile does not work in plink 1.9, I have to use plink 2.

Best,
Liaoyi

hip_both_x.log
biallel_sub_chrom_maf0.001.log

Christopher Chang

Aug 4, 2024, 8:10:07 PM
to plink2-users
- When --split-par has no effect, that implies the pseudoautosomal region was filtered out or split off earlier.
- --clump-range is not implemented yet in plink 2; sorry about the inconvenience.  I will look into adding something similar (probably just additional columns in the .clumps file, rather than a separate .ranges file) in the next month or so.  Until then, with e.g. --set-all-var-ids, you can set up your --clump operation so that it's easy to determine a clump's range from the contents of the SP2 column.
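
As an illustration of the SP2 suggestion (not part of the original reply), the sketch below derives a per-clump position range. It assumes variant IDs were set to a chromosome:position form (for example with --set-all-var-ids '@:#'), and that the .clumps file has a header row containing ID and SP2 columns, with SP2 holding a comma-separated list of member IDs. The toy table and the function name clump_ranges are made up for the example; check them against a real .clumps file before relying on this.

```shell
# Toy table with only the columns this sketch needs; a real .clumps file has more.
printf '#CHROM\tPOS\tID\tP\tSP2\nX\t5000\tX:5000\t1e-9\tX:4200,X:6100,X:5300\nX\t9000\tX:9000\t2e-8\t.\n' > toy.clumps

# Print "lead_ID <tab> chrom:min-max" for each clump, using positions parsed
# from the lead variant ID and from every member ID in SP2.
clump_ranges() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column indexes
    {
      id = $(col["ID"])
      split(id, lead, ":")
      lo = hi = lead[2] + 0
      n = split($(col["SP2"]), members, ",")
      for (j = 1; j <= n; j++) {
        if (split(members[j], part, ":") < 2) continue   # skip placeholders with no chrom:pos form
        pos = part[2] + 0
        if (pos < lo) lo = pos
        if (pos > hi) hi = pos
      }
      print id "\t" lead[1] ":" lo "-" hi
    }' "$1"
}

clump_ranges toy.clumps
```

Looking the columns up by header name rather than by fixed index keeps the sketch independent of the exact column order.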

Liaoyi Xu

Aug 15, 2024, 5:57:21 PM
to plink2-users
Hi Chris,

Thanks for your help! I have a follow-up question. When I ran a GWAS on the X chromosome, I noticed that a large portion of the alternative (ALT) alleles don’t match the A1 allele. However, for the autosomes, almost all of the ALT alleles matched with A1. May I ask why this happens?

Best,
Liaoyi

Christopher Chang

Aug 16, 2024, 2:05:19 AM
to plink2-users
Haven't heard of this before.  Can't guess at an explanation without at least e.g. a 100-variant chrX example dataset exhibiting what you're talking about.

Liaoyi Xu

Aug 16, 2024, 1:59:31 PM
to plink2-users
Hi Chris,

Here is the link to the x chromosome .pgen file and the gwas sumstats for 200 variants: https://drive.google.com/drive/folders/1zxbH0UKbFrV4bz7PnNmCZevd7qXfN6NE?usp=sharing

Below is how I generate .pgen file and run gwas:

1. I first downloaded the imputed genotype data from UKB (fid22828):
ukb22828_cX_b0_v3.bgen
ukb22828_cX_b0_v3.sample

2. I use the following code to convert bgen file:

mkdir -p ./sub_chrom_maf0.001_bgen_samp
time ../liaoyi_UKB/EXECUTABLES/plink2 \
   --bgen ./ukb22828_cX_b0_v3.bgen ref-last \
   --sample ./ukb22828_cX_b0_v3.sample \
   --maf 0.001 \
   --keep ./geno_qc_eids_400k_white_british_2023-11-29.txt \
   --export bgen-1.3 \
   --split-par hg19 \
   --out ./sub_chrom_maf0.001_bgen_samp/sub_chromX_maf0.001;


3. I use the following code to convert bgen file to pgen file:

mkdir -p ./sub_maf0.001_biallel_pgen
time ../liaoyi_UKB/EXECUTABLES/plink2 \
   --bgen ./sub_chrom_maf0.001_bgen_samp/sub_chromX_maf0.001.bgen ref-last \
   --sample ./sub_chrom_maf0.001_bgen_samp/sub_chromX_maf0.001.sample \
   --keep ./geno_qc_eids_400k_white_british_2023-11-29.txt \
   --maf 0.001 \
   --rm-dup 'exclude-all' \
   --snps-only 'just-acgt' \
   --max-alleles 2 \
   --make-pgen \
   --split-par hg19 \
   --out ./sub_maf0.001_biallel_pgen/biallel_sub_chrom${i}_maf0.001;


4. I use the following code to run gwas:

mkdir -p ../../PLINK_LM_OUTPUT/20240728/
time ../../EXECUTABLES/plink2 \
   --pfile ../../../genotype/sub_maf0.001_biallel_pgen/biallel_sub_chrom_maf0.001 \
   --maf 0.01 \
   --mind 0.025 \
   --geno 0.02 \
   --glm hide-covar no-x-sex \
   --pheno ../../GWAS_INPUT_DATA/hip_select_pheno_gwas_both_cm_20240104.txt \
   --covar ../../GWAS_INPUT_DATA/plink_covar_40k_imaging_20231124.txt \
   --covar-variance-standardize \
   --ci 0.95 \
   --out ../../PLINK_LM_OUTPUT/20240728/hip_both_x



Thank you so much for your continued help!

Best,
Liaoyi

Christopher Chang

Aug 18, 2024, 1:52:02 PM
to plink2-users
UK Biobank bgen files are ref-first, not ref-last.  The error message tells you this if you run --bgen without specifying ref-first/ref-last.
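
As an illustration of the fix (not part of the original reply), here is a minimal sketch of importing the UK Biobank chrX file with ref-first. The input file names are the ones quoted earlier in the thread; the output prefix ukb_chrX_reffirst, the function name, and the dry-run wrapper are assumptions, and the filters from the earlier commands (--keep, --maf, and so on) are left out for brevity.

```shell
# Defaults to a dry run that only prints the command;
# set PLINK2=/path/to/plink2 to actually execute it.
PLINK2=${PLINK2:-echo plink2}

# Hypothetical wrapper: $1 = .bgen file, $2 = .sample file, $3 = output prefix.
import_ukb_chrx() {
  $PLINK2 \
    --bgen "$1" ref-first \
    --sample "$2" \
    --make-pgen \
    --out "$3"
}

import_ukb_chrx ukb22828_cX_b0_v3.bgen ukb22828_cX_b0_v3.sample ukb_chrX_reffirst
```

Importing straight from the original UK Biobank file, rather than from an intermediate .bgen that was itself written after a ref-last import, avoids carrying the swapped REF/ALT labels forward.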