Re: GWAS results show NA in effect size (beta & SE): Why and how to address this?

Chris Chang

Sep 14, 2023, 5:27:22 AM

to Huy Nguyen, plink2-users

0. Always post .log files when asking for help here. In particular, it's clear that the command lines you've posted aren't exactly what plink2 saw, at least if we're talking about one of the standard plink2 builds, so I need to see what was actually executed.

1. You must use --pfile / --make-pgen instead of --bfile / --make-bed at all times if you want to preserve dosages from the .bgen.
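
For reference, a dosage-preserving version of the extraction step might look like the sketch below. It reuses the chr1 file names and output prefix from the quoted post, passes `ref-first` as a modifier of `--bgen`, and assumes the IDs in `TL_snplist_chr1.txt` match the variant IDs in the .bgen. The command is only printed (a dry run), so nothing is executed here.

```shell
#!/bin/sh
# Sketch: import chr1 from .bgen and write a .pgen fileset, which keeps dosages
# (a .bed fileset stores hard calls only). File names come from the quoted
# post; adjust them to your own data.
chr=1
cmd="plink2 --bgen ukb22828_c${chr}_b0_v3.bgen ref-first \
--sample ukb22828_c${chr}_b0_v3_s487159.sample \
--extract TL_snplist_chr${chr}.txt \
--make-pgen --out twas_snp_chr${chr}"

# Dry run: print the command. To execute it, use: eval "$cmd"
echo "$cmd"
```

The resulting fileset would then be read with `--pfile twas_snp_chr1` wherever the original steps used `--bfile`.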

On Thu, Sep 14, 2023 at 1:53 AM Huy Nguyen <huynguye...@gmail.com> wrote:

Dear All,

I'm preparing data for a Mendelian randomization (MR) analysis to assess the causal effect of telomere length on a kidney phenotype in UK Biobank (UKB) data. These are the steps I have taken so far:

1. I started to search for prior research summary data and found close to 800 SNPs for telomere length.
2. I retained only 24 SNPs with P < 5*10^-8.
3. I extracted those SNPs from the UKB imputed genetic data to make **files.vcf**, each of which contains the SNP dosages for one chromosome (chr). For example, the command for the SNPs on chr1 is:
```
plink2 --bgen ukb22828_c1_b0_v3.bgen --sample ukb22828_c1_b0_v3_s487159.sample --threads 4 --out twas_snp_chr1 --extract TL_snplist_chr1.txt --bgen-annotate ‘ref-first’ --export vcf vcf-dosage=DS-force
```

4. I converted these files (files.vcf) into bfiles (file.fam, file.bim, file.bed). An example code for SNPs of chromosome 1 is below:
```
plink --vcf twas_snp_chr1.vcf --make-bed --out twas_snp_chr1
```

5. I merged those files to make 1 file.bed, 1 file.bim and 1 file.fam, instead of 22 files each.
```
plink --bfile twas_snp_chr1 --merge-list TL_allsnps_allchromosomes.txt --make-bed --out data_TL_4_KD
```

6. I estimated a polygenic risk score (PRS) using the prior-research effect sizes (beta) from step 2 and the genetic dosage files from step 5. This step was for the one-sample MR analysis.

7. To generate summary data for the two-sample MR analysis, I performed a GWAS to estimate the beta and SE of the association between each of the 24 SNPs and the kidney phenotype (eGFR), using the following code:
```
plink2 --bfile data/genotypes/data_TL_4_KD --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogeGFRcrea --covar data/Covariatesdata --covar-name PC{1..10}, Age, Sex --out output/GWAS_eGFRcrea.cvrt
```
However, the GWAS results showed NA for the effect size of about half of the SNPs, as shown below: [Attach1]
When I checked the dosages in a text file, they were also 0 for many SNPs: [Attach2]

So, my questions are:
1. Why are the dosages of these SNPs 0? Do some SNPs really have zero dosage, or did one of the steps above go wrong?
2. If so, how can I address this issue so that I can generate effect sizes (beta and SE) as summary-level data for the two-sample MR analysis?
I searched for this issue in various communities (PLINK users, Stack Overflow, the Bioconductor community, etc.) but could not find a solution, so I would appreciate any help.


Huy Nguyen

Sep 14, 2023, 10:30:38 PM

to plink2-users

Thanks so much for your advice, Chris. As you asked, I have attached the .log files for all the steps involved in running the GWAS with PLINK:

Step 3. I extracted those SNPs from UKB imputed genetic data to make files.vcf, each of which contains SNP dosage in each chromosome (chr), for example, the code below is just an example for SNPs in chr1 (file: step3_twas_snp_chr2.log attached)

Step 4. I converted these files (files.vcf) into bfiles (file.fam, file.bim, file.bed) (file: step4_data_TL_4_KD.log attached)

Step 5. I merged those files to make 1 file.bed, 1 file.bim and 1 file.fam, instead of 22 files each.
File1: step5a_data_TL_4_KD_with error.log attached: This step got an error due to a SNP duplication.

File2: step5b_data_TL_4_KD_removeDuplicate_dueto_error.log attached: This step I addressed SNP duplication error.
File3: step5c_data_TL_4_KD_merged.log attached: This step I merged all files.

Step 7. To generate summary data for two-sample MR analysis, I performed GWAS to estimate beta and SE of association between each of 24 SNPs and Kidney phenotype (such as eGFR):

File1: step7a_GWAS_4_eGFRcrea.log attached

File2: step7b_results of GWAS attached which showed NA of effect size for 13 SNPs

However, the GWAS results showed estimated effect sizes (beta & SE) as NAs for 12 SNPs (File2: step7b_results of GWAS)

Given the great pressure from my leadership to complete this MR project, I'm really looking forward to receiving any solution that can address this issue.

Kind regards,

On Thursday, September 14, 2023 at 19:27:22 UTC+10, chrch...@gmail.com wrote:

step5a_data_TL_4_KD_with error.log.log

step7b_results of GWAS.txt

step7a_GWAS_4_eGFRcrea.log

step5b_data_TL_4_KD_removeDuplicate_dueto_error.log.log

step4_data_TL_4_KD.log.log

step3_twas_snp_chr2.log.log

step5c_data_TL_4_KD_merged.log

Chris Chang

Sep 14, 2023, 10:48:37 PM

to Huy Nguyen, plink2-users

- Please reread the second half of my previous comment.

- You can use --pmerge-list to concatenate .pgen filesets.
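
A minimal sketch of that merge step, assuming each chromosome has already been imported to its own .pgen fileset with prefixes `c1` ... `c22` in the current directory (hypothetical names). The list file holds one fileset prefix per line; `pmerge_list.txt` and the output prefix `all_chr` are placeholders. The merge command itself is only printed.

```shell
#!/bin/sh
# Write one fileset prefix per line (c1 ... c22; hypothetical names).
list=pmerge_list.txt
: > "$list"
i=1
while [ "$i" -le 22 ]; do
  echo "c${i}" >> "$list"
  i=$((i + 1))
done

# Dry run: print the merge command instead of executing it.
echo "plink2 --pmerge-list $list --make-pgen --out all_chr"
```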


Huy Nguyen

Sep 15, 2023, 12:09:34 AM

to plink2-users

Thanks, Chris, for your continued advice. Could you please give me a step-by-step guide on how to use --pmerge? (I'm sorry, I'm not familiar with the --pmerge commands.) Do you mean I need to merge these files before extracting the 24 SNPs?

To be practical, I have listed below the steps I think I need to follow for the GWAS run. Please correct me if any of them is wrong:

Step 1: using --pmerge to merge all files.bgen (from chr1 to chr22)?

However, in UKB I could not find any .pgen files; I can only find .bgen and .sample files for the imputed genetic data. So, can I still use --pmerge to merge all the .bgen files? If so, can you point me to any link or guide on how to write the --pmerge commands for the merge?

Step 2: extract 24 SNPs to make files.vcf?

Step 3: convert this file.vcf into bfiles (file.fam, file.bim, file.bed)?

Step 4: perform GWAS to estimate beta and SE of association between each of 24 SNPs and Kidney phenotype (eGFR) in order to estimate summary data for two-sample MR analysis?

I look forward to receiving your continued advice.

On Friday, September 15, 2023 at 12:48:37 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 16, 2023, 5:33:25 AM

to plink2-users

Hi Chris,

From discussions in this group I learned some commands for making file.pgen; however, I got the following error. How can I solve this problem?

plink2 --bgen ukb22828_c7_b0_v3.bgen ref-first --sample ukb22828_c7_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c7 --set-all-var-ids @:#[ukb]\$r
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to c7.log.
Options in effect:
--bgen ukb22828_c7_b0_v3.bgen ref-first
--make-pgen
--new-id-max-allele-len 100 missing
--out c7
--sample ukb22828_c7_b0_v3_s487159.sample
--set-all-var-ids @:#[ukb]$r

Start time: Sat Sep 16 11:06:32 2023
515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--bgen: 5405524 variants detected, format v1.2.
487409 samples imported from .sample file to c7-temporary.psam .
--bgen: c7-temporary.pgen + c7-temporary.pvar written.
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from c7-temporary.psam.
Warning: 1 variant ID erased by --set-all-var-ids due to allele code length.
5405524 variants loaded from c7-temporary.pvar .
Note: No phenotype data present.
Writing c7.psam ... done.
Writing c7.pvar ... done.
Writing c7.pgen ... 47%Killed

I really look forward to receiving your help.

On Friday, September 15, 2023 at 14:09:34 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 16, 2023, 11:04:06 AM

to plink2-users

"Killed" indicates that the program was killed by the Linux Out Of Memory manager. In the context of plink, this usually happens when you are running on a shared machine, and plink's default behavior of reserving ~50% of system memory is too greedy.

The --memory flag is the usual way to handle this. I also recommend updating your plink2 build; there has been an improvement in plink2's ability to detect how much memory is actually available on the system.
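
As a rough illustration, the workspace can be capped at a fraction of the memory that is currently available. The sketch below assumes a Linux host with `/proc/meminfo`; the helper name `plink_mem_mib` and the 50% fraction are arbitrary choices for the example, not recommendations. `--memory` takes a value in MiB, and the import command is only printed.

```shell
#!/bin/sh
# Convert an amount of available memory (kB) into a --memory value (MiB),
# keeping only the given percentage of it.
plink_mem_mib() {
  kb=$1
  pct=$2
  echo $(( kb / 1024 * pct / 100 ))
}

# MemAvailable is reported in kB on Linux; treat it as 0 if it cannot be read.
avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo 2>/dev/null || true)
mem_mib=$(plink_mem_mib "${avail_kb:-0}" 50)

# Dry run: print the import command with the memory cap (a value of 0 means
# the available memory could not be determined and must be set by hand).
echo "plink2 --bgen ukb22828_c7_b0_v3.bgen ref-first --sample ukb22828_c7_b0_v3_s487159.sample --make-pgen --memory ${mem_mib} --out c7"
```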

Huy Nguyen

Sep 17, 2023, 3:51:45 AM

to plink2-users

Thanks, Chris, for the useful advice. I successfully converted all files.bgen into files.pgen, but when I tried to merge them all into one file.pgen, I got the error below:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to CHRs.log.
Options in effect:
--make-pgen
--out CHRs
--pmerge-list List_22files_pgen_names.txt

Start time: Sun Sep 17 17:43:16 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).

--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
Error: Invalid variant count in .pgen file.

Could you please help? Do I need to revise the command, or is something else the problem?

In List_22files_pgen_names.txt, I listed all pgen files with prefix only from c1-c22

On Sunday, September 17, 2023 at 01:04:06 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 17, 2023, 5:11:15 AM

to plink2-users

At least one of your .pgen files appears to be corrupted. Try running --validate on them, and then rerun the .bgen import command for any .pgen that failed validation.

I also recommend updating your plink2 build.

Huy Nguyen

Sep 17, 2023, 7:33:41 AM

to plink2-users

Thanks so much, Chris. I tried to update my plink2 build (I downloaded alpha 4.8 for Windows 64-bit from https://www.cog-genomics.org/plink/2.0/); however, when I ran the validation, the banner still showed 24 Oct 2022, the same as the build I used before, so I'm not sure whether I got the correct one or how to get it. I'm looking forward to your continued advice.

for i in {2..22}; do
plink2 --pfile c${i} --validate
done

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to plink2.log.
Options in effect:
--pfile c2
--validate

Start time: Sun Sep 17 21:22:33 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)

loaded from c2.psam.
8129063 variants loaded from c2.pvar.
Validating c2.pgen...

On Sunday, September 17, 2023 at 19:11:15 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 17, 2023, 8:16:09 AM

to plink2-users

You're currently running a Linux build, so I'm guessing you're using a Windows computer to connect to a remote Linux server which plink2 is running on. In this case you need to download a new Linux build onto the remote Linux server. Running "wget https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20230915.zip", followed by unzipping the downloaded file, may work. Then, when running plink2, make sure to specify the directory you unzipped the new build to, or configure your PATH so that you don't have to.
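
The PATH part could look like the sketch below. The directory `$HOME/plink2_20230915` is a hypothetical unzip location, and the download and unzip lines are left as comments because they need network access.

```shell
#!/bin/sh
# Hypothetical directory that the new build was unzipped into.
PLINK2_DIR="$HOME/plink2_20230915"

# wget https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20230915.zip
# unzip plink2_linux_x86_64_20230915.zip -d "$PLINK2_DIR"

# Put the new build first in PATH so it is found before any older plink2.
PATH="$PLINK2_DIR:$PATH"
export PATH

# Show which plink2 (if any) the shell now resolves.
command -v plink2 || echo "plink2 not found in PATH"
```

Running `plink2 --version` afterwards shows whether the banner still reports the 24 Oct 2022 build.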

Huy Nguyen

Sep 18, 2023, 8:47:03 PM

to plink2-users

Thanks, Chris, for your valuable advice. I validated all files.pgen and found that one of them was corrupted, so I downloaded it again successfully and then ran --pmerge-list; yet I got the error below:

Start time: Tue Sep 19 10:40:18 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .

Error: The biallelic variants with ID '1:54712[ukb]T' at position 1:54712 in
c1.pvar appear to be the components of a 'split' multiallelic variant; if so,
it must be 'joined' (with e.g. "bcftools norm -m") before a correct merge can
occur. If you are SURE that your data does not contain any same-position
same-ID variant groups that should be joined, you can suppress this error with
--multiallelics-already-joined.
End time: Tue Sep 19 10:40:20 2023

Then, hoping to resolve the error, I ran the following command, but an error said that multiallelic-variant dosage support is under development. Is there any way to address this issue?

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs --multiallelics-already-joined

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to CHRs.log.
Options in effect:
--make-pgen

--multiallelics-already-joined
--out CHRs
--pmerge-list List_22files_pgen_names.txt

Start time: Tue Sep 19 10:44:10 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .

--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 0/92735688 variants complete.Error: --pmerge[-list] multiallelic-variant dosage support is under development.

I still look forward to receiving your guidance on this matter.

On Sunday, September 17, 2023 at 22:16:09 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 19, 2023, 4:46:07 AM

to plink2-users

If you had installed a newer plink2 build, you would have received the following more-useful error message instead:

"...appear to be the components of a 'split' multiallelic variant; if so, it must be 'joined' (with e.g. "bcftools norm -m") before a correct merge can occur. If you are SURE that your data does not contain any same-position same-ID variant groups that should be joined, you can suppress this error with --multiallelics-already-joined. Alternatively, you can keep the variants separate by first assigning unique IDs with e.g. --set-all-var-ids."

Huy Nguyen

Sep 20, 2023, 3:05:44 AM

to plink2-users

Thanks Chris so much.

I tried the "bcftools norm -m" option as per your advice, but I could not find example commands for it; also, all of my files are files.pgen (not files.vcf), so I am not sure how to join these files.pgen to address the issue.

Therefore, I resorted to removing all duplications by both methods below for each file.pgen, since using one of them did not work:

plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r #way1

plink2 --pfile c1 --rm-dup force-first --make-pgen --out c1 #way2

I then ran the following command and got another error:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022) www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to CHRs.log.
Options in effect:
--make-pgen
--out CHRs
--pmerge-list List_22files_pgen_names.txt

Start time: Wed Sep 20 16:56:12 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.

Error: Non-concatenating --pmerge-list is under development.

So, I appreciate your continued support.

On Tuesday, September 19, 2023 at 18:46:07 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 20, 2023, 5:42:56 AM

to plink2-users

0. Why are you not including $a in the --set-all-var-ids template string? Do you intentionally want to keep only one part of each split multiallelic variant?

1. The error message implies that you mismanaged your files in a way that makes e.g. one chromosome appear twice. You can take a quick look at your chromosome codes with e.g. "tail <.pvar filename>" on each .pvar.
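
A quick way to run that check over every file is sketched below. The function name `pvar_chroms` is made up for this example; it prints the distinct values in the first column of each .pvar, skipping header lines that start with `#`.

```shell
#!/bin/sh
# For each .pvar given, print "<file>: <comma-separated chromosome codes>".
pvar_chroms() {
  for f in "$@"; do
    codes=$(awk '!/^#/ { print $1 }' "$f" | sort -u | paste -s -d, -)
    echo "$f: $codes"
  done
}

# Usage (assumes c1.pvar ... c22.pvar exist in the current directory):
#   pvar_chroms c*.pvar
```

With one fileset per chromosome, each file would be expected to show a single code, and no code should show up under two different files.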

Huy Nguyen

Sep 20, 2023, 6:35:17 AM

to plink2-users

Thanks Chris.

0. Why are you not including $a in the --set-all-var-ids template string? Do you intentionally want to keep only one part of each split multiallelic variant?

As I plan to extract SNP dosages for MR (Mendelian randomization) analysis, I thought it would be fine to keep only part of each variant (I'm not sure, given my limited understanding of the field). Do you think doing so could affect the downstream analysis for MR studies?
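
To make the effect of leaving out `$a` concrete, the sketch below mimics two ID templates with awk on a tiny, made-up .pvar fragment. The alleles at 1:54712 are hypothetical (the error message earlier in the thread only gives the position and the ID `1:54712[ukb]T`), and the awk lines imitate what the templates `@:#[ukb]$r` and `@:#:$r:$a` would produce; they are not plink2 itself.

```shell
#!/bin/sh
# Made-up .pvar fragment: one multiallelic site split into two biallelic rows.
pvar=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\n' >  "$pvar"
printf '1\t54712\t.\tT\tC\n'         >> "$pvar"
printf '1\t54712\t.\tT\tG\n'         >> "$pvar"

# IDs built from chromosome, position and REF only: both rows collide.
ids_ref_only=$(awk '!/^#/ { print $1 ":" $2 "[ukb]" $4 }' "$pvar")

# IDs that also include ALT: each row gets its own ID.
ids_ref_alt=$(awk '!/^#/ { print $1 ":" $2 ":" $4 ":" $5 }' "$pvar")

# Count IDs that occur more than once under each scheme.
dup_ref_only=$(echo "$ids_ref_only" | sort | uniq -d | grep -c . || true)
dup_ref_alt=$(echo "$ids_ref_alt" | sort | uniq -d | grep -c . || true)

echo "duplicate IDs without ALT: $dup_ref_only"
echo "duplicate IDs with ALT:    $dup_ref_alt"
rm -f "$pvar"
```

In this toy case the REF-only scheme gives both rows the same ID, so any later step that drops duplicate IDs would discard one of the two alternate alleles.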

Also, could you please advise me on the following code I revised as per your suggestion including $a?

plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]$a

1. The error message implies that you mismanaged your files in a way that makes e.g. one chromosome appear twice. You can take a quick look at your chromosome codes with e.g. "tail <.pvar filename>" on each .pvar.

Yes, I did mismanage one file, so I will redo it. I really appreciate your sharp diagnosis.

On Wednesday, September 20, 2023 at 19:42:56 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 21, 2023, 7:17:46 PM

to plink2-users

Thanks Chris so much. With your useful advice, I was able to merge all files.pgen into one file.pgen successfully using the following code:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

Then I used the following command to estimate a polygenic risk score (PRS), but a new error occurred, shown below. If I instead convert file.pgen into file.fam, .bim, and .bed for the PRS estimation, I run into two problems: (1) there is not enough space left on the remote server, and (2) as you advised, dosages may not be preserved when working with bfiles.

Rscript software/PRSice/PRSice.R --prsice software/PRSice/PRSice_linux --dir /home/user --base data/sumstats/TL_Summarydata.txt --target CHRs --out output/PRS_TL --no-regress --lower 5.00E-08 #
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly

GNU General Public License v3

If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2023-09-22 09:04:35
./software/PRSice/PRSice_linux \
--a1 A1 \
--a2 A2 \
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
--base data/sumstats/TL_Summarydata.txt \
--binary-target F \
--bp BP \
--chr CHR \
--clump-kb 250kb \
--clump-p 1.000000 \
--clump-r2 0.100000 \
--interval 5e-05 \
--lower 5e-08 \
--no-regress \
--num-auto 22 \
--out output/PRS_TL \
--pvalue P \
--seed 1633606172 \
--snp SNP \
--stat BETA \
--target CHRs \
--thread 1 \
--upper 0.5

Initializing Genotype file: CHRs (bed)

Start processing TL_Summarydata
==================================================

Base file: data/sumstats/TL_Summarydata.txt
Header of file is:
SNP CHR BP A1 A2 BETA P

Reading 100.00%
24 variant(s) observed in base file, with:
5 ambiguous variant(s) excluded
19 total variant(s) included from base file

Loading Genotype info from target
==================================================

Error: Cannot open file: CHRs.fam

Error:
Execution halted

So, my questions seeking your help are:

1. What should I do to tackle the above issue?

2. I have 24 variants, so why were 5 of them excluded? What should I do to keep these 5 variants, since they are also important to my analysis (PRS estimation and MR analysis)?

I'm still looking forward to receiving your continued support.

On Wednesday, September 20, 2023 at 20:35:17 UTC+10, Huy Nguyen wrote:

Huy Nguyen

Sep 26, 2023, 4:14:53 AM

to plink2-users

Dear All,

I tried many times to make multiple files.pgen (one per chromosome) from files.bgen, using the following command, which I adapted from some of the plink2-users group discussions:

plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r,\$a

And then validated using the following code:

plink2 --pfile c1 --validate # in the end, all files (c1, c2, c3, ..., c22) were good, with no corruption

And then, merged these files.pgen using the following codes:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

But I got the following error:

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)

Options in effect:
--make-pgen
--out CHRs
--pmerge-list List_22files_pgen_names.txt

Hostname: hitchpc
Working directory: /mnt/wd12TB
Start time: Mon Sep 25 06:27:46 2023

Random number seed: 1695587266
515522 MiB RAM detected, ~509216 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.

Concatenating... 93095623/93095623 variants complete.
Results written to CHRs-merge.pgen + CHRs-merge.pvar .

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)

loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Tue Sep 26 05:36:47 2023

So, I would appreciate it if anyone could advise why this error persists and how I can address it. Is it because I used a comma in the command that makes files.pgen? In other words, should I put a comma before \$a at the end of the command or not? Which of the following commands should I use to avoid this error at a later stage?

Option 1:

plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r,\$a

Option 2:

plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r\$a

Thank you and really looking forward to receiving your help.

On Friday, September 22, 2023 at 09:17:46 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 26, 2023, 6:46:33 AM

to plink2-users

1. Please post the output of "head -n 2 CHRs-merge.pvar".

2. Do you still see this error if you only try to merge chromosomes 21 and 22?

Huy Nguyen

Sep 26, 2023, 9:54:00 AM

to plink2-users

Thanks Christopher so much for your continued support. Here is the output done as per your advice:

1. Please post the output of "head -n 2 CHRs-merge.pvar".

head -n 2 CHRs-merge.pvar
#CHROM POS ID REF ALT
1 10177 1:10177[ukb]A,AC A AC

2. Do you still see this error if you only try to merge chromosomes 21 and 22?

When I merged only these two files, it seems to have succeeded. Why is that, and how can I fix the problem when merging all 22 files?

plink2 --pmerge-list List_2files21_22_pgen_names.txt --make-pgen --out CHRs21_22

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to CHRs21_22.log.
Options in effect:
--make-pgen
--out CHRs21_22
--pmerge-list List_2files21_22_pgen_names.txt

Start time: Tue Sep 26 21:28:20 2023
515522 MiB RAM detected, ~509543 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).

--pmerge-list: 2 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs21_22-merge.psam .
--pmerge-list: 2 .pvar files scanned.
Concatenation job detected.
Concatenating... 2516841/2516841 variants complete.
Results written to CHRs21_22-merge.pgen + CHRs21_22-merge.pvar .

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)

loaded from CHRs21_22-merge.psam.
2516841 variants loaded from CHRs21_22-merge.pvar.

Note: No phenotype data present.

Writing CHRs21_22.psam ... done.
Writing CHRs21_22.pvar ... done.
Writing CHRs21_22.pgen ... done.
End time: Tue Sep 26 23:49:05 2023

I'm looking forward to receiving your further guidance.

On Tuesday, September 26, 2023 at 20:46:33 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 26, 2023, 12:09:16 PM

to plink2-users

Ok, the first line of CHRs-merge.pvar looks fine, so the error message is strange. If you just try to do whatever you wanted to do next with CHRs-merge.{pgen,pvar,psam} as input, what happens?

(In the meantime, there isn't a good reason to have "ukb" in your variant IDs. "b37" / "b38" might make sense because the same variant will have a different position depending on the reference genome build. But the UK Biobank data releases don't use different variant coordinates than the rest of the world.)

Huy Nguyen

Sep 28, 2023, 3:42:23 PM

to plink2-users

Thanks Christopher. Here are the issues:

When I tried to merge those files.pgen for the fourth time, the error persisted below:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)
Options in effect:
--make-pgen
--out CHRs
--pmerge-list List_22files_pgen_names.txt

Start time: Wed Sep 27 12:22:56 2023
Random number seed: 1695781376
515522 MiB RAM detected, ~508924 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 93095623/93095623 variants complete.
Results written to CHRs-merge.pgen + CHRs-merge.pvar .
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.

End time: Fri Sep 29 00:50:53 2023

When I continued to do GWAS, the same error happened:

plink2 --pfile CHRs-merge --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogBUN_mg_dl --covar data/Covariatesdata.txt --covar-name PC{1..10}, Age, Tuoi, Sex --extract TL_snplist_All.txt --out output/GWAS_BUN.cvrt

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to output/GWAS_BUN.cvrt.log.
Options in effect:
--covar data/Covariatesdata.txt
--covar-name PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, Age, Tuoi, Sex
--extract TL_snplist_All.txt
--glm hide-covar
--out output/GWAS_BUN.cvrt
--pfile CHRs-merge
--pheno data/Pheno_KFs.txt
--pheno-name LogBUN_mg_dl

Start time: Fri Sep 29 05:40:12 2023
515522 MiB RAM detected, ~508944 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.

End time: Fri Sep 29 05:40:12 2023

So, I have lost my way and don't know what to do next.

Any further guidance would be much appreciated.

On Wednesday, September 27, 2023 at 02:09:16 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 28, 2023, 3:46:59 PM

to plink2-users

Thanks Christopher. Here are the issues:

I then checked the first two lines of the file with "head -n 2 CHRs-merge.pvar":

head -n 2 CHRs-merge.pvar
[ukb]A,C A C
17 13837789 17:13837789[ukb]T,C T C

So, I have lost my way and don't know what to do next.

Any further guidance would be much appreciated.

On Friday, September 29, 2023 at 05:42:23 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 28, 2023, 3:47:26 PM

to plink2-users

1. What is the .log output of "plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups"?

2. Assuming you see an error there, if you then run "head CHRs-merge.pvar > CHRs-merge-head.pvar" followed by "plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups", does that work?

3. If you still see an error there, can you post CHRs-merge-head.pvar (do NOT copy-and-paste, I need the binary file in this case)?
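
Alongside sending the file, the token counts can be inspected directly. The sketch below is only an assumption about what a useful check might look like, not a plink2 feature: the hypothetical helper `pvar_token_counts` prints the number of whitespace-separated fields on each of the first ten lines, so a line with fewer fields than the rest stands out.

```shell
#!/bin/sh
# Print "<line number> <field count>" for the first 10 lines of a .pvar.
pvar_token_counts() {
  head -n 10 "$1" | awk '{ print NR, NF }'
}

# Usage (assumes CHRs-merge.pvar is in the current directory):
#   pvar_token_counts CHRs-merge.pvar
#   head -n 1 CHRs-merge.pvar | od -c | head   # show any stray bytes on line 1
```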

Huy Nguyen

Sep 28, 2023, 4:02:05 PM

to plink2-users

Thanks Christopher.

1. What is the .log output of "plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups"?

plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to plink2.log.
Options in effect:
--pvar CHRs-merge.pvar
--write-snplist allow-dups
Start time: Fri Sep 29 05:49:35 2023
515522 MiB RAM detected, ~508950 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).

Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.

End time: Fri Sep 29 05:49:35 2023

2. Assuming you see an error there, if you then run "head CHRs-merge.pvar > CHRs-merge-head.pvar" followed by "plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups", does that work?

head CHRs-merge.pvar > CHRs-merge-head.pvar
and followed by:

plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023) www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang GNU General Public License v3

Logging to plink2.log.
Options in effect:
--pvar CHRs-merge-head.pvar
--write-snplist allow-dups
Start time: Fri Sep 29 05:52:43 2023
515522 MiB RAM detected, ~508934 available; reserving 257761 MiB for main

workspace.
Using up to 36 threads (change this with --threads).

Error: Line 1 of CHRs-merge-head.pvar has fewer tokens than expected.
End time: Fri Sep 29 05:52:43 2023

3. If you still see an error there, can you post CHRs-merge-head.pvar (do NOT copy-and-paste, I need the binary file in this case)?

Given my project rules, is there any way that I can send this file to you only? Or could you set up a way for me to send it to you privately?

On Friday, September 29, 2023 at 05:47:26 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 28, 2023, 4:03:31 PM

to plink2-users

My email address is on the plink2 website.
