Re: GWAS results show NA in effect size (beta & SE): Why and how to address this?

Chris Chang

Sep 14, 2023, 5:27:22 AM
to Huy Nguyen, plink2-users
0. Always post .log files when asking for help here.  In particular, it's clear that the command lines you've posted aren't exactly what plink2 saw, at least if we're talking about one of the standard plink2 builds, so I need to see what was actually executed.

1. You must use --pfile / --make-pgen instead of --bfile / --make-bed at all times if you want to preserve dosages from the .bgen.
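
A minimal sketch of what point 1 implies for the workflow quoted below, assuming the UKB file names from the original post and a hypothetical output prefix `tl_snp_chr<N>`: import each .bgen straight into a .pgen fileset while extracting the SNPs of interest, with no .vcf or .bed intermediate. The helper only prints the command, so it can be compared with what ends up in the .log before anything is run.

```shell
# Print (not run) a plink2 command that imports one UKB chromosome from .bgen
# into a .pgen fileset, so the imputed dosages are kept.
# Input file names follow the original post; the output prefix is hypothetical.
bgen_to_pgen_cmd() {
  chr=$1
  printf 'plink2 --bgen ukb22828_c%s_b0_v3.bgen ref-first' "$chr"
  printf ' --sample ukb22828_c%s_b0_v3_s487159.sample' "$chr"
  printf ' --extract TL_snplist_chr%s.txt' "$chr"
  printf ' --make-pgen --out tl_snp_chr%s\n' "$chr"
}

bgen_to_pgen_cmd 1
```

Downstream steps would then read the result with `--pfile tl_snp_chr1` rather than `--bfile`.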

On Thu, Sep 14, 2023 at 1:53 AM Huy Nguyen <huynguye...@gmail.com> wrote:
Dear All,

I'm preparing data for a Mendelian randomization (MR) analysis to assess the causal effect of telomere length on a kidney phenotype in UK Biobank (UKB) data. These are the steps I have taken so far:

 1. I searched prior research summary data and found close to 800 SNPs for telomere length.
 2. I retained only the 24 SNPs with P < 5*10^-8.
 3. I extracted those SNPs from the UKB imputed genetic data to make **files.vcf**, one per chromosome (chr), each containing the SNP dosages for that chromosome. The code below is an example for the SNPs on chr1:  
```
plink2 --bgen ukb22828_c1_b0_v3.bgen --sample ukb22828_c1_b0_v3_s487159.sample --threads 4 --out twas_snp_chr1 --extract TL_snplist_chr1.txt --bgen-annotate ‘ref-first’ --export vcf vcf-dosage=DS-force
```
 
4. I converted these files (files.vcf) into bfiles (file.fam, file.bim, file.bed). An example code for SNPs of chromosome 1 is below:
```
plink --vcf twas_snp_chr1.vcf --make-bed --out twas_snp_chr1
```
 
5. I merged those files to make 1 file.bed, 1 file.bim and 1 file.fam, instead of 22 files each.
```
plink --bfile twas_snp_chr1 --merge-list TL_allsnps_allchromosomes.txt --make-bed --out data_TL_4_KD
```
 
6. I estimated a polygenic risk score (PRS) using the prior-research effect sizes (beta) from step 2 and the genetic dosage files from step 5. This step supports the one-sample MR analysis.
 
7. To generate summary data for the two-sample MR analysis, I ran a GWAS to estimate the beta and SE of the association between each of the 24 SNPs and the kidney phenotype (eGFR), using the following code:
```
plink2 --bfile data/genotypes/data_TL_4_KD --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogeGFRcrea --covar data/Covariatesdata --covar-name PC{1..10}, Age, Sex --out output/GWAS_eGFRcrea.cvrt
```
However, the GWAS results showed NA for the effect size of about half of the SNPs, as shown below: [Attach1]
When I checked the dosages in a text file, they were also 0 for many SNPs: [Attach2]
 
So, my questions are:
1. Why do the SNP dosages show 0? Is it because some SNPs actually have zero dosage, or because one of the above steps is incorrect?
2. If a step is incorrect, how can I fix it so that I can generate effect sizes (beta and SE) as summary-level data for the two-sample MR analysis?
I searched for this issue in various communities (plink users, Stack Overflow, the Bioconductor community, etc.) but could not find a solution, so I would appreciate any help.


Huy Nguyen

Sep 14, 2023, 10:30:38 PM
to plink2-users
Thank you so much for your advice, Chris. As you asked, I am attaching the .log files for all the steps involved in running the GWAS with plink:

Step 3. I extracted those SNPs from the UKB imputed genetic data to make .vcf files, one per chromosome (chr), each containing the SNP dosages for that chromosome (file: step3_twas_snp_chr2.log attached)
 
Step 4. I converted these files (files.vcf) into bfiles (file.fam, file.bim, file.bed) (file: step4_data_TL_4_KD.log attached)
 
Step 5. I merged those files to make 1 file.bed, 1 file.bim and 1 file.fam, instead of 22 files each.
File1: step5a_data_TL_4_KD_with error.log attached: this step failed with an error due to a duplicated SNP.
File2: step5b_data_TL_4_KD_removeDuplicate_dueto_error.log attached: in this step I addressed the SNP duplication error.
File3: step5c_data_TL_4_KD_merged.log attached: in this step I merged all files.
  
Step 7. To generate summary data for two-sample MR analysis, I performed GWAS to estimate beta and SE of association between each of 24 SNPs and Kidney phenotype (such as eGFR):
File1: step7a_GWAS_4_eGFRcrea.log attached
File2: step7b_results of GWAS attached which showed NA of effect size for 13 SNPs

However, the GWAS results showed estimated effect sizes (beta & SE) as NAs for 12 SNPs (File2: step7b_results of GWAS)
 
Given the great pressure from my leadership to complete this MR project, I am really looking forward to receiving any solution that can address this issue.

Kind regards,

On Thursday, September 14, 2023 at 19:27:22 UTC+10, chrch...@gmail.com wrote:
step5a_data_TL_4_KD_with error.log.log
step7b_results of GWAS.txt
step7a_GWAS_4_eGFRcrea.log
step5b_data_TL_4_KD_removeDuplicate_dueto_error.log.log
step4_data_TL_4_KD.log.log
step3_twas_snp_chr2.log.log
step5c_data_TL_4_KD_merged.log

Chris Chang

Sep 14, 2023, 10:48:37 PM
to Huy Nguyen, plink2-users
- Please reread the second half of my previous comment.
- You can use --pmerge-list to concatenate .pgen filesets.
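
A minimal sketch of the second point, assuming the per-chromosome filesets are named c1 through c22 and listed by prefix only, as described later in this thread. The list file name and output prefix below are hypothetical, and the final line prints the merge command rather than running it.

```shell
# Write one fileset prefix per line (c1 ... c22) into a merge list,
# then print the concatenation command for inspection.
# The list file name and the output prefix are hypothetical.
list=pgen_merge_list.txt
: > "$list"
i=1
while [ "$i" -le 22 ]; do
  echo "c$i" >> "$list"
  i=$((i + 1))
done

echo "plink2 --pmerge-list $list --make-pgen --out merged_chrs"
```

Each prefix stands for a .pgen/.pvar/.psam triple, so the .bgen files have to be imported with --make-pgen before this step.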

Huy Nguyen

Sep 15, 2023, 12:09:34 AM
to plink2-users
Thank you for your continued advice, Chris. Could you please give me a step-by-step guide on how to use --pmerge? (I'm sorry, I'm not familiar with the --pmerge commands.) Do you mean I need to merge these files before extracting the 24 SNPs?

To be practical, I have listed below the steps I think I need to follow before running the GWAS. Please correct me if any of them is wrong:

Step 1: use --pmerge to merge all the .bgen files (chr1 to chr22)?
In UKB I could not find any .pgen files; I can only find .bgen and .sample files for the imputed genetic data. Can I still use --pmerge to merge all the .bgen files? If so, could you point me to a link or guide on how to write the --pmerge commands?

Step 2: extract the 24 SNPs to make .vcf files?

Step 3: convert these .vcf files into bfiles (file.fam, file.bim, file.bed)?

Step 4: run the GWAS to estimate the beta and SE of the association between each of the 24 SNPs and the kidney phenotype (eGFR), in order to produce summary data for the two-sample MR analysis?

I look forward to receiving your continued advice.





On Friday, September 15, 2023 at 12:48:37 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 16, 2023, 5:33:25 AM
to plink2-users
Hi Chris,

I picked up some commands from discussions in this group to make a .pgen file; however, I got the following error. How can I solve this problem?

plink2 --bgen ukb22828_c7_b0_v3.bgen ref-first --sample ukb22828_c7_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c7 --set-all-var-ids @:#[ukb]\$r
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to c7.log.
Options in effect:
  --bgen ukb22828_c7_b0_v3.bgen ref-first
  --make-pgen
  --new-id-max-allele-len 100 missing
  --out c7
  --sample ukb22828_c7_b0_v3_s487159.sample
  --set-all-var-ids @:#[ukb]$r

Start time: Sat Sep 16 11:06:32 2023
515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--bgen: 5405524 variants detected, format v1.2.
487409 samples imported from .sample file to c7-temporary.psam .
--bgen: c7-temporary.pgen + c7-temporary.pvar written.
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from c7-temporary.psam.
Warning: 1 variant ID erased by --set-all-var-ids due to allele code length.
5405524 variants loaded from c7-temporary.pvar.
Note: No phenotype data present.
Writing c7.psam ... done.
Writing c7.pvar ... done.
Writing c7.pgen ... 47%Killed

I really look forward to receiving your help.

On Friday, September 15, 2023 at 14:09:34 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 16, 2023, 11:04:06 AM
to plink2-users
"Killed" indicates that the program was killed by the Linux Out Of Memory manager.  In the context of plink, this usually happens when you are running on a shared machine, and plink's default behavior of reserving ~50% of system memory is too greedy.

The --memory flag is the usual way to handle this.  I also recommend updating your plink2 build; there has been an improvement in plink2's ability to detect how much memory is actually available on the system.
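
A minimal sketch of adding the flag, assuming the import command from the previous message. --memory takes the workspace size in MiB; the value 64000 is an arbitrary illustration and should be set below what the shared machine can actually spare. The helper only prints the resulting command.

```shell
# Print a plink2 command with an explicit main-workspace cap.
# First argument: cap in MiB; remaining arguments: the usual plink2 flags.
with_memory_cap() {
  cap_mib=$1
  shift
  echo "plink2 --memory $cap_mib $*"
}

# 64000 MiB is an illustrative value, not a recommendation.
with_memory_cap 64000 --bgen ukb22828_c7_b0_v3.bgen ref-first \
  --sample ukb22828_c7_b0_v3_s487159.sample --make-pgen --out c7
```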

Huy Nguyen

Sep 17, 2023, 3:51:45 AM
to plink2-users
Thanks for the useful advice, Chris. I successfully converted all the .bgen files into .pgen files. When I then tried to merge them into a single .pgen file, I got the error below:

plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CHRs.log.
Options in effect:
  --make-pgen
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt

Start time: Sun Sep 17 17:43:16 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
Error: Invalid variant count in .pgen file.


Could you please tell me whether I need to revise my command, or what the problem is?
In List_22files_pgen_names.txt, I listed all the .pgen filesets by prefix only, from c1 to c22.

On Sunday, September 17, 2023 at 01:04:06 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 17, 2023, 5:11:15 AM
to plink2-users
At least one of your .pgen files appears to be corrupted.  Try running --validate on them, and then rerun the .bgen import command for any .pgen that failed validation.
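
A minimal sketch of that check as a loop, assuming filesets named by prefix and assuming plink2 exits with a nonzero status when --validate fails. The function name is hypothetical, and the command to run is passed in as a parameter so the loop can be tried with a stand-in before using the real binary.

```shell
# Run "<cmd> --pfile <prefix> --validate" for each prefix and print the
# prefixes whose validation failed (nonzero exit status assumed on failure).
# --out gives every run its own log instead of overwriting plink2.log.
failed_validation() {
  cmd=$1
  shift
  for prefix in "$@"; do
    if ! "$cmd" --pfile "$prefix" --validate --out "validate_$prefix" > /dev/null 2>&1; then
      echo "$prefix"
    fi
  done
}

# Usage on the real data (commented out here because it needs plink2):
# failed_validation plink2 c1 c2 c3
```

Any prefix printed is a candidate for rerunning the .bgen import.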

I also recommend updating your plink2 build.

Huy Nguyen

Sep 17, 2023, 7:33:41 AM
to plink2-users
Thanks so much, Chris. I tried to update my plink2 build (I downloaded alpha 4.8 for Windows 64-bit from https://www.cog-genomics.org/plink/2.0/). However, when I ran the validation, the banner still showed the build dated 24 Oct 2022, the same as the build I used before, so I'm not sure whether I have the correct one or how to get it. I look forward to your continued advice.
 
for i in {2..22};  do
  plink2 --pfile c${i} --validate
done

PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pfile c2
  --validate

Start time: Sun Sep 17 21:22:33 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from c2.psam.
8129063 variants loaded from c2.pvar.
Validating c2.pgen...





On Sunday, September 17, 2023 at 19:11:15 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 17, 2023, 8:16:09 AM
to plink2-users
You're currently running a Linux build, so I'm guessing you're using a Windows computer to connect to a remote Linux server which plink2 is running on.  In this case you need to download a new Linux build onto the remote Linux server.  Running "wget https://s3.amazonaws.com/plink2-assets/plink2_linux_x86_64_20230915.zip", followed by unzipping the downloaded file, may work.  Then, when running plink2, make sure to specify the directory you unzipped the new build to, or configure your PATH so that you don't have to.

Huy Nguyen

Sep 18, 2023, 8:47:03 PM
to plink2-users
Thanks for your valuable advice, Chris. I validated all the .pgen files and found that one of them was corrupted, so I downloaded it again successfully and then ran --pmerge-list; however, I got the error below:

Start time: Tue Sep 19 10:40:18 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
Error: The biallelic variants with ID '1:54712[ukb]T' at position 1:54712 in
c1.pvar appear to be the components of a 'split' multiallelic variant; if so,
it must be 'joined' (with e.g. "bcftools norm -m") before a correct merge can
occur. If you are SURE that your data does not contain any same-position
same-ID variant groups that should be joined, you can suppress this error with
--multiallelics-already-joined.
End time: Tue Sep 19 10:40:20 2023


Then, hoping to get past the error, I ran the following command, but it failed with an error saying that multiallelic-variant dosage support is under development. Is there any way I can address this?
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs --multiallelics-already-joined
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CHRs.log.
Options in effect:
  --make-pgen
  --multiallelics-already-joined
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt

Start time: Tue Sep 19 10:44:10 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 0/92735688 variants complete.
Error: --pmerge[-list] multiallelic-variant dosage support is under development.


I still look forward to your guidance on this matter.
On Sunday, September 17, 2023 at 22:16:09 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 19, 2023, 4:46:07 AM
to plink2-users
If you had installed a newer plink2 build, you would have received the following more-useful error message instead:

"...appear to be the components of a 'split' multiallelic variant; if so, it must be 'joined' (with e.g. "bcftools norm -m") before a correct merge can occur. If you are SURE that your data does not contain any same-position same-ID variant groups that should be joined, you can suppress this error with --multiallelics-already-joined. Alternatively, you can keep the variants separate by first assigning unique IDs with e.g. --set-all-var-ids."
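
To illustrate the last sentence of that message: in a --set-all-var-ids template, `@` stands for the chromosome code, `#` for the position, `$r` for the reference allele and `$a` for the alternate allele. The sketch below only mimics that substitution in plain shell (plink2 does the real work), and the alleles are made up for illustration. It shows why a template without `$a` gives both halves of a split multiallelic variant the same ID.

```shell
# Mimic how a --set-all-var-ids template is filled in (illustration only;
# plink2 performs the real substitution). Arguments: CHROM POS REF ALT.
id_ref_only() { printf '%s:%s:%s\n' "$1" "$2" "$3"; }          # like @:#:$r
id_ref_alt()  { printf '%s:%s:%s:%s\n' "$1" "$2" "$3" "$4"; }  # like @:#:$r:$a

# Two halves of a hypothetical split multiallelic variant (alleles made up):
id_ref_only 1 54712 T TC     # 1:54712:T
id_ref_only 1 54712 T TCC    # 1:54712:T      (same ID, so the two collide)
id_ref_alt  1 54712 T TC     # 1:54712:T:TC
id_ref_alt  1 54712 T TCC    # 1:54712:T:TCC  (unique IDs)

# On a real command line, single quotes (or backslashes) keep the shell
# from expanding $r and $a before plink2 sees them:
echo "--set-all-var-ids '@:#:\$r:\$a'"
```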

Huy Nguyen

Sep 20, 2023, 3:05:44 AM
to plink2-users
Thanks so much, Chris.
I tried to follow the "bcftools norm -m" option as you advised, but I could not find example commands for it. Also, all of my files are .pgen files (not .vcf files), so I'm not sure how to join them to address the issue.

Therefore, I resorted to removing all duplicates from each .pgen fileset using both of the methods below, since using only one of them did not work:
plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r #way1
plink2 --pfile c1 --rm-dup force-first --make-pgen --out c1 #way2

I then ran the following command and got another error:
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs
PLINK v2.00a3.7LM 64-bit Intel (24 Oct 2022)   www.cog-genomics.org/plink/2.0/
(C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CHRs.log.
Options in effect:
  --make-pgen
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt

Start time: Wed Sep 20 16:56:12 2023

515522 MiB RAM detected; reserving 257761 MiB for main workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Error: Non-concatenating --pmerge-list is under development.

I would appreciate your continued support.
On Tuesday, September 19, 2023 at 18:46:07 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 20, 2023, 5:42:56 AM
to plink2-users
0. Why are you not including $a in the --set-all-var-ids template string?  Do you intentionally want to keep only one part of each split multiallelic variant?

1. The error message implies that you mismanaged your files in a way that makes e.g. one chromosome appear twice.  You can take a quick look at your chromosome codes with e.g. "tail <.pvar filename>" on each .pvar.
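
A minimal sketch of the check in point 1, run across all the per-chromosome .pvar files at once. The function names are hypothetical; the only assumption is that the chromosome code is the first whitespace-separated column of each variant line.

```shell
# Print "<file><TAB><chromosome code of its last variant>" for each .pvar.
last_chrom() {
  for pvar in "$@"; do
    printf '%s\t%s\n' "$pvar" "$(tail -n 1 "$pvar" | awk '{ print $1 }')"
  done
}

# Print any chromosome code that ends more than one of the given files.
repeated_chrom() {
  last_chrom "$@" | cut -f 2 | sort | uniq -d
}

# Usage on the real filesets (needs the c1..c22 .pvar files):
# last_chrom c*.pvar
# repeated_chrom c*.pvar
```

Any code printed by `repeated_chrom` points at two filesets that appear to hold the same chromosome.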

Huy Nguyen

Sep 20, 2023, 6:35:17 AM
to plink2-users
Thanks Chris.

0. Why are you not including $a in the --set-all-var-ids template string?  Do you intentionally want to keep only one part of each split multiallelic variant?  
As I plan to extract SNP dosages for the MR (Mendelian randomization) analysis, I thought it would be fine to keep only one part (though I'm not sure, given my limited understanding of the field). Do you think doing so could affect the downstream MR analysis?
Also, could you please advise me on the following code I revised as per your suggestion including $a?
plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]$a

1. The error message implies that you mismanaged your files in a way that makes e.g. one chromosome appear twice.  You can take a quick look at your chromosome codes with e.g. "tail <.pvar filename>" on each .pvar.
Yes, I did mix up one file, so I will redo that step. I really appreciate your sharp diagnosis.

On Wednesday, September 20, 2023 at 19:42:56 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 21, 2023, 7:17:46 PM
to plink2-users
Thanks Chris so much. With your useful advice, I was able to merge all files.pgen into one file.pgen successfully using the following code:
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

Then I used the following command to estimate the polygenic risk score (PRS), but a new error occurred (see below). If I instead convert the .pgen fileset into .fam/.bim/.bed files for the PRS estimation, I run into two problems: (1) there is not enough space left on the remote server, and (2) as you advised, dosages may not be preserved when working with bfiles.
Rscript software/PRSice/PRSice.R --prsice software/PRSice/PRSice_linux --dir /home/user --base data/sumstats/TL_Summarydata.txt --target CHRs --out output/PRS_TL --no-regress --lower 5.00E-08 #
PRSice 2.3.5 (2021-09-20)
https://github.com/choishingwan/PRSice
(C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O'Reilly

GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2023-09-22 09:04:35
./software/PRSice/PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base data/sumstats/TL_Summarydata.txt \
    --binary-target F \
    --bp BP \
    --chr CHR \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --interval 5e-05 \
    --lower 5e-08 \
    --no-regress  \
    --num-auto 22 \
    --out output/PRS_TL \
    --pvalue P \
    --seed 1633606172 \
    --snp SNP \
    --stat BETA \
    --target CHRs \
    --thread 1 \
    --upper 0.5

Initializing Genotype file: CHRs (bed)

Start processing TL_Summarydata
==================================================

Base file: data/sumstats/TL_Summarydata.txt
Header of file is:
SNP     CHR     BP      A1      A2      BETA    P

Reading 100.00%
24 variant(s) observed in base file, with:
5 ambiguous variant(s) excluded
19 total variant(s) included from base file

Loading Genotype info from target
==================================================

Error: Cannot open file: CHRs.fam

Error:
Execution halted


So, my questions seeking your help are: 
1. What should I do to tackle the above issue? 
2. I have 24 variants, so why were 5 of them excluded? What should I do to keep these 5 variants, since they are also important to my analysis (PRS estimation and the MR analysis)?

I'm still looking forward to your continued support.
On Wednesday, September 20, 2023 at 20:35:17 UTC+10, Huy Nguyen wrote:

Huy Nguyen

Sep 26, 2023, 4:14:53 AM
to plink2-users
Dear All,

I tried many times to make multiple .pgen filesets (one per chromosome) from the .bgen files using the following command, which I adapted from discussions in this plink2-users group:
plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r,\$a

And then validated using the following code:

plink2 --pfile c1 --validate # in the end, all files (c1, c2, c3, ..., c22) passed validation, with no corruption

I then merged these .pgen filesets using the following command:
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs

But I got the following error:
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)

Options in effect:
  --make-pgen
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt

Hostname: hitchpc
Working directory: /mnt/wd12TB
Start time: Mon Sep 25 06:27:46 2023

Random number seed: 1695587266
515522 MiB RAM detected, ~509216 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 93095623/93095623 variants complete.
Results written to CHRs-merge.pgen + CHRs-merge.pvar .

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Tue Sep 26 05:36:47 2023


So, I would appreciate it if anyone could advise why this error persists and how I can address it. Is it because I used a comma in the command that makes the .pgen files? (That is, should I put a comma before \$a at the end of the command or not?) Which of the following commands should I use to avoid the above error at a later stage?
Option 1: 
plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r,\$a
Option 2: 
plink2 --bgen ukb22828_c1_b0_v3.bgen ref-first --sample ukb22828_c1_b0_v3_s487159.sample --make-pgen --new-id-max-allele-len 100 missing --out c1 --set-all-var-ids @:#[ukb]\$r\$a

Thank you; I am really looking forward to your help.
On Friday, September 22, 2023 at 09:17:46 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 26, 2023, 6:46:33 AM
to plink2-users
1. Please post the output of "head -n 2 CHRs-merge.pvar".
2. Do you still see this error if you only try to merge chromosomes 21 and 22?

Huy Nguyen

Sep 26, 2023, 9:54:00 AM
to plink2-users
Thanks Christopher so much for your continued support. Here is the output done as per your advice:

1. Please post the output of "head -n 2 CHRs-merge.pvar".
 head -n 2 CHRs-merge.pvar
#CHROM  POS     ID      REF     ALT
1       10177   1:10177[ukb]A,AC        A       AC

2. Do you still see this error if you only try to merge chromosomes 21 and 22?
When I merged only these two filesets, it appears to have succeeded. Why is that, and how do I address the error when merging all 22 filesets?
 plink2 --pmerge-list List_2files21_22_pgen_names.txt --make-pgen --out CHRs21_22
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)   www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to CHRs21_22.log.
Options in effect:
  --make-pgen
  --out CHRs21_22
  --pmerge-list List_2files21_22_pgen_names.txt

Start time: Tue Sep 26 21:28:20 2023
515522 MiB RAM detected, ~509543 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 2 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs21_22-merge.psam .
--pmerge-list: 2 .pvar files scanned.
Concatenation job detected.
Concatenating... 2516841/2516841 variants complete.
Results written to CHRs21_22-merge.pgen + CHRs21_22-merge.pvar .

487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs21_22-merge.psam.
2516841 variants loaded from CHRs21_22-merge.pvar.

Note: No phenotype data present.
Writing CHRs21_22.psam ... done.
Writing CHRs21_22.pvar ... done.
Writing CHRs21_22.pgen ... done.
End time: Tue Sep 26 23:49:05 2023


I look forward to your further guidance.
On Tuesday, September 26, 2023 at 20:46:33 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 26, 2023, 12:09:16 PM
to plink2-users
Ok, the first line of CHRs-merge.pvar looks fine, so the error message is strange.  If you just try to do whatever you wanted to do next with CHRs-merge.{pgen,pvar,psam} as input, what happens?

(In the meantime, there isn't a good reason to have "ukb" in your variant IDs.  "b37" / "b38" might make sense because the same variant will have a different position depending on the reference genome build.  But the UK Biobank data releases don't use different variant coordinates than the rest of the world.)

Huy Nguyen

Sep 28, 2023, 3:42:23 PM
to plink2-users
Thanks Christopher. Here are the issues:

When I tried to merge those .pgen filesets for the fourth time, the error persisted:
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)
Options in effect:
  --make-pgen
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt
Start time: Wed Sep 27 12:22:56 2023
Random number seed: 1695781376
515522 MiB RAM detected, ~508924 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 93095623/93095623 variants complete.
Results written to CHRs-merge.pgen + CHRs-merge.pvar .
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Fri Sep 29 00:50:53 2023

When I continued to do GWAS, the same error happened:
plink2 --pfile CHRs-merge --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogBUN_mg_dl --covar data/Covariatesdata.txt --covar-name PC{1..10}, Age, Tuoi, Sex --extract TL_snplist_All.txt --out output/GWAS_BUN.cvrt

PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)   www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to output/GWAS_BUN.cvrt.log.
Options in effect:
  --covar data/Covariatesdata.txt
  --covar-name PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, Age, Tuoi, Sex
  --extract TL_snplist_All.txt
  --glm hide-covar
  --out output/GWAS_BUN.cvrt
  --pfile CHRs-merge
  --pheno data/Pheno_KFs.txt
  --pheno-name LogBUN_mg_dl

Start time: Fri Sep 29 05:40:12 2023
515522 MiB RAM detected, ~508944 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Fri Sep 29 05:40:12 2023

So I am lost: where should I go from here, and what should I do next?
Any further guidance would be much appreciated.
On Wednesday, September 27, 2023 at 02:09:16 UTC+10, chrch...@gmail.com wrote:

Huy Nguyen

Sep 28, 2023, 3:46:59 PM
to plink2-users
Thanks Christopher. Here are the issues:

When I tried to merge those .pgen filesets for the fourth time, the error persisted:
plink2 --pmerge-list List_22files_pgen_names.txt --make-pgen --out CHRs
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)
Options in effect:
  --make-pgen
  --out CHRs
  --pmerge-list List_22files_pgen_names.txt
Start time: Wed Sep 27 12:22:56 2023
Random number seed: 1695781376
515522 MiB RAM detected, ~508924 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
--pmerge-list: 22 filesets specified.
--pmerge-list: 487409 samples present.
--pmerge-list: Merged .psam written to CHRs-merge.psam .
--pmerge-list: 22 .pvar files scanned.
Concatenation job detected.
Concatenating... 93095623/93095623 variants complete.
Results written to CHRs-merge.pgen + CHRs-merge.pvar .
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Fri Sep 29 00:50:53 2023


This is what I saw when I checked the output of "head -n 2 CHRs-merge.pvar":
head -n 2 CHRs-merge.pvar
[ukb]A,C        A       C
17      13837789        17:13837789[ukb]T,C     T       C


When I continued to do GWAS, the same error happened:
plink2 --pfile CHRs-merge --glm hide-covar --pheno data/Pheno_KFs.txt --pheno-name LogBUN_mg_dl --covar data/Covariatesdata.txt --covar-name PC{1..10}, Age, Tuoi, Sex --extract TL_snplist_All.txt --out output/GWAS_BUN.cvrt
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)   www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to output/GWAS_BUN.cvrt.log.
Options in effect:
  --covar data/Covariatesdata.txt
  --covar-name PC1, PC2, PC3, PC4, PC5, PC6, PC7, PC8, PC9, PC10, Age, Tuoi, Sex
  --extract TL_snplist_All.txt
  --glm hide-covar
  --out output/GWAS_BUN.cvrt
  --pfile CHRs-merge
  --pheno data/Pheno_KFs.txt
  --pheno-name LogBUN_mg_dl

Start time: Fri Sep 29 05:40:12 2023
515522 MiB RAM detected, ~508944 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
487409 samples (264221 females, 222938 males, 250 ambiguous; 487409 founders)
loaded from CHRs-merge.psam.
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Fri Sep 29 05:40:12 2023


So I am lost: where should I go from here, and what should I do next?
Any further guidance would be much appreciated.

On Friday, September 29, 2023 at 05:42:23 UTC+10, Huy Nguyen wrote:

Christopher Chang

Sep 28, 2023, 3:47:26 PM
to plink2-users
1. What is the .log output of "plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups"?
2. Assuming you see an error there, if you then run "head CHRs-merge.pvar > CHRs-merge-head.pvar" followed by "plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups", does that work?
3. If you still see an error there, can you post CHRs-merge-head.pvar (do NOT copy-and-paste, I need the binary file in this case)?
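
Alongside those steps, a minimal sketch for looking at the first line of the .pvar locally, assuming only standard tools (head, awk, od). It counts the whitespace-separated tokens on line 1 and dumps the raw bytes, which is one way to spot a truncated line or non-printing characters that a copy-and-paste would hide. The function names are hypothetical.

```shell
# Report how many whitespace-separated tokens the first line of a file has.
first_line_tokens() {
  head -n 1 "$1" | awk '{ print NF }'
}

# Show the raw bytes of the first line, so tabs (\t), carriage returns (\r)
# and other non-printing characters become visible.
first_line_bytes() {
  head -n 1 "$1" | od -c
}

# Usage (file name from the thread):
# first_line_tokens CHRs-merge.pvar
# first_line_bytes CHRs-merge.pvar
```

For comparison, the header line `#CHROM POS ID REF ALT` quoted earlier in the thread has five tokens.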

Huy Nguyen

Sep 28, 2023, 4:02:05 PM
to plink2-users
Thanks Christopher.

1. What is the .log output of "plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups"?
plink2 --pvar CHRs-merge.pvar --write-snplist allow-dups
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)   www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pvar CHRs-merge.pvar
  --write-snplist allow-dups
Start time: Fri Sep 29 05:49:35 2023
515522 MiB RAM detected, ~508950 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
Error: Line 1 of CHRs-merge.pvar has fewer tokens than expected.
End time: Fri Sep 29 05:49:35 2023

2. Assuming you see an error there, if you then run "head CHRs-merge.pvar > CHRs-merge-head.pvar" followed by "plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups", does that work?
head CHRs-merge.pvar > CHRs-merge-head.pvar
and followed by:
plink2 --pvar CHRs-merge-head.pvar --write-snplist allow-dups
PLINK v2.00a4.8LM 64-bit Intel (15 Sep 2023)   www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink2.log.
Options in effect:
  --pvar CHRs-merge-head.pvar
  --write-snplist allow-dups
Start time: Fri Sep 29 05:52:43 2023
515522 MiB RAM detected, ~508934 available; reserving 257761 MiB for main
workspace.
Using up to 36 threads (change this with --threads).
Error: Line 1 of CHRs-merge-head.pvar has fewer tokens than expected.
End time: Fri Sep 29 05:52:43 2023


3. If you still see an error there, can you post CHRs-merge-head.pvar (do NOT copy-and-paste, I need the binary file in this case)?
Given my project rules, is there any way I can send this file to you only? Or could you set up a way for me to send it to you privately?

On Friday, September 29, 2023 at 05:47:26 UTC+10, chrch...@gmail.com wrote:

Christopher Chang

Sep 28, 2023, 4:03:31 PM
to plink2-users
My email address is on the plink2 website.