Top variant IDs missing when using --clump


Ying Liu

Mar 8, 2019, 3:23:56 PM
to plink2-users
I extracted only 1721 variants in the clumping step, but the log file contains a warning saying "3830 more top variant IDs missing". I do not quite understand what this means. Is this something I need to worry about, or can I ignore it? Thanks.

```
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to chr2_female_WHRadjBMI25001_2.log.
Options in effect:
  --bfile chr2_female_WHRadjBMI
  --clump WHRadjBMI_chr2_female.RES_INV_WHRadjBMI_F.glm.linear
  --clump-r2 0.1
  --extract /data1/liuy39/UKB_temp0723/GWAS_new/GWAS_WHRadjBMI/snplist_bfile_chr2_female_WHRadjBMI
  --out chr2_female_WHRadjBMI25001_2

257672 MB RAM detected; reserving 128836 MB for main workspace.
Allocated 54352 MB successfully, after larger attempt(s) failed.
1721 variants loaded from .bim file.
487409 people (223033 males, 264362 females, 14 ambiguous) loaded from .fam.
Ambiguous sex IDs written to chr2_female_WHRadjBMI25001_2.nosex .
--extract: 1721 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 487409 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.990326.
1721 variants and 487409 people pass filters and QC.
Note: No phenotypes present.
Warning: 'rs366225' is missing from the main dataset, and is a top variant.
Warning: 'rs355797' is missing from the main dataset, and is a top variant.
Warning: 'rs3771089' is missing from the main dataset, and is a top variant.
3830 more top variant IDs missing; see log file.
--clump: 44 clumps formed from 1721 top variants.
Results written to chr2_female_WHRadjBMI25001_2.clumped .
```

Christopher Chang

Mar 8, 2019, 3:35:52 PM
to plink2-users
It means that there are variants in the .glm.linear file missing from either the --bfile or the --extract file.  This is unsurprising if the original linear regression included some variants outside the --extract file.
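
For anyone who wants to see which IDs are affected before running --clump, here is a minimal Python sketch of the comparison described above. It is not part of PLINK. The function names are hypothetical, the association file is assumed to be whitespace-delimited with a header row containing an `ID` column, and the toy in-memory data stands in for the real .bim and .glm.linear files. If --extract is also used, the set of loaded IDs would additionally need to be intersected with the --extract list.

```python
import io


def read_bim_ids(handle):
    """Collect variant IDs (second column) from a PLINK .bim file."""
    ids = set()
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def read_assoc_ids(handle, id_column="ID"):
    """Read variant IDs from an association file with a header row.

    Assumption: the ID column is named `id_column` in the header.
    """
    header = handle.readline().split()
    idx = header.index(id_column)
    ids = []
    for line in handle:
        fields = line.split()
        if len(fields) > idx:
            ids.append(fields[idx])
    return ids


def missing_ids(assoc_ids, loaded_ids):
    """Return association-file IDs that are absent from the loaded dataset."""
    return [v for v in assoc_ids if v not in loaded_ids]


# Toy data (hypothetical); replace with open("...") on the real files.
bim = io.StringIO("2\trs1\t0\t100\tA\tG\n2\trs2\t0\t200\tC\tT\n")
assoc = io.StringIO("#CHROM\tPOS\tID\tP\n2\t100\trs1\t1e-8\n2\t300\trs3\t1e-6\n")

bim_ids = read_bim_ids(bim)
assoc_ids = read_assoc_ids(assoc)
missing = missing_ids(assoc_ids, bim_ids)
print(f"{len(missing)} of {len(assoc_ids)} association IDs are missing: {missing}")
```

A large missing count relative to the total suggests that the association file and the genotype dataset do not describe the same variant set.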

Jalil Sharif

Jan 13, 2022, 12:23:59 PM
to plink2-users
Hi, I wanted to clarify this further.

I ran:
```
PLINK v1.90p 64-bit (8 Nov 2021)
Options in effect:
  --bed chr_merged.bed
  --bim chr_merged.bim
  --clump park_updated.score
  --clump-field P
  --clump-kb 250
  --clump-p1 1
  --clump-r2 0.1
  --clump-snp-field SNP
  --fam chr1.fam
  --out chr.qc
  --threads 64

1031886 MB RAM detected; reserving 515943 MB for main workspace.
4113097 variants loaded from .bim file.
487409 people (223038 males, 264368 females, 3 ambiguous) loaded from .fam.
Ambiguous sex IDs written to chr.qc.nosex .

Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 487409 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.99122.
4113097 variants and 487409 people pass filters and QC.
Note: No phenotypes present.
```

I got the following message:
```
Warning: 'rs356203' is missing from the main dataset, and is a top variant.
Warning: 'rs356219' is missing from the main dataset, and is a top variant.
Warning: 'rs356215' is missing from the main dataset, and is a top variant.
2357669 more top variant IDs missing; see log file.
```

Does this mean that these 2357672 variants did not undergo clumping, since the .score file and the .bed file are from different samples?

Best 
Jalil

Christopher Chang

Jan 13, 2022, 1:10:02 PM
to plink2-users
Yes, the variant IDs aren't in sync, and this implies that you are misusing --clump.

Jalil Sharif

Jan 13, 2022, 1:23:19 PM
to plink2-users
But can I still use the results in the .clumped file to generate a PRS?

Christopher Chang

Jan 13, 2022, 1:26:13 PM
to plink2-users
You're on your own here.  The only thing that's obvious to me is that you're misusing the command; I have no interest in looking at the details.

Jalil Sharif

Jan 13, 2022, 3:05:23 PM
to plink2-users
Excuse my ignorance, but in this instance, what do you mean by misusing the command? I filtered the original chromosomal .bed files with the SNPs from a GWAS study, and I used the same GWAS study for the subsequent clumping.

Where would I have gone wrong?

Christopher Chang

Jan 13, 2022, 3:16:02 PM
to plink2-users
--clump is intended to be run on the exact same dataset used to generate the GWAS results.  If you're using it in some other manner, you're on your own (or you should consult the person who came up with the method you're using); I'm not interested.