What SNPs does plink select in the --score --q-score-range

40 views
Skip to first unread message

yu zhang

unread,
Jan 27, 2025, 8:53:58 AMJan 27
to plink2-users
I would like to use PLINK --score for the PRS, it is weird that I cannot find the SNPs that is selected for the PRS in the following code. Then I tried to extract those SNPs first (as the code in the 2nd way) and then use --score in PLINK. It has different results as the following code. I have no idea the reason for the difference. I would appreciate it so much if you could help me out.

the first way is using the similar code as in the tutorial
```
yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz | awk '{print $3,$8}' > SNP.pvalue
yzh10@login2:/N/slate/yzh10/numom2b/GDM> echo "0.000001 0 0.000001" > range_list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz > base.QC.tsv
yzh10@login2:/N/slate/yzh10/numom2b/GDM> plink --bfile EUR.QC \
> --score base.QC.tsv 3 4 8 header \
> --write-snplist \
> --q-score-range range_list SNP.pvalue \
> --extract EUR.valid.snp \
> --out EUR.PRS
PLINK v1.90b7.2 64-bit (11 Dec 2023)           www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EUR.PRS.log.
Options in effect:
  --bfile EUR.QC
  --extract EUR.valid.snp
  --out EUR.PRS
  --q-score-range range_list SNP.pvalue
  --score base.QC.tsv 3 4 8 header
  --write-snplist

257366 MB RAM detected; reserving 128683 MB for main workspace.
5419600 variants loaded from .bim file.
5830 people (0 males, 5830 females) loaded from .fam.
--extract: 191265 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 5830 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is in [0.9999995, 1).
191265 variants and 5830 people pass filters and QC.
Note: No phenotypes present.
List of variant IDs written to EUR.PRS.snplist .
Warning: 7241950 lines skipped in --score file (7241950 due to variant ID
mismatch, 0 due to allele code mismatch); see EUR.PRS.nopred for details.
--score: 191265 valid predictors loaded.
Warning: 7241951 lines skipped in --q-score-range data file.
--score: 1 range processed.
Results written to EUR.PRS.*.profile.
```

the results is like this
```
yzh10@login2:/N/slate/yzh10/numom2b/GDM> head EUR.PRS.0.000001.profile
                  FID                   IID  PHENO    CNT   CNT2    SCORE
  202946750069_R03C01   202946750069_R03C01     -9      2      1 1.485e-07
  202947050020_R05C01   202947050020_R05C01     -9      2      1 1.485e-07
  202947050133_R04C01   202947050133_R04C01     -9      2      1 1.485e-07
  203010920076_R01C01   203010920076_R01C01     -9      2      0        0
  203010920076_R02C01   203010920076_R02C01     -9      2      0        0
  203010920076_R04C01   203010920076_R04C01     -9      2      1 1.485e-07
  203010920076_R06C01   203010920076_R06C01     -9      2      1 1.485e-07
  203010920076_R07C01   203010920076_R07C01     -9      2      2 2.97e-07
  203010920090_R01C01   203010920090_R01C01     -9      2      1 1.485e-07
```

the second way is to select SNPs first and then use PLINK --score
```
yzh10@login2:/N/slate/yzh10/numom2b/GDM> gunzip -c base.QC.gz | awk '$7 < 1e-6 {print $3}' > significant_snps.list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> comm -12 <(sort significant_snps.list) <(sort EUR.valid.snp) > final_snps_for_score.list
yzh10@login2:/N/slate/yzh10/numom2b/GDM> plink --bfile EUR.QC \
> --score base.QC.tsv 3 4 8 header \
> --extract final_snps_for_score.list \
> --out EUR.PRS
PLINK v1.90b7.2 64-bit (11 Dec 2023)           www.cog-genomics.org/plink/1.9/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to EUR.PRS.log.
Options in effect:
  --bfile EUR.QC
  --extract final_snps_for_score.list
  --out EUR.PRS
  --score base.QC.tsv 3 4 8 header

257366 MB RAM detected; reserving 128683 MB for main workspace.
5419600 variants loaded from .bim file.
5830 people (0 males, 5830 females) loaded from .fam.
--extract: 47 variants remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 5830 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is in [0.9999995, 1).
47 variants and 5830 people pass filters and QC.
Note: No phenotypes present.
Warning: 7433168 lines skipped in --score file (7433168 due to variant ID
mismatch, 0 due to allele code mismatch); see EUR.PRS.nopred for details.
--score: 47 valid predictors loaded.
--score: Results written to EUR.PRS.profile .
yzh10@login2:/N/slate/yzh10/numom2b/GDM> head  EUR.PRS.profile
                  FID                   IID  PHENO    CNT   CNT2    SCORE
  202946750069_R03C01   202946750069_R03C01     -9     94     28 0.0111638
  202947050020_R05C01   202947050020_R05C01     -9     94     19 -0.000518085
  202947050133_R04C01   202947050133_R04C01     -9     94     22 -0.00160957
  203010920076_R01C01   203010920076_R01C01     -9     94     23 -1.80851e-05
  203010920076_R02C01   203010920076_R02C01     -9     94     17  -0.0035
  203010920076_R04C01   203010920076_R04C01     -9     94     26 0.0123585
  203010920076_R06C01   203010920076_R06C01     -9     94     26 0.00277128
  203010920076_R07C01   203010920076_R07C01     -9     94     23  0.01115
  203010920090_R01C01   203010920090_R01C01     -9     94     24 0.00867234

```

We can see that the results are different in two ways, but they are supposed to be the same. 

Chris Chang

unread,
Jan 27, 2025, 9:08:31 AMJan 27
to yu zhang, plink2-users
These logs don’t quite show why the results should be expected to be equivalent.  Please post a set of files (could contain just a few samples) that allows someone else to replicate what you’re seeing.

--
You received this message because you are subscribed to the Google Groups "plink2-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to plink2-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/plink2-users/409e1378-7c35-4439-9ae7-169832906876n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages