Apparent problem with --keep-ambig

Aaron Holleman

unread,

Nov 29, 2017, 9:31:51 AM11/29/17

to PRSice

Hi,

One more issue I've noticed -- When I include --keep-ambig in my code, PRSice seems to ignore it and it still drops the ambiguous SNPs. Below is my PRSice code and the first part of the output. Any advice would be greatly appreciated. Thanks!

Rscript PRSice.R --dir . --prsice PRSice_mac --base PGC_training_hg38.txt --snp snpid --pvalue p --stat or --A1 a1 --A2 a2 --beta F --target targetfile --keep-ambig --binary-target T --cov-file Covariates_435_Sex_PC_Pheno.txt --cov-col Sex,PC1,PC2 --lower 0.000001 --upper 1 --bar-levels 0.000001,0.00001,0.0001,0.001,0.01,0.05,0.1,0.2,0.3,0.4,0.5,1 --fastscore --out PRSice_Results/Compare_with_Plink --no-clump

PRSice 2.0.14.beta (27 October 2017)

https://github.com/choishingwan/PRSice

GNU General Public License v3

If you use PRSice in any publised work, please cite:

Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)

PRSice: Polygenic Risk Score software.

Bioinformatics 31 (9): 1466-1468

2017-11-29 09:19:06

./PRSice_mac

--fastscore

--stat or

--model add

--snp snpid

--binary-target T

--A2 a2

--bar-levels 0.000001,0.00001,0.0001,0.001,0.01,0.05,0.1,0.2,0.3,0.4,0.5,1

--upper 1

--thread 1

--target targetfile

--pvalue p

--lower 1e-06

--A1 a1

--out PRSice_Results/Compare_with_Plink

--cov-file Covariates_435_Sex_PC_Pheno.txt

--seed 708523575

--cov-col Sex,PC1,PC2

--no-clump

--base PGC_training_hg38.txt

Loading Genotype file: targetfile (bed)

435 people (196 male(s), 239 female(s)) observed

435 founder(s) included

6490 ambiguous variant(s) excluded

42526 variant(s) included

Message has been deleted

Sam Choi

unread,

Nov 29, 2017, 10:19:46 AM11/29/17

to PRSice

Hi, thanks for reporting this issue. We have now fixed it in the 20171116 build. You can download that here

Side note: We are actually interested in the reason behind the use of --keep-ambig, if you don't mind, could you let us know why do you want to keep the ambiguous SNPs?

Aaron Holleman

unread,

Nov 29, 2017, 4:00:07 PM11/29/17

to PRSice

Thanks, Sam. The main reason for using --keep-ambig was just to compare the PRSice output with the results of an earlier analysis using Plink and R where I didn't drop the ambiguous variants. Aside from this comparison goal, there are some instances where it seems likely that all variants in the base and target files match for strand and I have some interest in seeing the results if all SNPs are included in the scoring, though this is really just curiosity and I wouldn't plan to do anything with such results unless it were clear that the files did indeed match for strand.

Best,

Aaron

Sam Choi

unread,

Nov 29, 2017, 8:48:50 PM11/29/17

to PRSice

Thanks! That's actually the exact reason why we implemented the --keep-ambig option. Thanks for letting us know!

Aaron Holleman

unread,

Dec 1, 2017, 12:31:52 PM12/1/17

to PRSice

Hi Sam,

When I use --keep-ambig in PRSice I'm getting polygenic risk scores that are slightly different from scores generated in Plink (when I also include the ambiguous variants in the Plink analysis). It's clear that the same number of SNPs are being used for scoring, but the scores for each person are different in PRSice vs. Plink. I'm not sure why this is happening. This doesn't happen when I exclude the ambiguous SNPs, in which case PRSice and Plink give exactly the same scores.

Something odd I noticed in the process of doing these analyses with PRSice is that when I use --keep-ambig and include all of the bar-levels directly below, I get output that shows that the correct number of SNPs have been use for scoring at each p-value threshold, with 49,016 SNPs used at a threshold of p=1.

Rscript PRSice.R --dir . --prsice PRSice_mac --base PGC_training_hg38.txt --snp snpid --pvalue p --stat or --A1 a1 --A2 a2 --beta F --target targetfile --binary-target T --cov-file Covariates_435_Sex_PC_Pheno.txt --cov-col Sex,PC1,PC2 --lower 0.000001 --upper 1 --bar-levels 0.000001,0.00001,0.0001,0.001,0.01,0.05,0.1,0.2,0.3,0.4,0.5,1 --fastscore --out PRSice_Results/Compare_with_Plink_KeepAmbig --no-clump --keep-ambig

However, when I do the same analysis but only include a bar-level of 0.1 (as below), I get plots for thresholds of 0.1 and 1 and the output indicates that the number of SNPs used for scoring at p=1 is 20,999 which isn't right. It turns out that this 20,999 is exactly the number of SNPs that would be used for scoring at a threshold of 0.10 if ambiguous SNPs were excluded (14,509 SNPs) plus the total number of ambiguous SNPs that are kept by including the --keep-ambig flag (6,490 ambiguous SNPs). This issue doesn't occur when I don't use --keep-ambig. I don't know if this weird result is somehow related to the discrepancy noted above between PRSice and Plink scores.

Rscript PRSice.R --dir . --prsice PRSice_mac --base PGC_training_hg38.txt --snp snpid --pvalue p --stat or --A1 a1 --A2 a2 --beta F --target targetfile --binary-target T --cov-file Covariates_435_Sex_PC_Pheno.txt --cov-col Sex,PC1,PC2 --lower 0.000001 --upper 1 --bar-levels 0.1 --fastscore --out PRSice_Results/Compare_with_Plink_KeepAmbig --no-clump --keep-ambig

Thanks for your help on these and the previous issues. And thanks for providing PRSice!

Best,

Aaron

Sam Choi

unread,

Dec 1, 2017, 9:35:38 PM12/1/17

to PRSice

hm... so it seems like a bug where PRSice has correctly calculate the number of SNPs for p-threshold 0.1 but not for 1. I will have a look onto it and see if I can reproduce your problem.

(Another reason why PLINK and PRSice result can differ is that PRSice will also do flipping between the base and target file which PLINK doesn't automatically do. So that might be one of the reason)

Reply all

Reply to author

Forward