PRS.R2 results

1,166 views
Skip to first unread message

Judit

unread,
Feb 13, 2018, 8:19:30 AM2/13/18
to PRSice
Hi!
I'm using PRSice2 with covariates for my target sample. I would like to know which R2 should be consider from the summary output file. I think that in the barplot PRS.R2 is represented but this one didn't consider covariates effect, isn't it?
  1. PRS.R2 - Variance explained by the PRS. If prevalence is provided, this will be adjusted for ascertainment
  2. Full.R2 - Variance explained by the full model (including the covariates). If prevalence is provided, this will be adjusted for ascertainment
  3. Null.R2 - Variance explained by the covariates. If prevalence is provided, this will be adjusted for ascertainment

In this summary output file appears a p-value, is the same considering the three models (PRS, Full and Null).

Also I saw that in some barplot appears in brackets (Nagelkerke's) but in others not, why?  For example, it didn't appear in any of my plots. 

Thank you very much

Best,

Judit

Sam Choi

unread,
Feb 13, 2018, 4:18:33 PM2/13/18
to PRSice
The PRS.R2 is the R2 of the full model (model contain PRS+ covariate) minus the R2 of the Null model (model containing only the covariates)

If you target phenotype is continuous trait, then the R2 is well defined, thus you don't need the pseudo R2 (Nagelkerke's R2). However, if you have a binary target phenotype, the R2 is not well defined, so we will use the pseudo R2 instead

Judit

unread,
Feb 14, 2018, 3:22:36 AM2/14/18
to PRSice
Thank you very much for your answer. Both my discovery and target phenotype (PLINK bfile) are binary traits but it seems that is not recognized like this. This is the command I'using:

Rscript PRSice.R --dir . --prsice PRSice_linux --base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta --beta F --target merge3_qc_strict3 --snp SNP --A1 A1 --stat OR --pvalue P --chr CHR --cov-file cov_all_PRS.txt --cov-col @C[1-20],STUDY --binary-target T --quantile 10 -o ADHD3


PRSice 2.1.0.beta (14 Jan 2018)
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3

If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468

2018-02-14 08:59:56
./PRSice_linux \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001000,0.050000,0.100000,0.200000,0.300000,0.400000,0.500000,1.000000 \
    --base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta \
    --binary-target T \
    --bp BP \
    --chr CHR \
    --clump-kb 250 \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov-col @C[1-20],STUDY \
    --cov-file cov_all_PRS.txt \
    --info-base INFO,0.9 \
    --interval 0.000050 \
    --lower 0.000100 \
    --model add \
    --out ADHD3 \
    --pvalue P \
    --se SE \
    --seed 1034816216 \
    --snp SNP \
    --stat OR \
    --target merge3_qc_strict3 \
    --thread 1 \
    --upper 0.500000


Can you help me please????

Best,

Judit

Sam Choi

unread,
Feb 14, 2018, 8:11:11 PM2/14/18
to PRSice
Ar, I forgot to mention that PRSice 2 no longer write the Nagelkerke into the output. So even if you are using binary traits, the "Nagelkerke" will not display on your plot. You can see the updated document here

Judging from your log, it seems like PRSice does read in your trait as a binary trait (though it will be clearer if we can see the rest of the log)

Side note: You no longer need to do --beta F, simply not supplying the --beta flag means --beta F. 


Judit

unread,
Feb 15, 2018, 2:53:10 AM2/15/18
to PRSice
OK!!! thank you very much for you help. Attached the complet log file just to check that PRSice is reading my trait as a binary trait.

PRSice 2.1.0.beta (14 Jan 2018) 
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3

If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468

2018-02-13 15:23:06
./PRSice_linux \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001000,0.050000,0.100000,0.200000,0.300000,0.400000,0.500000,1.000000 \
    --base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta \
    --binary-target T \
    --bp BP \
    --chr CHR \
    --clump-kb 250 \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov-col @C[1-10],STUDY \
    --cov-file cov_all_PRS.txt \
    --info-base INFO,0.9 \
    --interval 0.000050 \
    --lower 0.000100 \
    --model add \
    --out ADHD2 \
    --pvalue P \
    --se SE \
    --seed 2347373689 \
    --snp SNP \
    --stat OR \
    --target merge3_qc_strict3 \
    --thread 1 \
    --upper 0.500000


Loading Genotype file: merge3_qc_strict3 (bed) 
 

6370 people (3065 male(s), 3305 female(s)) observed 
6370 founder(s) included 
 

829440 ambiguous variant(s) excluded 
5127867 variant(s) included 
 

1 region included 

Start processing 
daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70 
============================== 
 

Base file: 
daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta 
8094094 variant(s) observed in base file, with: 
1 ambiguous variant(s) excluded 
3335940 variant(s) not found in target file 
384216 mismatched variant(s) excluded 
135323 variant(s) with INFO score less than 0.900000 
4248911 total variant(s) included from base file 
 
 

Start performing clumping 

64384 MB RAM detected; reserving 32192 MB for clumping 
 

Number of variant(s) after clumping : 118477 
 

There are a total of 1 phenotype to process 
 

4287 control(s) 
2083 case(s) 
 

11 valid covariates included 
 

Processing the covariate file: cov_all_PRS.txt 
============================== 
 

2 sample(s) with invalid covariate: 
 
Covariate Number of Missing Samples 
STUDY
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
 

After reading the covariate file, 6368 sample(s) included 
in the analysis 
 

There are 1 region(s) with p-value less than 1e-5. Please 
note that these results are inflated due to the overfitting 
inherent in finding the best-fit PRS (but it's still best 
to find the best-fit PRS!). 
You can use the --perm option (see manual) to calculate an 
empirical P-value. 

Sam Choi

unread,
Feb 15, 2018, 11:16:07 AM2/15/18
to PRSice
Yes it is, note:

4287 control(s) 
2083 case(s) 
 

Judit

unread,
Feb 16, 2018, 3:44:36 PM2/16/18
to PRSice
Thank you very much for your help!!!

Judit

Oleg

unread,
Oct 24, 2018, 11:10:43 AM10/24/18
to PRSice
Dear Sam,

it is quite clear about the resulting R2=R2_full-R2_null. However I am wondering, from which model the resulting p-value is taken? As well as the Coefficient and Standard.Error?
Thanks in advance,
Oleg

Sam Choi

unread,
Oct 24, 2018, 4:58:10 PM10/24/18
to PRSice
Those are all w.r.t PRS within the full model.


Message has been deleted

Ciaran Kelly

unread,
Jul 1, 2019, 6:20:18 AM7/1/19
to PRSice
Can I ask why PRSice uses this method for accounting for covariates? Why can't covariates be included in the model that adjusts directly for them in one go? Am I correct in saying PRSice implements a sort of "stepwise" regression method, where the covariates are predictive independent variables alongside the snp variables? As in the full model doesn't distinguish at all between the covariates and independent variables when predicting?

Sam Choi

unread,
Jul 1, 2019, 5:02:20 PM7/1/19
to PRSice
Our model is calculated as

Full R2 = Phenotype ~ PRS + Covariates
Null R2 = Phenotype ~ Covariates

PRS.R2 = Full R2 - Null R2

There might be better way to account for it (e.g. adjusted R2) yet we haven't got the time to investigate that, that's why we are sticking with the current method.

Hadasa Kaufman

unread,
Jan 16, 2024, 8:50:47 AM1/16/24
to PRSice
Hi, I am trying to I am trying to restore the R2 and calculate RMSE, 
using this Python code: 

----------------------------------------------------------------------------------------------------------------------------
# Reshape the data if needed
prs_scores = prs_scores.reshape(-1, 1)

# Define the model
model = LinearRegression()

# Fit the model with scaled PRS scores
model.fit(prs_scores, observed_phenotypes)

#predicted_phenotypes_scaled = model.predict(prs_scores_scaled)
predicted_phenotypes = model.predict(prs_scores)

# Calculate residuals
residuals = observed_phenotypes - predicted_phenotypes

# Square residuals
squared_residuals = residuals**2
my_rmse=math.sqrt(sum(squared_residuals)/len(squared_residuals))
print("my_rmse:",my_rmse)
# Calculate Mean Squared Error
rmse = math.sqrt(mean_squared_error(observed_phenotypes, predicted_phenotypes))

print("Root Mean Squared Error:", rmse)

my_R2 = 1 - (sum((observed_phenotypes-predicted_phenotypes)**2)/sum((observed_phenotypes-mean(observed_phenotypes))**2))
print("my_R2: ", my_R2)
r2 = r2_score(observed_phenotypes, predicted_phenotypes)
print("R2: ", r2)
------------------------------------------------------------------------------------------------------------------------------------------

However, I get different results from each of the  R2 in the summary file. 
can you explain to me what am I doing wrong?

Thank you!
Hadasa


Reply all
Reply to author
Forward
0 new messages