PRS.R2 results

Judit

unread,

Feb 13, 2018, 8:19:30 AM2/13/18

to PRSice

Hi!

I'm using PRSice2 with covariates for my target sample. I would like to know which R2 should be consider from the summary output file. I think that in the barplot PRS.R2 is represented but this one didn't consider covariates effect, isn't it?

PRS.R2 - Variance explained by the PRS. If prevalence is provided, this will be adjusted for ascertainment
Full.R2 - Variance explained by the full model (including the covariates). If prevalence is provided, this will be adjusted for ascertainment
Null.R2 - Variance explained by the covariates. If prevalence is provided, this will be adjusted for ascertainment

In this summary output file appears a p-value, is the same considering the three models (PRS, Full and Null).

Also I saw that in some barplot appears in brackets (Nagelkerke's) but in others not, why? For example, it didn't appear in any of my plots.

Thank you very much

Best,

Judit

Sam Choi

unread,

Feb 13, 2018, 4:18:33 PM2/13/18

to PRSice

The PRS.R2 is the R2 of the full model (model contain PRS+ covariate) minus the R2 of the Null model (model containing only the covariates)

If you target phenotype is continuous trait, then the R2 is well defined, thus you don't need the pseudo R2 (Nagelkerke's R2). However, if you have a binary target phenotype, the R2 is not well defined, so we will use the pseudo R2 instead

Judit

unread,

Feb 14, 2018, 3:22:36 AM2/14/18

to PRSice

Thank you very much for your answer. Both my discovery and target phenotype (PLINK bfile) are binary traits but it seems that is not recognized like this. This is the command I'using:

Rscript PRSice.R --dir . --prsice PRSice_linux --base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta --beta F --target merge3_qc_strict3 --snp SNP --A1 A1 --stat OR --pvalue P --chr CHR --cov-file cov_all_PRS.txt --cov-col @C[1-20],STUDY --binary-target T --quantile 10 -o ADHD3

PRSice 2.1.0.beta (14 Jan 2018)

https://github.com/choishingwan/PRSice

GNU General Public License v3

If you use PRSice in any published work, please cite:

Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)

PRSice: Polygenic Risk Score software.

Bioinformatics 31 (9): 1466-1468

2018-02-14 08:59:56

./PRSice_linux \

--A1 A1 \

--A2 A2 \

--bar-levels 0.001000,0.050000,0.100000,0.200000,0.300000,0.400000,0.500000,1.000000 \

--base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta \

--binary-target T \

--bp BP \

--chr CHR \

--clump-kb 250 \

--clump-p 1.000000 \

--clump-r2 0.100000 \

--cov-col @C[1-20],STUDY \

--cov-file cov_all_PRS.txt \

--info-base INFO,0.9 \

--interval 0.000050 \

--lower 0.000100 \

--model add \

--out ADHD3 \

--pvalue P \

--se SE \

--seed 1034816216 \

--snp SNP \

--stat OR \

--target merge3_qc_strict3 \

--thread 1 \

--upper 0.500000

Can you help me please????

Best,

Judit

Sam Choi

unread,

Feb 14, 2018, 8:11:11 PM2/14/18

to PRSice

Ar, I forgot to mention that PRSice 2 no longer write the Nagelkerke into the output. So even if you are using binary traits, the "Nagelkerke" will not display on your plot. You can see the updated document here

Judging from your log, it seems like PRSice does read in your trait as a binary trait (though it will be clearer if we can see the rest of the log)

Side note: You no longer need to do --beta F, simply not supplying the --beta flag means --beta F.

Judit

unread,

Feb 15, 2018, 2:53:10 AM2/15/18

to PRSice

OK!!! thank you very much for you help. Attached the complet log file just to check that PRSice is reading my trait as a binary trait.

PRSice 2.1.0.beta (14 Jan 2018)

https://github.com/choishingwan/PRSice

GNU General Public License v3

If you use PRSice in any published work, please cite:

Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)

PRSice: Polygenic Risk Score software.

Bioinformatics 31 (9): 1466-1468

2018-02-13 15:23:06

./PRSice_linux \

--A1 A1 \

--A2 A2 \

--bar-levels 0.001000,0.050000,0.100000,0.200000,0.300000,0.400000,0.500000,1.000000 \

--base daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta \

--binary-target T \

--bp BP \

--chr CHR \

--clump-kb 250 \

--clump-p 1.000000 \

--clump-r2 0.100000 \

--cov-col @C[1-10],STUDY \

--cov-file cov_all_PRS.txt \

--info-base INFO,0.9 \

--interval 0.000050 \

--lower 0.000100 \

--model add \

--out ADHD2 \

--pvalue P \

--se SE \

--seed 2347373689 \

--snp SNP \

--stat OR \

--target merge3_qc_strict3 \

--thread 1 \

--upper 0.500000

Loading Genotype file: merge3_qc_strict3 (bed)

6370 people (3065 male(s), 3305 female(s)) observed

6370 founder(s) included

829440 ambiguous variant(s) excluded

5127867 variant(s) included

1 region included

Start processing

daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70

==============================

Base file:

daner_meta_filtered_NA_iPSYCH23_PGC11_sigPCs_woSEX_2ell6sd_EUR_Neff_70.meta

8094094 variant(s) observed in base file, with:

1 ambiguous variant(s) excluded

3335940 variant(s) not found in target file

384216 mismatched variant(s) excluded

135323 variant(s) with INFO score less than 0.900000

4248911 total variant(s) included from base file

Start performing clumping

64384 MB RAM detected; reserving 32192 MB for clumping

Number of variant(s) after clumping : 118477

There are a total of 1 phenotype to process

4287 control(s)

2083 case(s)

11 valid covariates included

Processing the covariate file: cov_all_PRS.txt

==============================

2 sample(s) with invalid covariate:

Covariate Number of Missing Samples

STUDY 0

C1 0

C2 0

C3 0

C4 0

C5 0

C6 0

C7 0

C8 0

C9 0

C10 0

After reading the covariate file, 6368 sample(s) included

in the analysis

There are 1 region(s) with p-value less than 1e-5. Please

note that these results are inflated due to the overfitting

inherent in finding the best-fit PRS (but it's still best

to find the best-fit PRS!).

You can use the --perm option (see manual) to calculate an

empirical P-value.

Sam Choi

unread,

Feb 15, 2018, 11:16:07 AM2/15/18

to PRSice

Yes it is, note:

4287 control(s)

2083 case(s)

Judit

unread,

Feb 16, 2018, 3:44:36 PM2/16/18

to PRSice

Thank you very much for your help!!!

Judit

Oleg

unread,

Oct 24, 2018, 11:10:43 AM10/24/18

to PRSice

Dear Sam,

it is quite clear about the resulting R2=R2_full-R2_null. However I am wondering, from which model the resulting p-value is taken? As well as the Coefficient and Standard.Error?

Thanks in advance,

Oleg

Sam Choi

unread,

Oct 24, 2018, 4:58:10 PM10/24/18

to PRSice

Those are all w.r.t PRS within the full model.

Message has been deleted

Ciaran Kelly

unread,

Jul 1, 2019, 6:20:18 AM7/1/19

to PRSice

Can I ask why PRSice uses this method for accounting for covariates? Why can't covariates be included in the model that adjusts directly for them in one go? Am I correct in saying PRSice implements a sort of "stepwise" regression method, where the covariates are predictive independent variables alongside the snp variables? As in the full model doesn't distinguish at all between the covariates and independent variables when predicting?

Sam Choi

unread,

Jul 1, 2019, 5:02:20 PM7/1/19

to PRSice

Our model is calculated as

Full R2 = Phenotype ~ PRS + Covariates

Null R2 = Phenotype ~ Covariates

PRS.R2 = Full R2 - Null R2

There might be better way to account for it (e.g. adjusted R2) yet we haven't got the time to investigate that, that's why we are sticking with the current method.

Hadasa Kaufman

unread,

Jan 16, 2024, 8:50:47 AM1/16/24

to PRSice

Hi, I am trying to I am trying to restore the R2 and calculate RMSE,

using this Python code:

----------------------------------------------------------------------------------------------------------------------------
# Reshape the data if needed
prs_scores = prs_scores.reshape(-1, 1)

# Define the model
model = LinearRegression()

# Fit the model with scaled PRS scores
model.fit(prs_scores, observed_phenotypes)

#predicted_phenotypes_scaled = model.predict(prs_scores_scaled)
predicted_phenotypes = model.predict(prs_scores)

# Calculate residuals
residuals = observed_phenotypes - predicted_phenotypes

# Square residuals
squared_residuals = residuals**2
my_rmse=math.sqrt(sum(squared_residuals)/len(squared_residuals))
print("my_rmse:",my_rmse)
# Calculate Mean Squared Error
rmse = math.sqrt(mean_squared_error(observed_phenotypes, predicted_phenotypes))

print("Root Mean Squared Error:", rmse)

my_R2 = 1 - (sum((observed_phenotypes-predicted_phenotypes)**2)/sum((observed_phenotypes-mean(observed_phenotypes))**2))
print("my_R2: ", my_R2)
r2 = r2_score(observed_phenotypes, predicted_phenotypes)
print("R2: ", r2)

------------------------------------------------------------------------------------------------------------------------------------------

However, I get different results from each of the R2 in the summary file.
can you explain to me what am I doing wrong?

Thank you!
Hadasa

Reply all

Reply to author

Forward