How to interpret a P-value threshold of 1 giving the highest R2 and smallest P-value?

alexi...@gmail.com

Sep 14, 2017, 12:36:08 AM

to PRSice

Hi

I am studying a complex disease. When applying PRS, I found that the P=1 threshold gives the most significant result, with the largest variance explained. Is this normal? It suggests that every SNP matters no matter how significant it is, which doesn't feel right. Also, in our sample (fewer than 1,000 individuals), the variance explained is almost 20% higher than in the paper that published the PRS model and tested it in a large sample (more than 70,000). Is that weird?

Many thanks.

Alexis
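For readers less familiar with the threshold scan being discussed: in its simplest form, the score at a given P-value threshold is a sum of base-GWAS effect sizes times allele counts over the SNPs that pass the threshold, and the "best" threshold is the one whose score explains the most phenotypic variance. The Python sketch below illustrates that idea on made-up toy data. The function names are hypothetical, and it uses a plain squared correlation for a continuous phenotype, not PRSice's exact R2 definition, covariate handling or score scaling.

```python
from statistics import mean


def squared_correlation(x, y):
    """Squared Pearson correlation (R2 of a simple linear regression)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant score (e.g. no SNP passes) explains nothing
    return sxy * sxy / (sxx * syy)


def polygenic_score(genotypes, betas, pvalues, threshold):
    """Per-individual sum of beta * allele count over SNPs with P <= threshold."""
    keep = [j for j, p in enumerate(pvalues) if p <= threshold]
    return [sum(betas[j] * row[j] for j in keep) for row in genotypes]


def scan_thresholds(genotypes, phenotype, betas, pvalues, thresholds):
    """Return {threshold: R2} for each candidate P-value threshold."""
    return {
        t: squared_correlation(polygenic_score(genotypes, betas, pvalues, t), phenotype)
        for t in thresholds
    }


# Hypothetical toy data: 5 target individuals x 3 SNPs (allele counts 0/1/2),
# with effect sizes and P-values taken from an independent base GWAS.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2], [1, 0, 0]]
phenotype = [1.2, 0.4, 0.9, 1.8, 0.1]
betas = [0.2, 0.3, 0.1]
pvalues = [0.001, 0.2, 0.9]

results = scan_thresholds(genotypes, phenotype, betas, pvalues, [0.001, 0.05, 0.5, 1.0])
best = max(results, key=results.get)
print(results, "best threshold:", best)
```

Because the best threshold is chosen after comparing several candidates in the same target sample, the R2 at that threshold tends to be somewhat optimistic.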

alexi...@gmail.com

Sep 14, 2017, 2:35:46 AM

to PRSice

P.S. The base data are independent of the target data.

Sam Choi

Sep 14, 2017, 6:29:28 PM

to PRSice

It is completely normal for the best threshold to be at P=1. If you are worried, you can look at the high-resolution plot, which might tell you something.

Did you perform clumping? Did you filter out related samples? Have you done proper QC on your data? Have you controlled for population stratification? Apart from overlap between the base and target data, all of these factors can also artificially inflate the R2.
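Filtering out related samples is usually done from pairwise relatedness estimates. A minimal sketch, assuming the estimates are already available as (id1, id2, pi_hat) tuples (for example parsed from the PI_HAT column of a PLINK --genome report) and assuming a cutoff of 0.125, which is a common choice but should be set for your own study. The hypothetical `drop_related` function greedily removes the individual involved in the most remaining related pairs until no pair above the cutoff is left; a greedy choice like this is simple but is not guaranteed to remove the fewest individuals.

```python
def drop_related(pairs, threshold=0.125):
    """Choose individuals to remove so that no retained pair exceeds threshold.

    pairs: iterable of (id1, id2, pi_hat) tuples from a pairwise relatedness
    report (input format assumed for this sketch).
    """
    related = [(a, b) for a, b, pi_hat in pairs if pi_hat > threshold]
    removed = set()
    while True:
        # Count the related pairs each individual is still part of.
        counts = {}
        for a, b in related:
            if a not in removed and b not in removed:
                counts[a] = counts.get(a, 0) + 1
                counts[b] = counts.get(b, 0) + 1
        if not counts:
            return removed
        # Remove whoever appears in the most pairs; iterating over sorted IDs
        # makes ties resolve to the smallest ID, so the result is reproducible.
        worst = max(sorted(counts), key=counts.get)
        removed.add(worst)


pairs = [("A", "B", 0.50), ("B", "C", 0.26), ("D", "E", 0.01)]
print(sorted(drop_related(pairs)))  # → ['B']
```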

alexi...@gmail.com

Sep 14, 2017, 9:35:47 PM

to PRSice

Hi Sam,

Thanks for the advice.

Does the high-resolution plot look normal to you?

I did clumping (r2 > 0.1, 250 kb window) on the target data, and the publicly available base data had already been clumped. I did proper GWAS QC on the target data, including IBS and PCA.

Regards,

Alexis

Attachment: high-res.png (high-resolution plot)
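The clumping Alexis describes (r2 > 0.1, 250 kb window) keeps, roughly, the most significant SNP in each LD region and drops nearby SNPs correlated with it. A simplified greedy sketch, assuming P-values come from the base GWAS and pairwise LD r2 values are supplied by a caller-provided function (here a lookup table standing in for LD estimated in the target data). Real implementations, such as PLINK's --clump or the clumping done within PRSice, have extra options (e.g. an index-SNP P-value cutoff) that this sketch omits; function and variable names are hypothetical.

```python
def clump(snps, ld_r2, r2_threshold=0.1, window_bp=250_000):
    """Greedy LD clumping.

    snps:  list of (snp_id, chromosome, position_bp, p_value) from the base GWAS
    ld_r2: function (snp_a, snp_b) -> LD r2, e.g. estimated in the target sample
    Returns the IDs of the index SNPs that are kept, most significant first.
    """
    ordered = sorted(snps, key=lambda s: s[3])  # most significant first
    absorbed = set()  # SNPs already represented by some index SNP
    index_snps = []
    for snp_id, chrom, pos, _ in ordered:
        if snp_id in absorbed:
            continue
        index_snps.append(snp_id)
        absorbed.add(snp_id)
        # Drop remaining SNPs that are nearby and in LD with this index SNP.
        for other_id, other_chrom, other_pos, _ in ordered:
            if (other_id not in absorbed
                    and other_chrom == chrom
                    and abs(other_pos - pos) <= window_bp
                    and ld_r2(snp_id, other_id) > r2_threshold):
                absorbed.add(other_id)
    return index_snps


# Hypothetical example: rs2 is 50 kb from rs1 and in strong LD with it.
snps = [("rs1", "1", 100, 1e-8), ("rs2", "1", 50_100, 1e-3),
        ("rs3", "1", 1_000_000, 1e-4), ("rs4", "2", 100, 0.5)]
ld = {frozenset(("rs1", "rs2")): 0.8}
print(clump(snps, lambda a, b: ld.get(frozenset((a, b)), 0.0)))  # → ['rs1', 'rs3', 'rs4']
```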

Sam Choi

Sep 15, 2017, 4:18:41 AM

to PRSice

Yup, it seems OK from the graph.

alexi...@gmail.com

Sep 15, 2017, 11:06:35 PM

to PRSice

Thanks Sam:)

alexi...@gmail.com

Sep 15, 2017, 11:08:55 PM

to PRSice

Does this mean the disease is so highly complex that SNPs with small effects (even those with P-values approaching 1) also contribute to disease susceptibility?

Sam Choi

Sep 16, 2017, 7:18:26 PM

to PRSice

Unfortunately, the interpretation of the results really depends on your base and target samples and the nature of the study. Without knowing all of that, it is impossible to tell what the results mean. Parameter choices, input data and the study design of your experiment all make a difference to how the results should be interpreted. While we can provide technical support on whether PRSice is functioning correctly, we cannot, and are not qualified to, advise on what your results really mean. Sorry about that.

Paul O'Reilly

Sep 28, 2017, 12:32:29 PM

to PRSice

To add to this: it's typical in standard PRS analyses to find that the best-fit PRS P-value threshold is P=1, especially when the base GWAS has low power. In that case the causal variants are spread across the results, so the signal:noise ratio remains relatively high as the P-value threshold increases, and predictive power keeps increasing. For a highly powered GWAS, by contrast, most of the signal is in the upper tail of the results (small P-values), so the signal:noise ratio is very small once those SNPs have been included in the score, and the best-fit PRS will likely include only a fraction of the SNPs (e.g. Pt < 0.05).

Please see Dudbridge 2013, PLoS Genetics, for theoretical results demonstrating this (e.g. Figures 1 and 2, which show the optimum P-value threshold in different scenarios).
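Paul's signal:noise argument can be explored with a toy simulation. The sketch below rests on strong simplifying assumptions chosen only for illustration: independent SNPs (so no clumping is needed), allele frequency 0.5, a continuous trait with total variance of about 1, and base-GWAS estimates modelled as the true effect plus Gaussian noise with a standard error of roughly sqrt(2 / n_base). It is not the model used by Dudbridge 2013 or by PRSice, and the function name and parameters are hypothetical. With a target sample this small the R2 estimates are noisy, so individual runs will vary with the seed.

```python
import math
import random


def squared_correlation(x, y):
    """Squared Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def simulate_threshold_curve(n_base, n_snps=300, n_causal=150, n_target=400,
                             h2=0.5, thresholds=(1e-4, 1e-3, 0.01, 0.05, 0.1, 0.5, 1.0),
                             seed=1):
    """Return {P-value threshold: R2 in an independent target sample}."""
    rng = random.Random(seed)
    # True effects: the first n_causal SNPs share heritability h2
    # (allele frequency 0.5, so each genotype has variance 0.5).
    sd_beta = math.sqrt(h2 / (n_causal * 0.5))
    beta = [rng.gauss(0, sd_beta) if j < n_causal else 0.0 for j in range(n_snps)]

    # Base GWAS: estimate = truth + noise, with SE ~ sqrt(2 / n_base)
    # for a trait of variance ~1; two-sided P-value from the z-score.
    se = math.sqrt(2 / n_base)
    beta_hat = [b + rng.gauss(0, se) for b in beta]
    pval = [math.erfc(abs(b / se) / math.sqrt(2)) for b in beta_hat]

    # Independent target sample: allele counts; phenotype = genetic value + noise.
    genos = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_snps)]
             for _ in range(n_target)]
    pheno = [sum(b * g for b, g in zip(beta, row)) + rng.gauss(0, math.sqrt(1 - h2))
             for row in genos]

    curve = {}
    for t in thresholds:
        keep = [j for j in range(n_snps) if pval[j] <= t]
        score = [sum(beta_hat[j] * row[j] for j in keep) for row in genos]
        curve[t] = squared_correlation(score, pheno)
    return curve


for n_base in (1_000, 100_000):
    curve = simulate_threshold_curve(n_base)
    best = max(curve, key=curve.get)
    print(n_base, "best threshold:", best,
          {t: round(r2, 3) for t, r2 in curve.items()})
```

Lowering n_base, or spreading the same heritability over more causal SNPs (larger n_causal), reduces the power per SNP, which is the low-power scenario in which Paul notes that P=1 tends to be the best-fit threshold.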
