How to Run PRSice on Mac OS X


su96...@gmail.com

Aug 6, 2017, 7:27:20 AM
to PRSice
Hi everybody,

I am completely new to PRSice and want to run it on Mac OS X. I have installed R on OS X El Capitan 10.11.5 and entered the following lines in the R console:

> library(fmsb)
> library(batch)
> library(gtx)
Loading required package: survival
> library(plyr)
> library(ggplot2)


I set up the working directory and copied the files into it:

> getwd()
[1] "/Users/user/PR"
> dir()
 [1] "plink_1.9_linux_160914"        "plink_1.9_mac_160914"         
 [3] "PRSice_MANUAL_v1.25.pdf"       "PRSice_v1.25.R"               
 [5] "PRSice_VIGNETTE_v1.23.pdf"     "PRSice_VIGNETTE_v1.25.pdf"    
 [7] "TOY_BASE_GWAS.assoc"           "TOY_TARGET_DATA.bed"          
 [9] "TOY_TARGET_DATA.bim"           "TOY_TARGET_DATA.fam"          
[11] "TOY_TARGET_GWAS.assoc"         "TOY_TARGET_QUANTITATIVE.pheno"

If I run the command head TOY_BASE_GWAS.assoc in the R console, I get this error message:

Error: unexpected symbol in "head TOY_BASE_GWAS.assoc"


Please help.

Charles Perrier

Aug 7, 2017, 5:26:13 AM
to PRSice
I also had some trouble on my Mac; I'm not sure if it was the same problem.
I installed Docker and ran the Dockerised PRSice version.
It works great.

Sam Choi

Aug 7, 2017, 6:14:50 AM
to PRSice
This specific error happens because head TOY_BASE_GWAS.assoc is a shell (bash) command, not R code. Typed into the R console, R cannot parse it, so the file is never read.

In R, you first need to read the file into a variable, e.g.

sumstat <- read.table("TOY_BASE_GWAS.assoc", header = TRUE)

Then you can look at the first rows with head:

head(sumstat)

Note the parentheses: this is R code, and head is called as a function.
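From a terminal, by contrast, head is a shell command and takes the file name directly, with no parentheses and no read.table. The sketch below writes a small stand-in file so it runs anywhere; its rows are the sample lines quoted later in this thread, not the full toy file. With the real data, you would simply run head in the directory that holds TOY_BASE_GWAS.assoc.

```shell
#!/bin/sh
# Sketch: inspect a file from the terminal instead of the R console.
# The stand-in file reuses rows quoted later in this thread; it is NOT the full toy file.
dir=$(mktemp -d)
printf 'SNP CHR BP A1 A2 P OR\nSNP_22857 4 103593179 1 2 0.2852 13.29\nSNP_13879 2 237416793 1 2 0.8784 21.624\n' \
  > "$dir/TOY_BASE_GWAS.assoc"

# In a shell, head takes the file name directly (no parentheses, no read.table).
# -n 2 prints the header line plus the first SNP.
head -n 2 "$dir/TOY_BASE_GWAS.assoc"
```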

All of this is specific to R and has nothing to do with PRSice itself. To run PRSice, enter the following in the terminal instead:

R -q --file=./PRSice_v1.25.R --args \
 base TOY_BASE_GWAS.assoc \
 target TOY_TARGET_DATA \
 slower 0 \
 supper 0.5 \
 sinc 0.01 \
 covary F \
 clump.snps F \
 plink ./plink_1.9_mac_160914 \
 figname EXAMPLE_1

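On a side note, make sure each continued line ends with "\" and not "\ " (a backslash followed by a space). With the trailing space, the shell no longer joins the next line onto the command, so the space makes all the difference.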
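The effect of that trailing space can be reproduced safely with a harmless command. In this sketch, echo stands in for the R/PRSice call and the words "base" and "target" only mimic its arguments. The script files are generated with printf so that the trailing space is definitely present.

```shell
#!/bin/sh
# Sketch: why a space after the line-continuation backslash breaks a command.
# echo stands in for the R/PRSice call; "base" and "target" mimic its arguments.
dir=$(mktemp -d)

# Correct: the backslash is the LAST character on the line, so the shell joins the lines.
printf 'echo base \\\n  target\n' > "$dir/good.sh"
# Broken: backslash followed by a space. The backslash now escapes the space,
# the newline ends the command, and "target" runs as a separate command.
printf 'echo base \\ \n  target\n' > "$dir/bad.sh"

good_out=$(sh "$dir/good.sh")
echo "good: $good_out"                # the two lines were joined into one echo

sh "$dir/bad.sh" >/dev/null 2>&1
bad_status=$?
echo "bad exit status: $bad_status"   # 127 = command not found ("target")
```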




su96...@gmail.com

Aug 7, 2017, 9:45:25 AM
to PRSice
Hi Sam Choi,

Thanks for your reply. It worked.
1. I entered the following in the R console:
> library(fmsb)
> library(batch)
> library(gtx)
Loading required package: survival
> library(plyr)
> library(ggplot2)


2. Then I ran the command from the Mac terminal, not from the R console:

R -q --file=./PRSice_v1.25.R --args base TOY_BASE_GWAS.assoc target TOY_TARGET_DATA slower 0 supper 0.5 sinc 0.01 covary F clump.snps F plink ./plink_1.9_mac_160914


I typed everything on one line.


It worked. No error messages. 


The problem now is that I cannot find the output files in the working directory.


su96...@gmail.com

Aug 7, 2017, 9:49:23 AM
to PRSice
I have now found the output files in All My Files.

su96...@gmail.com

Aug 7, 2017, 9:58:15 AM
to PRSice
Sam Choi,

I understand the base file: TOY_BASE_GWAS.assoc

But I do not understand the target files:
TOY_TARGET_DATA.bed
TOY_TARGET_DATA.bim
TOY_TARGET_DATA.fam

How do I create these three files?

How do the target file and base file relate?

Sam Choi

Aug 7, 2017, 5:44:34 PM
to PRSice
The base file should be the summary statistics file, usually generated by a GWAS.

The target files are your genotype data, usually in PLINK format (.bed/.bim/.fam). You can find the details here:



su96...@gmail.com

Aug 11, 2017, 11:18:23 AM
to PRSice
I have prepared base data and target data. Is it possible to calculate a polygenic risk score with these data?
BaseData.xlsx
Target Data.xlsx

Sam Choi

Aug 12, 2017, 5:08:26 PM
to PRSice
First, most PRS programs will not accept Excel input. Copy the data into a plain-text file instead.

For the base file, you need the SNP name, the reference allele, the alternative allele, the odds ratio or beta, and the p-value.
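As a rough pre-flight check once the spreadsheet has been saved as plain text, the sketch below looks for the columns listed above in the header line. It assumes the column names used in the toy base file quoted later in this thread (SNP, A1, A2, P, and OR or BETA). The file name my_base.txt is hypothetical, and your own header may need renaming to whatever your PRSice version expects.

```shell
#!/bin/sh
# Sketch: check that a whitespace-delimited base file has the expected columns.
# Assumed column names follow the toy file header (SNP CHR BP A1 A2 P OR);
# the file names here are hypothetical.
dir=$(mktemp -d)
base="$dir/my_base.txt"
printf 'SNP CHR BP A1 A2 P OR\nSNP_22857 4 103593179 1 2 0.2852 13.29\n' > "$base"

check_header() {
  # Reads only the first line and prints each required column that is missing.
  head -n 1 "$1" | awk '
    { for (i = 1; i <= NF; i++) have[$i] = 1 }
    END {
      n = split("SNP A1 A2 P", need, " ")
      for (j = 1; j <= n; j++) if (!(need[j] in have)) printf "%s ", need[j]
      if (!("OR" in have) && !("BETA" in have)) printf "OR/BETA "
    }'
}

missing=$(check_header "$base")
if [ -z "$missing" ]; then echo "header OK"; else echo "missing: $missing"; fi

# A header with no effect-size column is reported as missing OR/BETA:
printf 'SNP A1 A2 P\n' > "$dir/bad_base.txt"
echo "bad file missing: $(check_header "$dir/bad_base.txt")"
```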

For the target file, we usually require a genotype file in PLINK format (please refer to the link I sent you above).
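For reference, the .bed/.bim/.fam trio asked about above is PLINK's binary fileset: .bed holds the genotypes in binary form, while .bim (one line per SNP) and .fam (one line per individual) are plain text with six columns each. If genotypes start out as text .ped/.map files, PLINK 1.9 can write the trio with plink --file mydata --make-bed --out mydata, where mydata is a hypothetical prefix. The sketch below only illustrates the two text files and checks their column count; every ID, position and allele in it is made up.

```shell
#!/bin/sh
# Sketch: what PLINK .fam and .bim text files look like, plus a column-count check.
# All IDs, positions and alleles below are hypothetical.
# To build the binary trio from text .ped/.map files (not run here):
#   plink --file mydata --make-bed --out mydata
dir=$(mktemp -d)

# .fam: family ID, individual ID, father ID, mother ID,
#       sex (1=male, 2=female, 0=unknown), phenotype (-9 = missing)
printf 'FAM1 IND1 0 0 1 -9\nFAM2 IND2 0 0 2 -9\n' > "$dir/mydata.fam"

# .bim: chromosome, SNP ID, genetic distance (cM), base-pair position, allele 1, allele 2
printf '1 snp1 0 10000 A G\n1 snp2 0 20000 C T\n' > "$dir/mydata.bim"

check_cols() {
  # Prints "OK" if every line has exactly 6 whitespace-separated fields,
  # otherwise reports each offending line and its field count.
  awk 'NF != 6 { bad = 1; print "line " NR ": " NF " fields" }
       END { if (!bad) print "OK" }' "$1"
}

echo "fam: $(check_cols "$dir/mydata.fam")"
echo "bim: $(check_cols "$dir/mydata.bim")"
```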

If you don't have a genotype file, you can instead use a file similar to the base file to perform a summary-statistics-based PRS analysis.


Sam Choi

Aug 12, 2017, 5:08:46 PM
to PRSice
(Simply put, you can't use those two files directly as input.)

su96...@gmail.com

Aug 13, 2017, 12:40:23 AM
to PRSice
Hi Sam Choi,

Thanks for your help. I managed to make the necessary files with the PLINK command line. I created example files, ran the program, and it worked.

I see that the program produces a polygenic risk score text file. What do those risk score figures mean?

Without your help I could not have got this far.

Sam Choi

Aug 13, 2017, 6:09:19 AM
to PRSice
Do you mean the bar chart, the high-resolution plot and the quantile plots? (Depending on your parameters, the bar chart is usually the only plot produced.)

The bar chart visualizes how well the score at each p-value threshold predicts your phenotype of interest. This lets users "see" the results without going through the whole text output. Usually, you will want a model p-value below 1e-4 (please refer to the PRSice paper for that) to guard against overfitting and multiple testing across thresholds. A larger R2 suggests that the model predicts the phenotype better.

For more detail, I'd suggest reading the PRSice paper and the original PRS paper, "Common polygenic variation contributes to risk of schizophrenia and bipolar disorder" (International Schizophrenia Consortium, Nature, 2009).
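As a rough conceptual summary (not PRSice's exact implementation), the score reported for each individual j at a p-value threshold t is a weighted count of effect alleles over the base-file SNPs that pass that threshold:

```latex
\mathrm{PRS}_j(t) = \sum_{i \,:\, p_i \le t} \hat{\beta}_i \, G_{ij},
\qquad \hat{\beta}_i = \ln(\mathrm{OR}_i) \text{ when the base file reports odds ratios}
```

Here p_i and beta_i come from the base (GWAS) file, and G_ij in {0, 1, 2} is the number of effect alleles individual j carries at SNP i in the target genotypes. Depending on the software and options, the sum may be rescaled (for example, averaged over the SNPs used). Each bar in the bar chart then corresponds to one threshold t: the phenotype is regressed on PRS(t), and the resulting R2 and p-value are what the chart displays.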

su96...@gmail.com

Aug 13, 2017, 5:56:08 PM
to PRSice
Hi Sam Choi,

Thank you for your reply. I am a bit confused about the p-values. In the sample data they look like this:


SNP CHR BP A1 A2 P OR
SNP_22857 4 103593179 1 2 0.2852 13.29
SNP_13879 2 237416793 1 2 0.8784 21.624
SNP_20771 4 16957461 1 2 0.1994 91.265

Here the p-values are 0.2852, 0.8784, etc., which are very different from my data.

In my data the p-values are like 2x10^-6, 1.5x10^-12, etc., which are very small.

I'm not sure why the p-values are so different. I collected my data from various studies; maybe I am using the wrong p-values.

Sam Choi

Aug 13, 2017, 7:17:06 PM
to PRSice
Hi Su, 

I strongly advise you to read the original PRS paper and also our PRSice paper.

To give an oversimplified answer:

To perform a PRS analysis, you need the results of a genome-wide association study (GWAS), which usually contain, for each SNP, the p-value, SNP ID, reference allele, alternative allele and effect size (either beta or OR).

One example of such a resource is the PGC (Psychiatric Genomics Consortium).

Sometimes you can get away with using only the significant SNPs (i.e. the SNPs with very small p-values), although PRS was originally designed to also include signals from non-significant SNPs, as these may also carry true signals.
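To illustrate why a range of thresholds (such as slower 0, supper 0.5, sinc 0.01 in the command used earlier in this thread, assuming these set the lowest threshold, highest threshold and step) pulls in non-significant SNPs, the sketch below counts how many base-file SNPs fall at or below each p-value threshold. It uses the three sample rows quoted earlier in the thread and a coarser 0.1 step for brevity. It only demonstrates the thresholding idea and is not PRSice's own code.

```shell
#!/bin/sh
# Sketch of p-value thresholding (not PRSice's implementation): count the SNPs
# whose P is at or below each threshold from lower to upper in steps of inc.
# The three rows are the sample base-file rows quoted in this thread.
dir=$(mktemp -d)
base="$dir/base.txt"
printf '%s\n' \
  'SNP CHR BP A1 A2 P OR' \
  'SNP_22857 4 103593179 1 2 0.2852 13.29' \
  'SNP_13879 2 237416793 1 2 0.8784 21.624' \
  'SNP_20771 4 16957461 1 2 0.1994 91.265' > "$base"

counts=$(awk -v lower=0.1 -v upper=0.5 -v inc=0.1 '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "P") pcol = i; next }  # locate the P column
  { p[NR] = $pcol + 0 }                                                # store P as a number
  END {
    # small tolerance so floating-point drift does not drop the last threshold
    for (t = lower; t <= upper + 1e-9; t += inc) {
      n = 0
      for (k in p) if (p[k] <= t) n++
      printf "%.2f %d\n", t, n
    }
  }' "$base")
echo "$counts"
```

None of these SNPs is anywhere near genome-wide significance, yet two of them enter the score once the threshold reaches 0.3. This is the sense in which PRS deliberately includes weak signals.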

Unfortunately, if you don't know what a GWAS is, you'll have to read up on that yourself, as it would take some time to go through all the basics.

Usually we use the results of one GWAS per PRS analysis, or run a separate PRS analysis for each set of GWAS results. We rarely, if ever, combine significant signals from different studies into one single GWAS input (i.e. your input). If you must, you can conduct a meta-analysis to obtain a single set of GWAS results.
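One common way to do that, shown here only as a sketch, is a fixed-effect inverse-variance meta-analysis: for each SNP, the effect estimates from K studies (betas, or log odds ratios) are averaged with weights equal to the inverse of their squared standard errors. This assumes every study reports a standard error for each SNP and that all effects are aligned to the same effect allele. Dedicated tools (METAL, for example) also handle allele alignment and heterogeneity checks.

```latex
w_k = \frac{1}{\mathrm{SE}_k^2}, \qquad
\hat{\beta}_{\text{meta}} = \frac{\sum_{k=1}^{K} w_k \hat{\beta}_k}{\sum_{k=1}^{K} w_k}, \qquad
\mathrm{SE}_{\text{meta}} = \frac{1}{\sqrt{\sum_{k=1}^{K} w_k}}, \qquad
Z = \frac{\hat{\beta}_{\text{meta}}}{\mathrm{SE}_{\text{meta}}}, \qquad
p = 2\left(1 - \Phi(|Z|)\right)
```

The combined beta, standard error and p-value per SNP would then form the single base file for the PRS analysis.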


