calculate PRS by "score" faster

Peng Yin

Jun 1, 2022, 2:53:44 AM
to plink2-users
Dear plink community, 

I am calculating several large polygenic risk scores (each with millions of SNPs) for UK Biobank samples, and I wonder whether there is a way to speed up the calculation using "plink --score".

I am currently using this command:
plink --bfile plink.bed bim fam --read-freq plink2.acount --score PGS.file -threads 36

This process does not seem to fully use the available CPU/memory.

Many thanks! 
Best wishes,
Peng

Christopher Chang

Jun 1, 2022, 11:54:24 AM
to plink2-users
0. I am generally able to help users more effectively when they post the actual command they ran, and preferably the .log file as well.  What you actually posted here is a syntax error; and because all I know is that you DIDN'T run that exact command, I am forced to guess what you actually did run, and in particular I don't know whether you ran plink 1.9 or 2.0, even though there's a "plink" at the beginning of the command line that ordinarily tells me that you ran 1.9.

1. Once you're dealing with UK Biobank-scale data, --pfile/--make-pgen will save you a substantial amount of I/O overhead relative to --bfile/--make-bed.  This requires plink 2.0.

2. plink 2.0 --score can compute multiple polygenic scores in a single run.  This is the main way to take advantage of multiple CPUs.
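
As a rough illustration of point 1 (not taken from the thread), the one-time conversion and a later single-score run could be scripted as below. The prefixes `plink` and `ukb_pgen`, the output name `prs_single`, and both helper function names are placeholders; `plink2.acount` and `PGS.file` are the file names from the original post. The sketch only builds and prints the command lines, it does not run plink2.

```python
import shlex


def make_pgen_cmd(bfile_prefix, out_prefix, threads=None):
    """Build a plink2 command converting .bed/.bim/.fam to .pgen/.pvar/.psam."""
    cmd = ["plink2", "--bfile", bfile_prefix, "--make-pgen", "--out", out_prefix]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def single_score_cmd(pfile_prefix, freq_file, score_file, out_prefix):
    """Build a plink2 command scoring one score file against a .pgen fileset."""
    return [
        "plink2",
        "--pfile", pfile_prefix,
        "--read-freq", freq_file,
        "--score", score_file,
        "--out", out_prefix,
    ]


# Placeholder prefixes; adjust to the real dataset.
convert = make_pgen_cmd("plink", "ukb_pgen", threads=36)
score = single_score_cmd("ukb_pgen", "plink2.acount", "PGS.file", "prs_single")

# Print shell-ready command lines for inspection before running them.
print(shlex.join(convert))
print(shlex.join(score))
```

The conversion only needs to happen once; every later scoring run can then read the .pgen fileset directly.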

Christopher Chang

Jun 3, 2022, 12:59:35 AM
to plink2-users
There's more room for improvement for your use case, but today's build should improve --score's parallelism.

Raju Natha

Jun 7, 2022, 6:16:42 AM
to plink2-users
Hello,

In my case, I have .bed files and my ultimate goal is to compute a polygenic risk score. Can you please explain the steps for doing so?

Thanks & Regards,
Raju.

Gabriel Cerono

Jun 9, 2022, 12:09:44 PM
to plink2-users
Hello. I am not the original poster, but I am about to do the same thing. I am going to use plink2; how should I write my command line so that I can compute many polygenic risk scores at the same time?

Christopher Chang

Jun 10, 2022, 12:48:09 PM
to plink2-users
You would use the new --score-col-nums flag.  The "PCA projection with --score" section of that page includes a full example (since simple PCA projection can be framed as a type of polygenic score).

Note that this may require you to create an input file containing a lot of '0' entries.  The June 3 build speeds up the single-score-at-a-time use case by enough that you may be fine without using the multi-scoring feature.  (If it's still too slow, I am planning to implement one more optimization soon.)