Fast-score PRS analyses with UKBB as target taking very long

Victória Trindade

Nov 9, 2020, 11:42:14 AM

to PRSice

Hi,

I ran fast-score PRS analyses on Cartesius (Dutch national supercomputer), using the UKBB as target dataset (bgen).

The last run took almost 3 days (68 hours) to produce results. That seems too long, given reports in this group of much shorter run times with UKBB data. Most of the time is spent in the stage the log labels "Calculating allele frequencies".

I would like to know whether anyone else has tried this and whether anything can be done to speed it up.

Here is the script I used:

       --prsice ./PRSice_linux \
       --base basefile \
       --target ukbb_file,ukbb.sample \
       --type bgen \
       --binary-target T \
       --pheno-col bin \
       --ignore-fid \
       --maf 0.01 \
       --geno 0.05 \
       --memory 60Gb \
       --cov-col PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,age,sex,site,array \
       --cov-factor site,array \
       --allow-inter \
       --quantile 5 \
       --out UKBB_output \
       --pheno ukbb_phenocovfile_new.txt \
       --cov ukbb_phenocovfile_new.txt \
       --fastscore \
       --extract UKBB_output.valid

(I used --memory to cap usage at 60 GB because an earlier job was killed during clumping due to memory issues.)

Does anyone have any tips to improve this? That would help me a lot.

Thanks in advance,

Victoria

Sam Choi

Nov 9, 2020, 5:37:00 PM

to PRSice

BGEN input tends to run much longer, and I don't think I have optimized the speed of MAF filtering yet. If you do the MAF filtering outside of PRSice (or simply take the MAF file provided by UKB and use it to build a list for --extract / --exclude), that should speed things up substantially.
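One way to prepare such a list outside PRSice is sketched below. It assumes a whitespace-delimited per-chromosome MAF/info file with the SNP ID in column 2 and the MAF in column 6, which is how the UKB `ukb_mfi_chr*_v3.txt` files are commonly described; check the column layout of your own release first. The function name and all file names are hypothetical.

```shell
# Sketch: build a SNP list for PRSice --extract from a precomputed MAF file,
# so that --maf does not have to be evaluated on the bgen data.
# ASSUMPTION: column 2 = SNP ID, column 6 = MAF (verify against your file).
make_maf_extract_list() {
    # $1 = MAF/info file, $2 = minimum MAF, $3 = output SNP list
    awk -v min="$2" '($6 + 0) >= (min + 0) { print $2 }' "$1" > "$3"
}

# Example with hypothetical file names: one combined list for chromosomes 1-22.
# for chr in $(seq 1 22); do
#     make_maf_extract_list "ukb_mfi_chr${chr}_v3.txt" 0.01 "maf_chr${chr}.snplist"
# done
# cat maf_chr*.snplist > maf_0.01.snplist
```

The resulting list could then be passed with --extract in place of --maf 0.01. If an --extract list is already in use (such as UKBB_output.valid above), the two lists would probably need to be intersected first.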

Javad

Nov 11, 2020, 9:20:42 AM

to PRSice

Hi Sam,

I am also trying to do polygenic scoring using the UKB bgen files as the target. However, I get this error after the bgen files are read:

...

..

Start performing clumping
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Cannot read the bgen file!
Error:
Execution halted

Here is my script:

./PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base base \
    --beta \
    --binary-target F \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov ./cov_prsice \
    --extract wbgen.valid \
    --interval 5e-05 \
    --keep cohort.list \
    --lower 5e-08 \
    --num-auto 22 \
    --out sample/wbgen \
    --pheno pheno \
    --pvalue P \
    --seed 2919319671 \
    --snp SNP \
    --stat BETA \
    --target ukb_imp_chr#_v3,imp_chr2_v3_s4222.sample \
    --thread 48 \
    --type bgen \
    --upper 0.5

Also, I noticed that multithreading does not seem to work, and the process is very slow.

Do you know what the problem is?

Regards,

Javad

Sam Choi

Nov 11, 2020, 5:16:16 PM

to PRSice

Can you confirm the bgen file is located in the current directory?

And which version are you using? Multi-threaded clumping has only been available since 2.3.3, and apart from clumping and permutation, multi-threading isn't used anywhere else.

Javad

Nov 11, 2020, 8:30:03 PM

to PRSice

Hi Sam,

Yes.

Of course, the bgen files are not in the current directory; I had just edited the paths in what I posted.

Anyway, the error actually occurs AFTER the bgen files are read. Here is the complete log:

PRSice 2.3.3 (2020-08-05)
https://github.com/choishingwan/PRSice
GNU General Public License v3
If you use PRSice in any published work, please cite:
Choi SW, O'Reilly PF.
PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data.
GigaScience 8, no. 7 (July 1, 2019)
2020-11-11 22:03:49
./PRSice_linux \
    --a1 A1 \
    --a2 A2 \
    --bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5,1 \
    --base /scratch/wbnojobstat.clean.forprscs \
    --beta \
    --binary-target F \
    --clump-kb 250kb \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov ./braincohort_sample/cov_brainprscohort_prsice \
    --extract ./braincohort_sample/wbgen.valid \
    --interval 5e-05 \
    --keep ./braincohort_sample/prs.brain.cohort.list \
    --lower 5e-08 \
    --num-auto 22 \
    --out braincohort_sample/wbgen \
    --pheno ./braincohort_sample/pheno_only_wbnj2 \
    --pvalue P \
    --seed 2919319671 \
    --snp SNP \
    --stat BETA \
    --target /UKBB_GeneticData/bgen_files/ukb_imp_chr#_v3,/UKB/sam_files/ukb_imp_chr2_v3_s222.sample \
    --thread 48 \
    --type bgen \
    --upper 0.5

Initializing Genotype file:
/UKBB_GeneticData/bgen_files/ukb_imp_chr#_v3
(bgen)
With external fam file:
/UKB/sam_files/ukb_imp_chr2_v3_s222.sample

Start processing wbnojobstat.clean
==================================================
SNP extraction/exclusion list contains 5 columns, will
assume first column contains the SNP ID

Base file:
/scratch/wbnojobstat.clean.forprscs
Header of file is:
SNP A1 A2 BETA P

8103178 variant(s) observed in base file, with:
44875 variant(s) excluded based on user input
8058303 total variant(s) included from base file

Loading Genotype info from target
==================================================
487409 people (222994 male(s), 264301 female(s)) observed
15213 founder(s) included

7565465 SNPs processed in /UKBB_GeneticData/bgen_files/ukb_imp_chr1_v3.bgen
.
2565465 SNPs processed in /UKBB_GeneticData/bgen_files/ukb_imp_chr22_v3.bgen

85027610 variant(s) not found in previous data
9710 variant(s) with mismatch information
8058303 variant(s) included

Phenotype file: ./braincohort_sample/pheno_only_wbnj2
Column Name of Sample ID: FID+IID
Note: If the phenotype file does not contain a header, the
column name will be displayed as the Sample ID which is
expected.

There are a total of 1 phenotype to process

Start performing clumping
terminate called recursively
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Cannot read the bgen file!
Error:
Execution halted

Sam Choi

Nov 12, 2020, 5:54:30 PM

to PRSice

If you try and use `--allow-inter`, does that solve the problem?

It's a bit weird to have this error. Will have to check when I have time to go back to PRSice's code base.

Sam

Javad

Nov 15, 2020, 5:51:37 PM

to PRSice

Thanks Sam.

Adding "--allow-inter" fixes the problem; however, it creates a huge intermediate file and also seems likely to take a long time.

With 48 cores and a wall time of 15 hours, the analysis still did not finish.

I think it would be easier and faster to convert the BGEN files to plink files first, keeping only the SNPs from the base (summary statistics) file (--extract) and only the people we want to analyze, and then to give PRSice those plink files as the target instead of the BGEN files.

What do you think, Sam?
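A minimal sketch of the conversion described above, using plink2. All file names are placeholders, and `ref-first` assumes the first allele in the bgen file is the reference allele; check both against your own data. Note that --make-bed writes hard-called genotypes, so dosage information is not carried over. The function name and the DRY_RUN switch are illustrative only.

```shell
# Sketch: subset one chromosome of the imputed bgen data to PLINK bed/bim/fam.
# File names are placeholders. DRY_RUN=1 (the default) only prints the command;
# set DRY_RUN=0 to actually run plink2.
convert_chr() {
    chr="$1"
    # Build the plink2 command line as this function's positional parameters.
    set -- plink2 \
        --bgen "ukb_imp_chr${chr}_v3.bgen" ref-first \
        --sample ukb.sample \
        --extract base_snps.txt \
        --keep cohort.list \
        --make-bed \
        --out "ukb_subset_chr${chr}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Example: print the commands for chromosomes 1-22.
# for chr in $(seq 1 22); do convert_chr "$chr"; done
```

Here base_snps.txt would hold the SNP IDs from the base file and cohort.list the sample IDs to keep. The resulting per-chromosome filesets could then be given to PRSice as a PLINK-format target.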

Sam Choi

Nov 16, 2020, 5:28:26 PM

to PRSice

That would be the fastest method. If you really want the dosage score, though, just provide PRSice with an LD reference file; that will substantially speed things up. For dosage scores, PRSice first builds an intermediate file for clumping, then uses the original bgen file for the dosage PRS calculation. By supplying a PLINK-formatted LD file, you skip building the intermediate file and so speed up the clumping.

Sam
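A rough sketch of what that could look like, based on the script posted earlier in the thread with an LD reference added through PRSice's --ld option. The prefix `ld_ref_chr#` is a hypothetical per-chromosome PLINK bed/bim/fam fileset, and the function name and DRY_RUN switch are illustrative. The remaining threshold options (--bar-levels, --interval, --lower, --upper, --num-auto, --seed) are left out for brevity and would carry over unchanged.

```shell
# Sketch: PRSice call with a PLINK-format LD reference supplied via --ld,
# keeping the bgen files as the target for dosage-based scoring.
# "ld_ref_chr#" is a HYPOTHETICAL prefix; other values come from the thread.
# DRY_RUN=1 (the default) only prints the command.
run_prsice_with_ld() {
    set -- ./PRSice_linux \
        --base base \
        --a1 A1 --a2 A2 --snp SNP --stat BETA --pvalue P --beta \
        --target "ukb_imp_chr#_v3,imp_chr2_v3_s4222.sample" \
        --type bgen \
        --ld "ld_ref_chr#" \
        --binary-target F \
        --pheno pheno \
        --cov ./cov_prsice \
        --extract wbgen.valid \
        --keep cohort.list \
        --clump-kb 250kb --clump-p 1.000000 --clump-r2 0.100000 \
        --thread 48 \
        --out sample/wbgen
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

The LD reference could be an independent reference panel or a hard-called subset of the target samples.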

Javad

Nov 18, 2020, 1:02:48 AM

to PRSice

Great!

Thanks Sam.

Javad
