PennPRS Question


Schoenherr Sebastian

Dec 17, 2025, 7:59:05 AM
to pen...@googlegroups.com
Hi everyone,
I tried running the test example using the example_EUR_binary.txt file, but it resulted in a failure. The downloaded log does not include any data. I would appreciate it if you could let me know what I might be doing wrong.

I was also wondering if it’s possible to upload my own file, using a format like CHR:POS:REF:ALT for the SNP column. 

Best,
Sebastian

Sebastian Schönherr, PhD  
Professor of Digital and Computational Genomics 
Institute of Genetic Epidemiology  
Medical University of Innsbruck  
------------------------------------  
Web  - https://genepi.i-med.ac.at/research/computational-genomics/
Mail  - sebastian....@i-med.ac.at
Tel  - +43 512 9003 70579

Jin, Jin

Dec 17, 2025, 9:00:35 PM
to Schoenherr Sebastian, pen...@googlegroups.com

Dear Sebastian,

 

Thank you for reporting this issue to our team! This issue is likely due to our recent platform update last week. We will let you know once we resolve this issue. Thank you for your patience!

 

Best regards,

Jin

 


Jin, Jin

Dec 17, 2025, 11:10:18 PM
to Schoenherr Sebastian, pen...@googlegroups.com

Dear Sebastian,

 

Our team has resolved the issue you reported. Please try again and let us know if you encounter additional issues.

 

Thanks!

Jin

 

Schoenherr Sebastian

Dec 30, 2025, 2:19:11 PM
to Jin, Jin, pen...@googlegroups.com
Dear Jin,

Thanks! The error seems to be fixed now. Unfortunately, I’m now running into the following error. Any advice?

Best,
Sebastian

################## PennPRS version 1.0.0 ##################

###########################################################
##################### Loading Data ########################
###########################################################
Processing file: output.tsv.gz
Unzipping gzipped file.
Column configuration is correct.
No column contains all NA values.
Deleting temporary unzipped file.
The /home/ubuntu/data/working/sebastian....@i-med.ac.at/data/AFR_output.tsv.txt file has 420 rows.
Loading required package: bigstatsr

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, first, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Attaching package: ‘scales’

The following object is masked from ‘package:readr’:

    col_factor

$methods
[1] "lassosum2-pseudo"

$trait
[1] "output.tsv"

$ancestry
[1] "AFR"

$LDrefpanel
[1] "1kg"

$k
[1] 2

$partitions
[1] "0.8,0.2"

$delta
[1] "0.001,0.01,0.1,1.0"

$nlambda
[1] 30

$lambda.min.ratio
[1] 0.01

$alpha
[1] "0.7, 1.0, 1.4"

$p_seq
[1] "1e-05,3.2e-05,0.0001,0.00032,0.001,0.0032,0.01,0.032,0.1,0.32,1.0"

$sparse
[1] "FALSE"

$kb
[1] 500

$Pvalthr
[1] "5e-08,5e-07,5e-06,5e-05,0.0005,0.005,0.05,0.5"

$R2
[1] "0.1"

$ensemble
[1] FALSE


**************************************************************************
******************* Step 0: QC for the input GWAS data *******************
**************************************************************************
* QC step completed. 419 SNPs remaining. No SNP was removed.

**************************************************************************
**************************************************************************
******************* Step 1: PUMAS Subsampling: ***************************
*** 0.8 training, 0.2 tuning with 2-fold Monte Carlo cross-validation. ***
**************************************************************************
Loading required package: data.table
Loading required package: parallel
Loading required package: optparse
[1] "LD and rs files loaded."
[1] "Matching GWAS and LD data completed."
[1] "Subsampling completed for ite1."
[1] "Subsampling completed for ite2."

**************************************************************************
******************** Step 2: Train PRS models by MCCV ********************
**************************************************************************
366 variants to be matched.
0 ambiguous SNPs have been removed.
366 variants have been matched; 0 were flipped and 0 were reversed.
Preparing LD info for CHR:
1.. 2.. 3.. 4.. 5.. 6.. Error: object 'ld' not found
Execution halted
Error: Step 1 (multi_step1.R) failed with exit code 1.



Jin, Jin

Dec 30, 2025, 11:53:27 PM
to Schoenherr Sebastian, pen...@googlegroups.com

Hi Sebastian,

 

Thank you for reporting this issue! Our team recently identified the same issue in another job. It occurs because the uploaded GWAS summary data file contains SNPs from CHR 6 only, and our code has a minor bug when handling such single-chromosome inputs. We are currently updating our code and will let you know as soon as the new version is online. Thank you again for your patience!
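
For readers who want to check their own file before uploading, a minimal sketch in base R follows. It only illustrates the single-chromosome condition described above; the toy data frame and the column name `CHR` are assumptions, not the PennPRS input specification.

```r
# Hypothetical toy summary statistics; the column name CHR is an assumption.
gwas <- data.frame(
  CHR = c(6, 6, 6),
  SNP = c("rs1", "rs2", "rs3"),
  stringsAsFactors = FALSE
)

# Which chromosomes are present, and which autosomes are absent?
chromosomes_present <- sort(unique(gwas$CHR))
missing_autosomes <- setdiff(1:22, chromosomes_present)

# TRUE when every SNP sits on the same chromosome, as in the failing job.
single_chromosome <- length(chromosomes_present) == 1

print(chromosomes_present)
print(single_chromosome)
```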

 

Best,

Jin

Jin, Jin

Dec 31, 2025, 12:31:50 AM
to Schoenherr Sebastian, pen...@googlegroups.com

Hi Sebastian,

 

We have completed our code update. Please try to submit the job again and let us know if you encounter more issues. Please note that sometimes the user has to log out then log back in to use the latest version. Thank you for your patience!

 

Thanks!

Jin

 

Schoenherr Sebastian

Jan 14, 2026, 7:18:23 AM
to Jin, Jin, pen...@googlegroups.com
Hi,
The last update worked well. Thanks for your help. One more thing: I’m using REGENIE output and had to transform LOG10P to P beforehand. My latest job (lpa_test_14125) failed. Is there a problem with this syntax: 1e-622.587 (LOG10P: 622.587)? Or is this another issue?

Best.
Sebastian


Jin, Jin

Jan 16, 2026, 10:10:28 AM
to Schoenherr Sebastian, pen...@googlegroups.com

Hi Sebastian,

 

Thanks for reporting the issue! Our team will look into this and get back to you.

 

Best,

Jin

 

Jin, Jin

Jan 16, 2026, 3:39:17 PM
to Schoenherr Sebastian, pen...@googlegroups.com

Hi Sebastian,

 

You are correct: our code does not support an input P-value in the format “1e-622.587”. I believe the syntax is not accepted because the exponent in scientific notation must be an integer, not a decimal such as 622.587. As a result, our R code reads this column as character rather than numeric.

 

I was wondering if there are some issues with the columns “LOG10P” and “P” in the input data file. I checked the row for SNP rs56029582, where the LOG10P column shows 2.71872e+00 and the P column shows 1e-2.71872. I think the LOG10P column holds -log10(P) = 2.71872, i.e., log10(P) = -2.71872, so the P column should be 10^(-2.71872), not 1e-2.71872, which is not a format R can recognize.
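
To illustrate the point about exponent syntax, here is a minimal sketch in base R. It assumes the pipeline coerces the P column with `as.numeric`, as the thread suggests; the variable names are illustrative only.

```r
# A string with a fractional exponent is not a valid numeric literal,
# so as.numeric() returns NA (normally with a coercion warning).
bad_p <- suppressWarnings(as.numeric("1e-2.71872"))
print(is.na(bad_p))

# Evaluating the power numerically gives an ordinary double.
log10p <- 2.71872          # REGENIE LOG10P, i.e. -log10(P)
p <- 10^(-log10p)

# Written out with format(), the value uses an integer exponent
# and can be read back by as.numeric().
p_text <- format(p, scientific = TRUE, digits = 6)
p_back <- as.numeric(p_text)
print(p_text)
print(isTRUE(all.equal(p, p_back, tolerance = 1e-5)))
```

This only covers values that fit in a double; it does not help with very large LOG10P values.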

 

Could you please try correcting the P-value column and rerunning the pipeline? I believe it should work then.

 

Thank you for reporting this issue, which arises from the need to transform the LOG10P output from REGENIE into p-values. We will discuss whether to update our pipeline to accept this alternative input format and get back to you.

 

Thanks, and have a nice weekend!

 

Best regards,

Jin

 

Schoenherr Sebastian

Jan 17, 2026, 5:34:36 AM
to Jin, Jin, pen...@googlegroups.com
Hi Jin,
Thanks for your help. 

I think there are several issues:
- A format like 10^(-63.3054) cannot be read by R with the as.numeric call your pipeline uses. I tried this by running part of your pipeline in R and got "Warning: NAs introduced by coercion".
- Converting a LOG10P value of 622.587 to P exceeds R's precision limit. I think you need to support LOG10P directly.
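
Both points can be reproduced with a few lines of base R. This is a minimal sketch, not part of the PennPRS pipeline; it relies only on standard double-precision behaviour, where the smallest positive normalized value is about 2.2e-308.

```r
# A literal expression string is text, not a number: as.numeric() yields NA.
expr_p <- suppressWarnings(as.numeric("10^(-63.3054)"))
print(is.na(expr_p))

# Smallest positive normalized double on this machine.
print(.Machine$double.xmin)

# 10^(-LOG10P) is still representable for moderate values ...
p_moderate <- 10^(-63.3054)
# ... but underflows to exactly zero for very strong associations.
p_extreme <- 10^(-622.587)

print(p_moderate > 0)
print(p_extreme == 0)
```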

Maybe I'm missing something.

Best.
Sebastian



Jin, Jin

Jan 18, 2026, 10:13:59 PM
to Schoenherr Sebastian, pen...@googlegroups.com

Hi Sebastian,

 

Thanks for the update. Could you try the R code below, which converts the LOG10P column into P? This should solve the problem in your data. Could you also let me know how you obtained the “P” column? The as.numeric function in R cannot be used on the current P column because its values appear to be incorrectly formatted.

 

# Read the GWAS summary statistics (REGENIE output)
input_gwas_table <- bigreadr::fread2('output.tsv')

# Convert LOG10P, i.e. -log10(P), back to P
input_gwas_table$P <- 10^(-input_gwas_table$LOG10P)

Schoenherr Sebastian

Jan 19, 2026, 2:29:06 AM
to Jin, Jin, pen...@googlegroups.com
Hi Jin, 
My code was almost identical. Again, if LOG10P is too large, the precision of R is not sufficient. See the screenshot of your code being executed: P1, which is the p-value in this case, becomes zero.

Best.
Seb

PastedGraphic-1.png
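
One way to work around the underflow is to stay on the log scale. The sketch below is an illustration under stated assumptions, not something PennPRS is known to accept: both helper names are hypothetical, and the second helper assumes the p-value comes from a 1-df chi-square test.

```r
# Hypothetical helper: format -log10(P) as a scientific-notation string
# with an integer exponent, without ever computing P as a double.
log10p_to_sci_string <- function(log10p, digits = 4) {
  exponent <- ceiling(log10p)
  mantissa <- round(10^(exponent - log10p), digits)
  # Rounding can push the mantissa up to 10; renormalize in that case.
  carry <- mantissa >= 10
  mantissa[carry] <- mantissa[carry] / 10
  exponent[carry] <- exponent[carry] - 1
  paste0(formatC(mantissa, format = "f", digits = digits),
         "e-", as.integer(exponent))
}

# Hypothetical helper: 1-df chi-square statistic implied by -log10(P),
# computed on the log scale so that extreme values do not underflow.
log10p_to_chisq <- function(log10p) {
  qchisq(-log10p * log(10), df = 1, lower.tail = FALSE, log.p = TRUE)
}

log10p <- c(2.71872, 63.3054, 622.587)
print(log10p_to_sci_string(log10p))
print(log10p_to_chisq(log10p))
```

The string form has valid syntax, but any step that coerces it with as.numeric would still be expected to underflow to zero for exponents beyond the double range, which is why accepting LOG10P (or a test statistic) as input is the more robust option.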

