Thank you, Hae, for your prompt reply. The file we used was downloaded from an open source (https://www.ebi.ac.uk/gwas/downloads/summary-statistics), since the original UK Biobank data does not include any precomputed GWAS summary statistics. We are targeting the liver-based models and GWAS summary data. We also tried running this with the provided example model, but it did not give any results. We have attached the log from a run using the GWAS file downloaded from the source above and the Liver model from https://zenodo.org/records/3518299 .
Another important question: how can we prepare UK Biobank data for S-PrediXcan?
Any help in this regard would be highly appreciated.
Here are the details you asked for.
head:
chromosome variant_id base_pair_location effect_allele other_allele effect_allele_frequency beta standard_error p_value
1 rs146836579 87647 C T 0.00215931 0.0429659 0.327721 0.895692
1 rs7545609 90051 T C 0.00216346 0.0435438 0.327723 0.894298
1 rs546872994 136113 T C 0.00121655 -0.700075 0.431632 0.104819
1 NA 267404 T TATA 0.00414028 -0.174513 0.237157 0.46182
1 rs554909596 458823 TA T 0.0026455 -0.0147158 0.290358 0.959579
1 rs28863004 526736 G C 0.00291971 -0.683316 0.284223 0.01621
1 rs557203750 559985 T C 0.00548926 -0.00460037 0.20235 0.981862
1 rs564040090 562147 A T 0.0125908 -0.0630268 0.136551 0.644396
1 rs569899510 563812 T G 0.00240385 0.114382 0.305065 0.707704
command:
cd /opt/notebooks/my_work/MetaXcan/software
./SPrediXcan.py \
  --model_db_path /opt/notebooks/my_work/eqtl/mashr/mashr_Liver.db \
  --covariance /opt/notebooks/my_work/eqtl/mashr/mashr_Liver.txt.gz \
  --gwas_file /opt/notebooks/ukb_data/GCST90103908_processed.tsv \
  --snp_column SNP \
  --effect_allele_column effect_allele \
  --non_effect_allele_column non_effect_allele \
  --beta_column beta \
  --pvalue_column pvalue \
  --keep_non_rsid \
  --model_db_snp_key varID \
  --output_file /opt/notebooks/my_work/results/linoleic_acid_spredixcan.csv
output:
WARNING - Missing --gwas_h2 and --gwas_N are required to calibrate the pvalue and zscore.
INFO - Processing GWAS command line parameters
INFO - Building beta for /opt/notebooks/ukb_data/GCST90103908_processed.tsv and /opt/notebooks/my_work/eqtl/mashr/mashr_Liver.db
INFO - Reading input gwas with special handling: /opt/notebooks/ukb_data/GCST90103908_processed.tsv
INFO - Processing input gwas
INFO - Aligning GWAS to models
/opt/notebooks/my_work/MetaXcan/software/metax/misc/GWASAndModels.py:15: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  alleles_1 = pandas.Series([set(e) for e in zip(merged[EA], merged[NEA])])
/opt/notebooks/my_work/MetaXcan/software/metax/misc/GWASAndModels.py:16: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  alleles_2 = pandas.Series([set(e) for e in zip(merged[EA_BASE], merged[NEA_BASE])])
INFO - Trimming output
INFO - Successfully parsed input gwas in 3.0287915779990726 seconds
INFO - Started metaxcan process
INFO - Loading model from: /opt/notebooks/my_work/eqtl/mashr/mashr_Liver.db
INFO - Loading covariance data from: /opt/notebooks/my_work/eqtl/mashr/mashr_Liver.txt.gz
INFO - Processing loaded gwas
INFO - Started metaxcan association
INFO - 0 % of model's snps used
WARNING - IMPORTANT: The pvalue and zscore are uncalibrated for inflation
INFO - Sucessfully processed metaxcan association in 2.3774969240002974 seconds
Best,
Faisal

On Thu, Jun 5, 2025 at 7:39 PM Hae Kyung Im <ha...@uchicago.edu> wrote:
Hi Faisal,
those problems have to do with mismatch between SNP names in the model vs the genotype files. PrediXcan and TWAS in general are used with common variants, so there is no benefit in using WGS vs the imputed genotype data in the UK Biobank. Please send the head of your genotype files and the exact command you are using to PrediXcan/MetaXcan <predixca...@googlegroups.com>.
Haky

On Thu, Jun 5, 2025 at 2:31 AM Faisal Imran <fimran.ms...@seecs.edu.pk> wrote:
Dear Haky,
I hope you are doing well. We are collaborating with Theranostics Laboratory on generating liver transcriptomes using UK Biobank data and have been attempting to run S-PrediXcan for this project. Although the example scripts in the repository execute successfully, we encounter errors when applying the provided model to the UKBB whole-genome sequencing (WGS) data. Additionally, when we use the GTEx V8 models from PredictDB, we observe 0 % SNP coverage.
We would greatly appreciate any guidance you can offer to help us integrate S-PrediXcan with UK Biobank WGS. We have also explored WGS data from sources such as EBI but have not been successful.
Thank you for your time and assistance.
Best,
Faisal
Hi Faisal,
I see a few issues with the options in your command.
If you set them correctly, it should work.
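
Two things can be checked directly from the details posted above. First, assuming the head shown comes from the same file that is passed to --gwas_file, its header uses variant_id, other_allele and p_value, while the command asks for SNP, non_effect_allele and pvalue. Second, the command keys the model on varID, whereas the GWAS file identifies variants by rsID. Below is a rough Python sketch for checking both before running S-PrediXcan. The function names are hypothetical, and the `weights` table with `rsid` and `varID` columns is an assumption about the layout of the PredictDB model file, so please verify it against your own .db.

```python
import gzip
import sqlite3
from contextlib import closing

# Column options taken from the command posted in this thread.
REQUESTED = {
    "--snp_column": "SNP",
    "--effect_allele_column": "effect_allele",
    "--non_effect_allele_column": "non_effect_allele",
    "--beta_column": "beta",
    "--pvalue_column": "pvalue",
}


def _open_text(path):
    """Open a plain or gzipped text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def read_header(gwas_path, delimiter="\t"):
    """Return the column names found on the first line of the GWAS file."""
    with _open_text(gwas_path) as handle:
        return handle.readline().rstrip("\n").split(delimiter)


def missing_columns(header, requested):
    """Return the options whose column name does not appear in the header."""
    return {opt: col for opt, col in requested.items() if col not in header}


def snp_overlap(gwas_path, snp_column, db_path, db_key="rsid", delimiter="\t"):
    """Fraction of the model's SNP identifiers that also occur in the GWAS file.

    Assumes (not confirmed here) that the model db has a `weights` table
    with `rsid` and `varID` columns.
    """
    if db_key not in ("rsid", "varID"):
        raise ValueError("db_key must be 'rsid' or 'varID'")
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(f"SELECT DISTINCT {db_key} FROM weights").fetchall()
    model_ids = {row[0] for row in rows}

    with _open_text(gwas_path) as handle:
        header = handle.readline().rstrip("\n").split(delimiter)
        idx = header.index(snp_column)  # raises ValueError if the column is absent
        gwas_ids = {
            line.rstrip("\n").split(delimiter)[idx]
            for line in handle
            if line.strip()
        }

    if not model_ids:
        return 0.0
    return len(model_ids & gwas_ids) / len(model_ids)
```

For example, `missing_columns(read_header(path_to_gwas), REQUESTED)` lists every option that points at a column the file does not have, and `snp_overlap(path_to_gwas, "variant_id", path_to_db, db_key="rsid")` compared with the same call using `db_key="varID"` shows which model key actually matches the identifiers in the GWAS file. If neither key gives a sizeable overlap, the GWAS identifiers probably need to be harmonized to the model's naming before S-PrediXcan can use them.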