We are having issues with low R2 (coefficient of determination) for predictions from the elastic net PredictDB model implemented in the R script gtex_v7_nested_enet.R. The predicted performance R2 from the elastic net is also very low in the model summary output file. We are using blood-tissue gene expression data from the MIDUS project (https://midus.colectica.org/), which come from 2 cohorts. In both cohorts (sample sizes 396 and 397), the data are log2-transformed transcript abundance values (gene transcript counts per million total human transcriptome-mapped RNA sequencing reads), normalized to hold constant 11 standard reference genes and floored at 1 transcript per million (0 log2) to suppress spurious variability. We used limma::normalizeBetweenArrays and sva::ComBat to account for batch effects and then inverse-normal transformed the gene expression values across both cohorts. As covariates, we included age, sex, cohort, the first 3 PCs from the SNP data, and the first 41 PCs from the merged gene expression data (across both cohorts; sample size 793) in lieu of PEER factors.
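A rank-based inverse-normal transform replaces each expression value with the standard-normal quantile of its rank. Here is a minimal Python sketch, assuming Blom's offset (c = 3/8) and average ranks for ties. The offset and tie handling in our actual pipeline may differ, and the function names are hypothetical. The sketch also shows one consequence worth keeping in mind: every sample floored at 0 log2 for a gene ends up with the same transformed value.

```python
from statistics import NormalDist


def rank_average(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def inverse_normal_transform(values, c=3 / 8):
    """Map values to normal quantiles of (rank - c) / (n - 2c + 1); c = 3/8 is Blom's offset."""
    n = len(values)
    nd = NormalDist()  # standard normal: mu = 0, sigma = 1
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in rank_average(values)]


# Example: three samples floored at 0 log2 all receive the same transformed value.
print(inverse_normal_transform([0.0, 0.0, 0.0, 5.0]))
```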
We ran your elastic net model with the default settings, obtained the weights for the final models, and computed X * beta_hat to get the GREx. We then computed R2. Both the mean and the variance of the imputed GREx are close to 0. We also estimated heritability with GCTA (REML) and obtained a mean h2 of 0.0154 (max h2 = 0.352) for 3065 genes (4500+ genes are still running). Do you have any suggestions on what our issue may be, or what we are doing wrong?
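The GREx and R2 computation described above can be sketched as follows. This is a minimal Python illustration with hypothetical function names and toy inputs, not the code in gtex_v7_nested_enet.R. GREx is the genotype dosage matrix X multiplied by the weight vector beta_hat. R2 is shown under two common definitions: the squared Pearson correlation and 1 - SS_res/SS_tot. These can give quite different values, and the second can be negative. When the GREx variance is essentially 0, the correlation-based R2 is undefined, so the sketch returns None in that case.

```python
def predict_grex(dosages, weights):
    """GREx per sample: sum over SNPs of dosage * elastic-net weight (X @ beta_hat)."""
    return [sum(x * b for x, b in zip(row, weights)) for row in dosages]


def mean(v):
    return sum(v) / len(v)


def variance(v):
    """Sample variance (n - 1 denominator)."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def r2_correlation(pred, obs):
    """Squared Pearson correlation; None if either vector is constant."""
    mp, mo = mean(pred), mean(obs)
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    sp = sum((p - mp) ** 2 for p in pred)
    so = sum((o - mo) ** 2 for o in obs)
    if sp == 0 or so == 0:
        return None  # undefined when predictions (or observations) do not vary
    return cov * cov / (sp * so)


def r2_coefficient_of_determination(pred, obs):
    """1 - SS_res / SS_tot; can be negative when predictions are poorly scaled."""
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Toy example: 4 samples x 2 SNPs, with made-up weights.
grex = predict_grex([[0, 1], [1, 1], [2, 0], [1, 2]], [0.5, -0.25])
print(grex, mean(grex), variance(grex))
```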