Regarding these issues, I have a few questions about methods and performance, and I thought I would include more information about my task.
I have 5 traits; SNP counts range from about 10 million to 13.8 million across them. I've run munge, ldsc, and sumstats preparation on these traits together with no issues, and as far as I can tell the output from LDSC looks fine.
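For reference, the prep calls looked roughly like this; file names, trait names, sample sizes, and the se.logit settings below are placeholders rather than my actual values, and only two of the five traits are shown:

# Sketch of the prep pipeline with placeholder inputs
# (two of the five traits shown for brevity)
library(GenomicSEM)
munge(files       = c("trait1.txt", "trait2.txt"),
      hm3         = "w_hm3.snplist",
      trait.names = c("T1", "T2"),
      N           = c(100000, 120000))
LDSCoutput <- ldsc(traits          = c("T1.sumstats.gz", "T2.sumstats.gz"),
                   sample.prev     = c(NA, NA),
                   population.prev = c(NA, NA),
                   ld              = "eur_w_ld_chr/",
                   wld             = "eur_w_ld_chr/",
                   trait.names     = c("T1", "T2"))
sumstats <- sumstats(files       = c("trait1.txt", "trait2.txt"),
                     ref         = "reference.1000G.maf.0.005.txt",
                     trait.names = c("T1", "T2"),
                     se.logit    = c(TRUE, TRUE))  # per-trait; depends on how SEs are reported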
I've tried running userGWAS with both a single-factor and a 2-factor model on a Linux HPC cluster. Initially, the single-factor model timed out after 72 hours on 24 cores. To improve runtime, I tried running a separate job for each chromosome, using the full-genome LDSC results and subsetting my full-genome sumstats object to the current chromosome as follows:
sumstats_chr <- sumstats[sumstats$CHR == chr, ]

# Run userGWAS on the chromosome-specific subset
gwas_output <- userGWAS(
  covstruc   = LDSCoutput,   # full-genome LDSC output loaded earlier
  estimation = "DWLS",
  SNPs       = sumstats_chr,
  model      = model_snp,
  parallel   = FALSE
)
This approach led to the errors above, with infinite or missing values in V_full.
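In case it helps diagnose this, here is a minimal sketch of a sanity check I could add before the userGWAS call; it assumes the effect and SE columns follow the usual beta.<trait> / se.<trait> naming in the GenomicSEM sumstats object, which should be checked against the actual object:

# Minimal sketch, assuming beta.<trait> / se.<trait> column naming.
# Rows with non-finite estimates can propagate into non-finite
# entries of V_full, so drop them before calling userGWAS.
est_cols  <- grep("^(beta|se)\\.", names(sumstats_chr), value = TRUE)
finite_ok <- Reduce(`&`, lapply(sumstats_chr[est_cols], is.finite))
sumstats_chr <- sumstats_chr[finite_ok, ]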
My most recent attempt was to return to the full-genome analysis and apply the Linux performance fix noted in the documentation:
export OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 NUMEXPR_NUM_THREADS=1 VECLIB_MAXIMUM_THREADS=1
~/anaconda3/envs/r-env/bin/R --no-echo --no-restore --file=gwas_template_full.R
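With library-level threading pinned to a single thread as above, my understanding is that userGWAS's own parallelism can then use the cores allocated to the job. The call in gwas_template_full.R looks roughly like this (cores = 22 is my own setting to match the allocation, not a documented default):

# Rough sketch of the full-genome call with built-in parallelism
gwas_output <- userGWAS(
  covstruc   = LDSCoutput,
  SNPs       = sumstats,
  estimation = "DWLS",
  model      = model_snp,
  parallel   = TRUE,
  cores      = 22
)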
These have been running overnight. What would the expected runtime be for my 5 traits with ~10 million SNPs on 22 cores?
Best,
John