error at p_sumstats stage (doesn't recognize Beta column)

349 views
Skip to first unread message

Charleen Adams

unread,
Nov 4, 2021, 9:19:43 PM11/4/21
to Genomic SEM Users
Hello, 

I'm trying to work my way through the tutorial. I'm getting an error at the  "p_sumstats" stage. It reads: 

Error in round(median(files[[i]]$effect, na.rm = T)) : non-numeric argument to mathematical function In addition: Warning message: In sumstats(files, ref, trait.names, se.logit, info.filter, maf.filter, : Cannot find beta or effect column, try renaming it effect in the summary statistics file for:CP

I've tried changing the names of the summary statistics files for the Beta variable to "effect", but it doesn't make any difference. I've checked that the column is numeric (not sure if that matters).  

What am I doing wrong? 

Thanks, 
Charleen 


Code I tried: 
--
munge("GWAS_CP_all.txt",
      "w_hm3.snplist",
      trait.names="CP",
      N=257828,
      info.filter = 0.9,
      maf.filter = 0.01)

munge("GWAS_EA_excl23andMe.txt",
      "w_hm3.snplist",
      trait.names="EA",
      N=766345,
      info.filter = 0.9,
      maf.filter = 0.01
)

traits <- c("CP.sumstats.gz","EA.sumstats.gz")
sample.prev <- c(NA,NA)
population.prev <- c(NA,NA)
ld<-"eur_w_ld_chr/"
wld <- "eur_w_ld_chr/"
trait.names<-c("CP","EA")

LDSCoutput <- ldsc(traits, 
                   sample.prev, 
                   population.prev, 
                   ld, 
                   wld, 
                   trait.names)
save(LDSCoutput, file="LDSCoutputCogNonCog.RData")

load(file="LDSCoutputCogNonCog.RData")
model<-'C=~NA*EA + start(0.4)*CP
        NC=~NA*EA
         
         NC~~1*NC
         C~~1*C
         C~~0*NC

         CP ~~ 0*EA
         CP~~0*CP
         EA~~0*EA'
output<-usermodel(LDSCoutput,estimation="DWLS",model=model)
save(output, file="Modeloutput.Rdata")

head -1000  GWAS_CP_all.txt > CP_1000.txt
head -1000  GWAS_EA_excl23andMe.txt > EA_1000.txt

files = c("CP_1000.txt", "EA_1000.txt")
ref = "reference.1000G.maf.0.005.txt"
trait.names = c("CP","EA")
se.logit = c(F,F)
info.filter = 0.6
maf.filter = 0.01

p_sumstats<-sumstats(files, ref, trait.names, se.logit, info.filter, maf.filter, OLS=c(T,T), linprob=NULL, N=c(257828,766345)) #prop=NULL, removed as it didn't run

agro...@gmail.com

unread,
Nov 5, 2021, 12:07:11 PM11/5/21
to Genomic SEM Users
Hi Charleen, 

It looks like you are doing a test run on the first 1,000 SNPs, and I'm wondering if by chance those SNPs aren't in the reference file and so you are getting an error message. The .log file produced by sumstats should help diagnose the issue if you are able to paste that here. 

Best, 
  Andrew

Charleen Adams

unread,
Nov 5, 2021, 4:13:55 PM11/5/21
to Genomic SEM Users

Thank you, Andrew. I appreciate the help.  I’ve attached the log files. 

Instead of trying to run it on 1000 SNPs, I tried using the entire files. It still didn’t recognize the Beta column for GWAS_CP_all. I modified it to be “effect”, and that still didn’t work. 

Here is the error:

Error in round(median(files[[i]]$effect, na.rm = T)) : 

  non-numeric argument to mathematical function

In addition: Warning messages:

1: In for (i in (1L:cols)[do]) { :

  closing unused connection 3 (CP_EA_sumstats.log)

2: In sumstats(files, ref, trait.names, se.logit, info.filter, maf.filter,  :

  Cannot find beta or effect column, try renaming it effect in the summary statistics file for:CP

 

I also tried removing unnecessary columns from the GWAS_CP_all file. Do the columns need to be in a particular order? 

 'data.frame':    10098325 obs. of  7 variables:

 $ MarkerName: chr  "rs2352974" "rs2271960" "rs2271961" "rs2247036" ...

 $ A1        : chr  "T" "C" "C" "C" ...

 $ A2        : chr  "C" "T" "T" "T" ...

 $ SE        : num  0.00285 0.00285 0.00285 0.00285 0.00285 0.00285 0.00285 0.00285 0.00285 0.00285 ...

 $ Pval      : num  5.19e-29 3.19e-28 3.23e-28 3.40e-28 6.81e-28 ...

 $ effect    : num  -0.0319 -0.0314 -0.0314 0.0314 -0.0312 ...

 $ EAF       : num  0.5 0.507 0.507 0.493 0.508 ...

 

I also get errors when running the munge step. Not sure if it matters. I attached the log file for CP_munge

Warning message:In for (i in (1L:cols)[do]) { : closing unused connection 4 (CP_munge.log) Warning message:In for (i in seq_len(n)) { :  closing unused connection 3 (CP_EA_sumstats.log)

 Best, 

Charleen


CP_EA_sumstats.log
CP_munge.log
CP.sumstats.gz_EA.sumstats.gz_ldsc.log

agro...@gmail.com

unread,
Nov 5, 2021, 6:27:09 PM11/5/21
to Genomic SEM Users
Hi Charleen, 

I recently put in an update to sumstats and something is breaking down, but I'm unable to reproduce this error on my end to troubleshoot. If it's not too much trouble would you be willing to e-mail me (Andrew.G...@colorado.edu) the 1,000 SNP version of the CP file so I can run line by line and see where the function is breaking down? 

Thanks!
 -Andrew

Charleen Adams

unread,
Nov 7, 2021, 8:55:43 PM11/7/21
to Genomic SEM Users
Thank you, Andrew. I emailed you from (cda...@hsph.harvard.edu), in case it went to junk mail.

In the meantime, in case anyone else is getting errors like mine, I also get an error when I load the genomicSEM package, though I don't know if it is relevant. When I use `warnings()`, here is what I see: 

Warning messages: 1: replacing previous import ‘gdata::nobs’ by ‘lavaan::nobs’ when loading ‘GenomicSEM’ 2: replacing previous import ‘gdata::last’ by ‘data.table::last’ when loading ‘GenomicSEM’ 3: replacing previous import ‘gdata::first’ by ‘data.table::first’ when loading ‘GenomicSEM’ 4: replacing previous import ‘gdata::env’ by ‘R.utils::env’ when loading ‘GenomicSEM’ 5: replacing previous import ‘gdata::resample’ by ‘R.utils::resample’ when loading ‘GenomicSEM’ 6: replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’ 7: replacing previous import ‘data.table::last’ by ‘dplyr::last’ when loading ‘GenomicSEM’ 8: replacing previous import ‘plyr::summarize’ by ‘dplyr::summarize’ when loading ‘GenomicSEM’ 9: replacing previous import ‘plyr::mutate’ by ‘dplyr::mutate’ when loading ‘GenomicSEM’ 10: replacing previous import ‘plyr::id’ by ‘dplyr::id’ when loading ‘GenomicSEM’ 11: replacing previous import ‘plyr::arrange’ by ‘dplyr::arrange’ when loading ‘GenomicSEM’ 12: replacing previous import ‘plyr::summarise’ by ‘dplyr::summarise’ when loading ‘GenomicSEM’ 13: replacing previous import ‘data.table::first’ by ‘dplyr::first’ when loading ‘GenomicSEM’ 14: replacing previous import ‘plyr::rename’ by ‘dplyr::rename’ when loading ‘GenomicSEM’ 15: replacing previous import ‘plyr::desc’ by ‘dplyr::desc’ when loading ‘GenomicSEM’ 16: replacing previous import ‘plyr::count’ by ‘dplyr::count’ when loading ‘GenomicSEM’ 17: replacing previous import ‘gdata::combine’ by ‘dplyr::combine’ when loading ‘GenomicSEM’ 18: replacing previous import ‘data.table::between’ by ‘dplyr::between’ when loading ‘GenomicSEM’ 19: replacing previous import ‘plyr::failwith’ by ‘dplyr::failwith’ when loading ‘GenomicSEM’ 20: replacing previous import ‘Rcpp::prompt’ by ‘utils::prompt’ when loading ‘GenomicSEM’ 21: replacing previous import ‘Rcpp::.DollarNames’ by ‘utils::.DollarNames’ when loading ‘GenomicSEM’ 22: replacing previous import ‘Matrix::tail’ by ‘utils::tail’ when loading ‘GenomicSEM’ 23: replacing previous import ‘gdata::object.size’ by ‘utils::object.size’ when loading ‘GenomicSEM’ 24: replacing previous import ‘R.utils::timestamp’ by ‘utils::timestamp’ when loading ‘GenomicSEM’ 
25: replacing previous import ‘Matrix::head’ by ‘utils::head’ when loading ‘GenomicSEM’ 

I also tried using a different version of R and loading only genomicSEM in case the errors I get when running the tutorial code have something to do with conflicts between packages. I also tried using  a different GWAS (not one I downloaded in the tutorial). When I munge it, I get the following error: 

Warning message: In merge.data.frame(ref, files[[i]], by = "SNP", all.x = F, all.y = F) : column name ‘effect’ is duplicated in the result

But the munge seemed to have worked...

What's weird is that this error is about the 'effect' column. The original errors (from a few days ago) were also about the 'effect' column, though those errors occurred at the final steps of the code and suggested I rename the Beta column in the summary statistics file 'effect'. I did that several days ago, and it made no difference.

I get the same error telling me to rename my Beta column 'effect' for my GWAS called ASB. I hope I'm not missing something silly like not having my columns in ASB (and CP) in a certain order. I submitted part of the set as a job and got a .Rout file. I've attached it.

Best,
Charleen

genomicSEM_ASB_EduYears.Rout

agro...@gmail.com

unread,
Nov 8, 2021, 10:50:25 AM11/8/21
to Genomic SEM Users
Hi Charleen, 

It turned out the issue was that since you didn't define the arguments in the context of running the sumstats function itself that they were in the wrong order so it was misinterpreting different commands. The best way to fix this going forward is to manually define the arguments when running the function so you don't have to worry about the ordering. So you'd write your code as below and it should run fine: 

p_sumstats<-sumstats(files=files, ref=ref, trait.names=trait.names, se.logit=se.logit, info.filter=info.filter, maf.filter=maf.filter, OLS=c(T,T), linprob=NULL, N=c(257828,766345)) 

The warnings you are getting when installing the package are also safe to ignore and unrelated to the issues you were having. Let me know if you have other questions! 

Best, 
  Andrew

agro...@gmail.com

unread,
Nov 8, 2021, 11:06:34 AM11/8/21
to Genomic SEM Users
Hi Charleen, 

Apologies but there was actually a separate bug unrelated to your issue when running sumstats for OLS traits that will cause the function to fail. You'll want to reinstall the package, restart R, and then rerun in order to get results. I'll also flag that it looks like for this particular set of 1,000 SNPs that these are all highly significant (average Z > 9) so the function will print a warning about columns potentially being misinterpreted. This is safe to ignore if you are just examining a subset of highly significant SNPs, but of course this high average Z should not be true when running on the full set of summary stats. 

-Andrew

Charleen Adams

unread,
Nov 8, 2021, 12:37:29 PM11/8/21
to Genomic SEM Users
Thank you. Your uptake fixed the weird "effect" error. It's now to this stage and has been cooking for about 20 minutes: 

outputGWAS<-userGWAS(covstruc=LDSCoutput,SNPs=p_sumstats,estimation="DWLS",model=model,sub =c("C~SNP","NC~SNP"))# printwarn = FALSE [1] "Please note that an update was made to userGWAS on 11/21/19 so that it combines addSNPs and userGWAS." [1] "Starting GWAS Estimation" This is lavaan 0.6-9 lavaan is FREE software! Please report any bugs.

Not sure how long it takes to run. I did this in Interactive RStudio on our server, but it may get killed if I walk away from the computer and my Cisco VPN connection times out. If so, I'll know this afternoon and will run it as a job. I'll write again with the results.

Reply all
Reply to author
Forward
0 new messages