Issues with the commonfactorGWASpar function


young...@alumni.brown.edu

Oct 31, 2019, 4:05:12 PM
to Genomic SEM Users
Hi! I am having trouble running multivariate GWAS for a common factor using the commonfactorGWASpar function. Whenever I try to run this, it keeps generating the following error message:

Warning messages:
1: In mclapply(X = 1:f, FUN = function(X) { :
 all scheduled cores encountered errors in user code
2: In mclapply(X = 1:f, FUN = function(X) { :
 all scheduled cores encountered errors in user code
error calling combine function:
<simpleError in rbind(deparse.level, ...): numbers of columns of arguments do not match>
Error in `colnames<-`(`*tmp*`, value = c("i", "n", "lhs", "op", "rhs",  :
 attempt to set 'colnames' on an object with less than two dimensions
Calls: commonfactorGWASpar -> colnames<-

Execution halted

I would greatly appreciate if you could help me address this issue.

Many thanks,
Younga

Andrew Grotzinger

Nov 1, 2019, 8:40:10 PM
to Genomic SEM Users
Hi Younga, 

Could you e-mail me (agro...@utexas.edu) the output from the addSNPs function for just the first 40 SNPs so I can take a look? You can subset to the first 40 by first running code like sumstats<-sumstats[1:40,].

Best, 
  Andrew

young...@alumni.brown.edu

Nov 4, 2019, 2:05:05 PM
to Genomic SEM Users
Hi Andrew,

I've sent the output to your email. Thanks for your help!  

Best,
Heather 

Andrew Grotzinger

Nov 5, 2019, 7:37:03 PM
to Genomic SEM Users
Hi Heather, 

Thanks for sending! I am getting a different error: task 1 failed - "system is computationally singular: reciprocal condition number = 9.1904e-17". This particular error can be circumvented by setting the tolerance to a lower threshold (e.g., 1e-20) using the 'tol' argument, as in the code below. Note that you will likely get a warning for most SNPs that the information matrix could not be inverted; this warning is printed directly by lavaan and is safe to ignore in this case (I'll look at suppressing it in the future), because you have set a lower tolerance yourself and thereby made the matrix invertible. Once I set the lower tolerance, it runs OK on my end.

I know some others have gotten an error message similar to yours when the sumstats file included an extra column or two that were added after saving it as a .csv file. One thing you could check is that the first column in your full sumstats file is "SNP" and not something like "V1". If your sumstats file looks OK and setting a lower tolerance as in the code below doesn't help, I would also try reinstalling the package to make sure you have the most recent version. If you are still getting an error message after those three troubleshooting steps, reach back out and we will figure it out!

commonfactorGWASpar(SNPcov_subset,tol=1e-20)
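
For readers wondering what a lower tolerance does in general, here is a minimal base-R sketch. It is not the Genomic SEM internals; it only uses base R's `solve()`, whose `tol` argument sets the reciprocal condition number below which a matrix is treated as singular. The toy matrix `m` is made up for illustration.

```r
# Toy matrix (hypothetical) with a reciprocal condition number around 1e-17.
m <- diag(c(1, 1e-17))

# With the default tolerance (.Machine$double.eps, about 2.2e-16),
# solve() refuses to invert it and the error message is captured here.
default_result <- tryCatch(solve(m), error = function(e) conditionMessage(e))
print(default_result)

# With a lower tolerance the inversion goes through.
inv <- solve(m, tol = 1e-20)
print(inv)
```

A lower tolerance only helps with matrices that are nearly singular; it does not make an exactly singular matrix invertible.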

young...@alumni.brown.edu

Nov 7, 2019, 3:19:58 PM
to Genomic SEM Users
Hi Andrew!

Thanks so much for your response! I was able to run commonfactorGWASpar on the subset of data by setting the tolerance to a lower threshold; however, I was not able to run it on the full data. It gave me the following error:

This is lavaan 0.6-5
lavaan is BETA software! Please report any bugs.

error calling combine function:
<simpleError in rbind(deparse.level, ...): numbers of columns of arguments do not match>
Error in { :
 task 91 failed - "Lapack routine dgesv: system is exactly singular: U[13,13] = 0"
In addition: There were 50 or more warnings (use warnings() to see the first 50)

I'd appreciate if you could share some thoughts on how to address this issue. Thanks very much!

P.S. I made sure that the sumstats file has SNP as the first column, so I don't think the sumstats file is the issue. I also tried reinstalling the package, but that didn't help either.
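
The first-column check mentioned above can be sketched as follows. This is only an illustration on a made-up data frame: `toy_sumstats` and its stray `V1` column are hypothetical, standing in for a file that picked up an extra column after being saved as a .csv file.

```r
# Hypothetical sumstats-like table with a stray leading column ("V1").
toy_sumstats <- data.frame(
  V1 = 1:3,
  SNP = c("rs1", "rs2", "rs3"),
  beta.trait1 = c(0.01, -0.02, 0.03),
  se.trait1 = c(0.005, 0.004, 0.006),
  stringsAsFactors = FALSE
)

# Check whether the first column is the SNP identifier.
first_col_ok <- names(toy_sumstats)[1] == "SNP"
print(first_col_ok)

# If it is not, drop everything to the left of the SNP column.
if (!first_col_ok && "SNP" %in% names(toy_sumstats)) {
  snp_pos <- match("SNP", names(toy_sumstats))
  toy_sumstats <- toy_sumstats[, snp_pos:ncol(toy_sumstats), drop = FALSE]
}
print(names(toy_sumstats))
```

When saving a data frame with `write.csv()`, passing `row.names = FALSE` avoids writing the row names as an extra first column.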

Andrew Grotzinger

Nov 7, 2019, 5:46:36 PM
to Genomic SEM Users
Hi Heather, 

I haven't seen that error before unfortunately so I'll have to take a look using the full set of summary statistics to figure out which particular run/SNP is causing it to fail. Can you e-mail me the summary stats and ldsc output (prior to running the addSNPs function) and I'll take a look? 

Thanks!
 -Andrew

young...@alumni.brown.edu

Nov 8, 2019, 11:42:45 AM
to Genomic SEM Users
Hi Andrew, 

I just sent all the materials you requested to your email. Thanks a lot for helping me out with this!  

Best,
Heather 

Andrew Grotzinger

Nov 12, 2019, 11:51:58 AM
to Genomic SEM Users
Hi Heather, 

Thanks for sending everything! That particular error message appears for any SNP that has a beta and SE of exactly 0 for one of your traits. I'm assuming this happens because these are case/control traits with odds ratios recorded as exactly 1 (due to rounding) in the summary statistics files, which are then converted to betas and SEs of 0 when the sumstats function performs the necessary transformations. I've updated the sumstats function to remove these rows with 0s for future users, but if you don't want to re-run sumstats, the code below will subset the sumstats data file to only rows with non-zero betas and SEs. Thanks for bringing this issue to our attention!

After subsetting, commonfactorGWASpar ran with no issues on my end, but of course let me know if you run into any more issues. On a somewhat separate note, I noticed that the summary stats you sent only had ~21K rows. This is pretty small, so I'm wondering if this is a subset of your full sumstats file, or if you have one particular set of univariate summary stats with really low coverage (e.g., only 30K tagged SNPs or something like that).

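# Set every exact 0 in the table to NA (this applies to all columns, not only betas and SEs)
autoimm_sumstats[autoimm_sumstats == 0] <- NA
# Keep only rows with no missing values
autoimm_sumstats <- autoimm_sumstats[complete.cases(autoimm_sumstats), ]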
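
A narrower alternative, as a sketch: the subsetting code above turns every 0 in the table into NA, so a 0 in a non-effect column (or a pre-existing NA anywhere) would also drop the row. The version below only inspects the effect and SE columns. It assumes those columns are named with `beta.` and `se.` prefixes; that naming and the toy data frame are assumptions for illustration, so check `names()` on your own file first.

```r
# Toy data (hypothetical). An odds ratio rounded to exactly 1 gives log(1) = 0.
toy <- data.frame(
  SNP = c("rs1", "rs2", "rs3", "rs4"),
  beta.trait1 = c(0.012, log(1), -0.030, 0.021),
  se.trait1 = c(0.004, 0, 0.006, 0.005),
  beta.trait2 = c(0.020, 0.015, 0, 0.011),
  se.trait2 = c(0.007, 0.006, 0.008, 0.004),
  stringsAsFactors = FALSE
)

# Pick out the effect and SE columns by their (assumed) name prefixes.
effect_cols <- grep("^(beta|se)\\.", names(toy), value = TRUE)

# Flag rows where any of those columns is exactly 0 or missing.
vals <- toy[, effect_cols, drop = FALSE]
bad <- rowSums(is.na(vals) | vals == 0) > 0

# Keep only the rows that were not flagged.
toy_clean <- toy[!bad, , drop = FALSE]
print(toy_clean$SNP)
```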


-Andrew 

young...@alumni.brown.edu

Nov 13, 2019, 10:44:30 AM
to Genomic SEM Users
Hi Andrew,

Good news - I was able to run commonfactorGWASpar with no problem! Thank you so much for your help with this! 

Regarding your comment about the number of rows in the summary statistics I sent earlier: it was the full dataset, but the number was dramatically reduced because of one particular set of univariate summary statistics. When I run the sumstats function, these are the messages generated while processing that file:

[1] "121619 rows present in the full trait1.tsv summary statistics file."
[1] "24654 rows were removed from the trait1.tsv summary statistics file as the rsIDs for these SNPs were not present in the reference file."
[1] "The effect column was determined to be coded as an odds ratio (OR) for the trait1.tsv summary statistics file based on the median of the effect column being close to 1. Please ensure the interpretation of this column as an OR is correct."
[1] "17 row(s) were removed from the trait1.tsv summary statistics file due to the effect allele (A1) column not matching A1 or A2 in the reference file."
[1] "1 row(s) were removed from the trait1.tsv summary statistics file due to the other allele (A2) column not matching A1 or A2 in the reference file."
[1] "No INFO column, cannot filter on INFO, which may influence results"
[1] "Performing transformation under the assumption that the effect column is either an odds ratio or logistic beta and the SE column is a logistic SE (i.e., NOT the SE of the odds ratio) for: trait1.tsv"
[1] "95865 SNPs are left in the summary statistics file trait1.tsv after QC and merging with the reference file."
[1] "73727 rows were removed from the trait1.tsv summary statistics file as the rsIDs for these SNPs were not present for the other summary statistics."

Basically, the last message says that it removed ~74K of the ~96K remaining SNPs, so we're left with ~22K. Would you suggest that I remove this particular set of summary statistics when running commonfactorGWASpar? I'd love to hear your advice on this as well. Thanks so much for your help!

Best,
Heather

Andrew Grotzinger

Nov 13, 2019, 11:17:49 AM
to Genomic SEM Users
Hi Heather, 

Of course, and glad it's running now! Typically, summary statistics have > 1 million SNPs, so for that one particular trait to have < 100K SNPs even before merging is quite low. Assuming that your other summary statistics have far more SNPs, you are losing a lot by including that one trait, and I would recommend excluding it. Just a note that if you do exclude it, you'll want to make sure to also exclude it from your LD score regression estimation prior to running the addSNPs function.
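
To see how much a single low-coverage trait costs, one can count the rsIDs shared across traits with and without it. The sketch below uses made-up rsID vectors (`trait1_snps` and the others are hypothetical); with real data these would be the SNP columns of each cleaned summary statistics file.

```r
# Hypothetical rsID sets: trait1 has sparse coverage, the others are dense.
all_ids <- paste0("rs", 1:1000)
trait1_snps <- all_ids[seq(1, 1000, by = 50)]  # 20 SNPs
trait2_snps <- all_ids[1:900]                  # 900 SNPs
trait3_snps <- all_ids[101:1000]               # 900 SNPs

# SNPs retained when a SNP must be present for every trait.
shared_all <- Reduce(intersect, list(trait1_snps, trait2_snps, trait3_snps))

# SNPs retained when the sparse trait is left out.
shared_without_trait1 <- Reduce(intersect, list(trait2_snps, trait3_snps))

print(length(shared_all))
print(length(shared_without_trait1))
```

Because only SNPs present in every trait survive the merge, the sparsest file sets the ceiling on the final SNP count.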

-Andrew

Elliot Tucker-Drob

Nov 13, 2019, 11:26:13 AM
to Andrew Grotzinger, Genomic SEM Users
Hi Heather,
Also, please be sure to take notice of the following:

"You need the full or very lightly cleaned summary statistics generated from a GWAS, so if the authors provide summary statistics only for the top 5.000 SNPs, or even the top 100.000 "pruned" SNPs this is not sufficient. Often if you get in touch with the authors, they have a mechanism for you to obtain the full summary statistics. Sometimes this may involve you agreeing not to identify the participants in their study. Sometimes you may need to sign some documents."


To be clear: you can potentially have low coverage of SNPs for LDSC and Genomic SEM as long as they aren't pruned for LD. You should verify that the SNPs in that file have not been LD-pruned.


--
Elliot M. Tucker-Drob, Ph.D.
Associate Professor
Department of Psychology
Faculty Research Associate
Population Research Center
The University of Texas at Austin
108 E. Dean Keeton Stop A8000
Austin, TX 78712-0187
tucke...@utexas.edu
www.lifespanlab.com

young...@alumni.brown.edu

Nov 13, 2019, 11:39:26 AM
to Genomic SEM Users
Got it. I'll try to run without the univariate summary statistics with particularly few SNPs and reach out to the authors for the full summary statistics. Thanks Andrew and Elliot for your tremendous help! 

All the best,
Heather 
