Step 3: Prepare the summary statistics for GWAS


AnnaAP

Feb 8, 2024, 10:32:28 AM
to Genomic SEM Users
Dear all, 

I am trying to prepare the summary statistics for GWAS. I have two continuous outcomes.

Following the instructions, I tried to run this command:

files=c("SBP_renamed.txt.gz","DBP_renamed.txt.gz")
ref="reference.1000G.maf.0.005.txt.gz"
trait.names=c("SBP","DBP")
se.logit=c(F,F)
info.filter=.6
maf.filter=0.01
BP_sumstats <-sumstats(files=files,ref=ref,trait.names=trait.names,
                      se.logit=se.logit,
                      OLS=TRUE,
                      linprob=FALSE,
                      N=NA,
                      info.filter=info.filter,
                      maf.filter=maf.filter,
                      keep.indel=FALSE,
                      parallel=TRUE,cores=NULL)

However, it did not run. 

I then ran the command below, changing the OLS and N arguments, and it worked. Could you please confirm whether the specification below is correct? I am unsure about the OLS, N, and linprob arguments.

BP_sumstats <-sumstats(files=files,ref=ref,trait.names=trait.names,
                      se.logit=se.logit,
                      OLS=NULL,
                      linprob=FALSE,
                      N=NULL,
                      info.filter=info.filter,
                      maf.filter=maf.filter,
                      keep.indel=FALSE,
                      parallel=TRUE,cores=NULL)

Thank you in advance 
Anna

agro...@gmail.com

Feb 8, 2024, 11:43:20 AM
to Genomic SEM Users
Hi Anna, 

If they are continuous outcomes, you should set OLS to TRUE and provide the total sample size to the N argument (the fact that you didn't provide N is likely why the first set of code didn't run). This is what we write on the GitHub wiki for the N argument:

"N: A user provided N listed in the order the traits are listed for the files argument. If no information is provided, the default is for the function to assume NULL. When backing out a logistic beta using the linprob argument this requires the sum of effective sample sizes across the cohorts contributing to the GWAS. For OLS transformations this should be the total sample size. If sample sizes are being provided for some traits, but not others, then NA can be used for traits that the user does not wish to provide sample sizes for. If the summary statistics file includes a sample size column, then the user can also list NA if they wish to use the SNP-specific sample sizes to perform the rescaling. However, we note again that this should be the sum of effective sample sizes for dichotomous traits. For this particular example, all traits are case/control and already report odds ratios or logistic betas, or in the case of alcohol use disorder only Z-statistics are provided but the sum of effective N is present in the GWAS summary data. Thus, this argument is listed as NULL for these traits."

Best, 
  Andrew

Anna Deaicy Argoty Pantoja

Feb 8, 2024, 4:48:44 PM
to Genomic SEM Users
Dear Andrew, 

I appreciate your response. I am still unsure about the N argument. In the first command of my previous email I set N = NA, following this sentence: "If the summary statistics file includes a sample size column, then the user can also list NA if they wish to use the SNP-specific sample sizes to perform the rescaling". My summary statistics have an N column specifying the sample size in each row. The sample size is not the same in every row, as you can see below:

SNP         CHR  POSITION  A1  A2  MARKER          SE                  P                  EAF                 N       BETA
rs1example  6    693731    T   C   1:693731:G:A    0.0347345675846699  0.264956580388581  0.11754662680942    716590  -0.03872043925
rs2example  8    752566    A   G   1:752566:G:A    0.0257458509261734  0.023221194443032  0.836595962148451   669278  0.058437307327
rs3example  9    18836205  A   G   1:18836205:T:C  0.0614574706588178  0.357548946206439  0.0190044450293547  998988  0.056543

How can I set the N argument in this case?
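
For readers following along, here is a rough sketch in base R of how one might confirm that a summary statistics file carries a per-SNP N column before relying on N = NA. It only parses the three example rows shown above; the object names (example_rows, dat, has_n_column, n_range, n_varies) are illustrative and not part of GenomicSEM.

```r
# The three example rows from this post, pasted as text for illustration.
example_rows <- "SNP CHR POSITION A1 A2 MARKER SE P EAF N BETA
rs1example 6 693731 T C 1:693731:G:A 0.0347345675846699 0.264956580388581 0.11754662680942 716590 -0.03872043925
rs2example 8 752566 A G 1:752566:G:A 0.0257458509261734 0.023221194443032 0.836595962148451 669278 0.058437307327
rs3example 9 18836205 A G 1:18836205:T:C 0.0614574706588178 0.357548946206439 0.0190044450293547 998988 0.056543"

# Parse the whitespace-delimited text; for a real file one could instead
# pass gzfile("SBP_renamed.txt.gz") as the first argument.
dat <- read.table(text = example_rows, header = TRUE, stringsAsFactors = FALSE)

# Is there a sample size column, and does it differ from SNP to SNP?
has_n_column <- "N" %in% names(dat)
n_range      <- range(dat$N)
n_varies     <- length(unique(dat$N)) > 1

print(has_n_column)
print(n_range)
print(n_varies)
```

If the column is present and varies by row, as it does here, the SNP-specific values are what the quoted wiki passage refers to when it mentions listing NA.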

Thank you in advance

agro...@gmail.com

Feb 8, 2024, 9:18:18 PM
to Genomic SEM Users
I think the issue is that the N argument needs to be a vector whose length equals the number of traits you are providing. So if you have two OLS traits, each with an N column giving the total sample size in the GWAS data, you would write N=c(NA,NA).
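
As a small sketch of that idea, the per-trait vectors can be built from the length of the files vector with base R's rep(), so they always stay the same length as the list of traits. The variable name n_traits is only illustrative; the values match the two continuous traits discussed in this thread.

```r
files <- c("SBP_renamed.txt.gz", "DBP_renamed.txt.gz")
n_traits <- length(files)  # illustrative helper variable

# One entry per trait, in the same order as `files`.
N        <- rep(NA, n_traits)     # use the N column in each file
OLS      <- rep(TRUE, n_traits)   # both traits are continuous
linprob  <- rep(FALSE, n_traits)
se.logit <- rep(FALSE, n_traits)

print(N)
print(OLS)
```

These objects could then be passed by name, for example N=N and OLS=OLS, in place of the scalar values.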

If that doesn't fix the issue, could you paste the error message it gives (and the .log file from sumstats)? That would be great.

AnnaAP

Feb 10, 2024, 12:50:33 PM
to Genomic SEM Users
Dear Andrew, 

Thank you for your guidance! The last error was "Length of files and linprob should be equal", but I have now been able to solve it.

My code is as follows and it ran very well:

BP_sumstats <-sumstats(files=files,ref=ref,trait.names=trait.names,
                      se.logit=se.logit,
                      OLS=c(TRUE,TRUE),
                      linprob=c(FALSE,FALSE),
                      N=c(NA,NA),
                      info.filter=info.filter,
                      maf.filter=maf.filter,
                      keep.indel=FALSE,
                      parallel=TRUE,cores=NULL)
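
For anyone adapting this call, the following is a minimal sketch of a pre-flight check that lists which per-trait arguments do not match the number of files. The helper check_trait_args is hypothetical and not part of GenomicSEM; it also assumes that a NULL argument means "use the function's default" and therefore skips it.

```r
# Hypothetical helper (not a GenomicSEM function): returns the names of
# arguments whose length differs from the number of files.
check_trait_args <- function(files, ...) {
  args <- list(...)
  lens <- vapply(args, length, integer(1))
  # Assumption: NULL (length 0) means "use the default", so do not flag it.
  names(lens)[lens != 0L & lens != length(files)]
}

files <- c("SBP_renamed.txt.gz", "DBP_renamed.txt.gz")

# Scalar arguments, as in the first attempt in this thread.
first_try <- check_trait_args(files, OLS = TRUE, linprob = FALSE, N = NA,
                              se.logit = c(FALSE, FALSE))

# One value per trait, as in the call above.
final_try <- check_trait_args(files, OLS = c(TRUE, TRUE),
                              linprob = c(FALSE, FALSE), N = c(NA, NA),
                              se.logit = c(FALSE, FALSE))

print(first_try)
print(final_try)
```

An empty result means every supplied argument has one entry per file.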


Thank you very much for your help!!