First thank you for developing GenomicSEM! I am following the tutorial of GWAS-by-Substation and got two warning messages:
The average value of estimate over standard error (i.e., Z) is > 5 for CP. This suggests a column was misinterpreted or arguments were misspecified. Please post on the google group if you are unable to figure out the issue.
The average value of estimate over standard error (i.e., Z) is > 5 for EA. This suggests a column was misinterpreted or arguments were misspecified. Please post on the google group if you are unable to figure out the issue.
and the final output has all statistics but no SNP information:
> head(outputGWAS[[2]][,1:16])
SNP CHR BP MAF A1 A2 lhs op rhs free label est SE
5 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04056658 0.008856452
51 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04834538 0.008859751
52 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04758777 0.008983912
53 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.03221660 0.008188796
54 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.04342814 0.008180632
55 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.01936844 0.008223158
Z_Estimate Pval_Estimate chisq
5 -4.580454 4.639668e-06 4.196054e-15
51 -5.456743 4.849476e-08 2.999869e-17
52 -5.296999 1.177212e-07 6.785004e-18
53 3.934230 8.346398e-05 2.859651e-19
54 5.308654 1.104380e-07 3.235203e-16
55 2.355353 1.850513e-02 4.517991e-15
Can you help me with this? I pasted my R code and all output below:
R code:
require(GenomicSEM)
munge("GWAS_CP_all.txt",
"../w_hm3.snplist",
trait.names="CP",
N=257828,
info.filter = 0.9,
maf.filter = 0.01)
munge("GWAS_EA_excl23andMe.txt",
"../w_hm3.snplist",
trait.names="EA",
N=766345,
info.filter = 0.9,
maf.filter = 0.01
)
traits <- c("CP.sumstats.gz","EA.sumstats.gz")
sample.prev <- c(NA,NA)
population.prev <- c(NA,NA)
ld<-"../eur_w_ld_chr/"
wld <- "../eur_w_ld_chr/"
trait.names<-c("CP", "EA")
LDSCoutput <- ldsc(traits,
sample.prev,
population.prev,
ld,
wld,
trait.names)
save(LDSCoutput, file="LDSCoutputCogNonCog.RData")
#load(file="LDSCoutputCogNonCog.RData")
model<-'C=~NA*EA + start(0.4)*CP
NC=~NA*EA
NC~~1*NC
C~~1*C
C~~0*NC
CP ~~ 0*EA
CP~~0*CP
EA~~0*EA'
output<-usermodel(LDSCoutput,estimation="DWLS",model=model)
output
save(output, file="Modeloutput.Rdata" )
files = c("CP_1000.txt", "EA_1000.txt")
ref = "../reference.1000G.maf.0.005.txt"
trait.names = c("CP","EA")
se.logit = c(F,F)
info.filter = 0.6
maf.filter = 0.01
p_sumstats<-sumstats(files=files, ref=ref, trait.names=trait.names, se.logit=se.logit, info.filter=info.filter, maf.filter=maf.filter, OLS=c(T,T),linprob=NULL, N=c(257828,766345))
save(p_sumstats, file="Sumstats.RData")
model<-'C=~NA*EA +start(0.2)*EA + start(0.5)*CP
NC=~NA*EA +start(0.2)*EA
C~SNP
NC~SNP
NC~~1*NC
C~~1*C
C~~0*NC
CP ~~ 0*EA
CP~~0*CP
EA~~0*EA
SNP~~SNP'
outputGWAS<-userGWAS(covstruc=LDSCoutput,SNPs=p_sumstats,estimation="DWLS",model=model,sub =c("C~SNP","NC~SNP"))
save(outputGWAS, file="outputGWAS.RData")
head(outputGWAS[[2]][,1:16])
q()
Output:
[dlai@i12 gwas_by_subtraction]$ R --no-save < tutorial.R
R version 4.0.4 (2021-02-15) -- "Lost Library Book"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> require(GenomicSEM)
Loading required package: GenomicSEM
There were 24 warnings (use warnings() to see them)
>
> munge("GWAS_CP_all.txt",
+ "../w_hm3.snplist",
+ trait.names="CP",
+ N=257828,
+ info.filter = 0.9,
+ maf.filter = 0.01)
The munging of 1 summary statistics started at 2022-03-28 11:24:37
Reading in reference file
Reading summary statistics for GWAS_CP_all.txt. Please note that this step usually takes a few minutes due to the size of summary statistic files.
All files loaded into R!
Munging file: GWAS_CP_all.txt
Interpreting the MARKERNAME column as the SNP column.
Interpreting the A1 column as the A1 column.
Interpreting the A2 column as the A2 column.
Interpreting the BETA column as the effect column.
Interpreting the PVAL column as the P column.
Interpreting the EAF column as the MAF column.
Interpreting the SE column as the SE column.
Using provided N (257828) for file:GWAS_CP_all.txt
Merging file:GWAS_CP_all.txt with the reference file:../w_hm3.snplist
10098325 rows present in the full GWAS_CP_all.txt summary statistics file.
8895627 rows were removed from the GWAS_CP_all.txt summary statistics file as the rs-ids for these rows were not present in the reference file.
No INFO column, cannot filter on INFO, which may influence results
25589 rows were removed from the GWAS_CP_all.txt summary statistics file due to missing MAF information or MAFs below the designated threshold of0.01
1177109SNPs are left in the summary statistics file GWAS_CP_all.txt after QC.
I am done munging file: GWAS_CP_all.txt
The file is saved as CP.sumstats.gz in the current working directory.
Munging was completed at 2022-03-28 11:28:06
The munging of all files took 3 minutes and 29.190247297287 seconds
Please check the .log file(s) to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files
>
> munge("GWAS_EA_excl23andMe.txt",
+ "../w_hm3.snplist",
+ trait.names="EA",
+ N=766345,
+ info.filter = 0.9,
+ maf.filter = 0.01
+ )
The munging of 1 summary statistics started at 2022-03-28 11:28:06
Reading in reference file
Reading summary statistics for GWAS_EA_excl23andMe.txt. Please note that this step usually takes a few minutes due to the size of summary statistic files.
All files loaded into R!
Munging file: GWAS_EA_excl23andMe.txt
Interpreting the MARKERNAME column as the SNP column.
Interpreting the A1 column as the A1 column.
Interpreting the A2 column as the A2 column.
Interpreting the BETA column as the effect column.
Interpreting the PVAL column as the P column.
Interpreting the EAF column as the MAF column.
Interpreting the SE column as the SE column.
Using provided N (766345) for file:GWAS_EA_excl23andMe.txt
Merging file:GWAS_EA_excl23andMe.txt with the reference file:../w_hm3.snplist
10101242 rows present in the full GWAS_EA_excl23andMe.txt summary statistics file.
8898516 rows were removed from the GWAS_EA_excl23andMe.txt summary statistics file as the rs-ids for these rows were not present in the reference file.
No INFO column, cannot filter on INFO, which may influence results
25623 rows were removed from the GWAS_EA_excl23andMe.txt summary statistics file due to missing MAF information or MAFs below the designated threshold of0.01
1177103SNPs are left in the summary statistics file GWAS_EA_excl23andMe.txt after QC.
I am done munging file: GWAS_EA_excl23andMe.txt
The file is saved as EA.sumstats.gz in the current working directory.
Munging was completed at 2022-03-28 11:30:29
The munging of all files took 2 minutes and 23.2716586589813 seconds
Please check the .log file(s) to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files
>
> traits <- c("CP.sumstats.gz","EA.sumstats.gz")
> sample.prev <- c(NA,NA)
> population.prev <- c(NA,NA)
> ld<-"../eur_w_ld_chr/"
> wld <- "../eur_w_ld_chr/"
> trait.names<-c("CP", "EA")
>
> LDSCoutput <- ldsc(traits,
+ sample.prev,
+ population.prev,
+ ld,
+ wld,
+ trait.names)
Multivariate ld-score regression of 2 traits (CP.sumstats.gz EA.sumstats.gz) began at: 2022-03-28 11:30:29
Reading in LD scores
Read in summary statistics [1/2] from: CP.sumstats.gz
Out of 1177109 SNPs, 1167777 remain after merging with LD-score files
Removing 0 SNPs with Chi^2 > 257.828; 1167777 remain
Read in summary statistics [2/2] from: EA.sumstats.gz
Out of 1177103 SNPs, 1167771 remain after merging with LD-score files
Removing 0 SNPs with Chi^2 > 766.345; 1167771 remain
Estimating heritability [1/3] for: CP.sumstats.gz
Heritability Results for trait: CP.sumstats.gz
Mean Chi^2 across remaining SNPs: 2.0202
Lambda GC: 1.7084
Intercept: 0.9988 (0.0126)
Ratio: -0.0012 (0.0123)
Total Observed Scale h2: 0.1993 (0.0068)
h2 Z: 29.4
Calculating genetic covariance [2/3] for traits: CP.sumstats.gz and EA.sumstats.gz
1167771 SNPs remain after merging CP.sumstats.gz and EA.sumstats.gz summary statistics
Results for genetic covariance between: CP.sumstats.gz and EA.sumstats.gz
Mean Z*Z: 1.0657
Cross trait Intercept: 0.1671 (0.0109)
Total Observed Scale Genetic Covariance (g_cov): 0.1026 (0.0037)
g_cov Z: 28
g_cov P-value: 2.4291e-172
Estimating heritability [3/3] for: EA.sumstats.gz
Heritability Results for trait: EA.sumstats.gz
Mean Chi^2 across remaining SNPs: 2.6521
Lambda GC: 2.0944
Intercept: 0.9792 (0.0161)
Ratio: -0.0126 (0.0098)
Total Observed Scale h2: 0.1123 (0.003)
h2 Z: 37.2
Genetic Correlation Results
Genetic Correlation between CP and EA: 0.686 (0.0245)
LDSC finished running at 2022-03-28 11:31:07
Running LDSC for all files took 0 minutes and 38 seconds
>
>
> save(LDSCoutput, file="LDSCoutputCogNonCog.RData")
>
> #load(file="LDSCoutputCogNonCog.RData")
>
> model<-'C=~NA*EA + start(0.4)*CP
+ NC=~NA*EA
+
+ NC~~1*NC
+ C~~1*C
+ C~~0*NC
+
+ CP ~~ 0*EA
+ CP~~0*CP
+ EA~~0*EA'
>
>
> output<-usermodel(LDSCoutput,estimation="DWLS",model=model)
[1] "Running primary model"
[1] "Calculating CFI"
[1] "Calculating Standardized Results"
[1] "Calculating SRMR"
elapsed
0.616
[1] "Model fit statistics are all printed as NA as you have specified a fully saturated model (i.e., df = 0)"
>
> output
$modelfit
chisq df p_chisq AIC CFI SRMR
df NA 0 NA NA NA NA
$results
lhs op rhs Unstand_Est Unstand_SE STD_Genotype STD_Genotype_SE
2 C =~ EA 0.2298708 0.00535873579565329 0.6859947 0.0159918750218382
1 C =~ CP 0.4464830 0.00760378989690142 1.0000000 0.0170304116227281
8 NC =~ EA 0.2438145 0.00436859737410486 0.7276065 0.0130370419171318
9 NC ~~ NC 1.0000000 1.0000000
3 C ~~ C 1.0000000 1.0000000
4 C ~~ NC 0.0000000 0.0000000
6 EA ~~ CP 0.0000000 0.0000000
5 CP ~~ CP 0.0000000 0.0000000
7 EA ~~ EA 0.0000000 0.0000000
STD_All p_value
2 0.6859947 < 5e-300
1 1.0000000 < 5e-300
8 0.7276065 < 5e-300
9 1.0000000 <NA>
3 1.0000000 <NA>
4 0.0000000 <NA>
6 0.0000000 <NA>
5 0.0000000 <NA>
7 0.0000000 <NA>
>
> save(output, file="Modeloutput.Rdata" )
>
> files = c("CP_1000.txt", "EA_1000.txt")
> ref = "../reference.1000G.maf.0.005.txt"
> trait.names = c("CP","EA")
> se.logit = c(F,F)
> info.filter = 0.6
> maf.filter = 0.01
>
>
> p_sumstats<-sumstats(files=files, ref=ref, trait.names=trait.names, se.logit=se.logit, info.filter=info.filter, maf.filter=maf.filter, OLS=c(T,T),linprob=NULL, N=c(257828,766345))
The preparation of 2 summary statistics for use in Genomic SEM began at: 2022-03-28 11:31:08
Please note that the files should be in the same order that they were listed for the ldsc function
Reading in reference file
Applying MAF filer of 0.01 to the reference file.
All files loaded into R!
Preparing summary statistics for file: CP_1000.txt
Interpreting the MARKERNAME column as the SNP column.
Interpreting the A1 column as the A1 column.
Interpreting the A2 column as the A2 column.
Interpreting the BETA column as the effect column.
Interpreting the PVAL column as the P column.
Cannot find N column, try renaming it to N in the summary statistics file for:CP_1000.txt
Interpreting the EAF column as the MAF column.
Interpreting the SE column as the SE column.
Cannot find DIRECTION column, try renaming it to DIRECTION in the summary statistics file for:CP_1000.txt
Using user provided N of 257828 for CP_1000.txt . Please note that this should reflect the total sample size.
0 rows were removed from the CP_1000.txt summary statistics file due to entries that were duplicated across rsID, A1, and A2.
Merging file: CP_1000.txt with the reference file: ../reference.1000G.maf.0.005.txt
999 rows present in the full CP_1000.txt summary statistics file.
5 rows were removed from the CP_1000.txt summary statistics file as the rsIDs for these SNPs were not present in the reference file.
The effect column was determined NOT to be coded as an odds ratio (OR) for the CP_1000.txt summary statistics file based on the median of the effect column being close to 0.
No INFO column, cannot filter on INFO, which may influence results
994 SNPs are left in the summary statistics file CP_1000.txt after QC and merging with the reference file.
Preparing summary statistics for file: EA_1000.txt
Interpreting the MARKERNAME column as the SNP column.
Interpreting the A1 column as the A1 column.
Interpreting the A2 column as the A2 column.
Interpreting the BETA column as the effect column.
Interpreting the PVAL column as the P column.
Cannot find N column, try renaming it to N in the summary statistics file for:EA_1000.txt
Interpreting the EAF column as the MAF column.
Interpreting the SE column as the SE column.
Cannot find DIRECTION column, try renaming it to DIRECTION in the summary statistics file for:EA_1000.txt
Using user provided N of 766345 for EA_1000.txt . Please note that this should reflect the total sample size.
0 rows were removed from the EA_1000.txt summary statistics file due to entries that were duplicated across rsID, A1, and A2.
Merging file: EA_1000.txt with the reference file: ../reference.1000G.maf.0.005.txt
999 rows present in the full EA_1000.txt summary statistics file.
8 rows were removed from the EA_1000.txt summary statistics file as the rsIDs for these SNPs were not present in the reference file.
The effect column was determined NOT to be coded as an odds ratio (OR) for the EA_1000.txt summary statistics file based on the median of the effect column being close to 0.
No INFO column, cannot filter on INFO, which may influence results
991 SNPs are left in the summary statistics file EA_1000.txt after QC and merging with the reference file.
After merging across all summary statistics using listwise deletion, performing QC, and merging with the reference file, there are 438 SNPs left in the final multivariate summary statistics file
Sumstats finished running at 2022-03-28 11:31:32
Running sumstats for all files took 0 minutes and 23.6779294013977 seconds
Please check the log file CP_EA_sumstats.log to ensure that all columns were interpreted correctly and no warnings were issued for any of the summary statistics files.
Warning messages:
1: In .sumstats_main(i, utilfuncs = NULL, filenames[i], trait.names[i], :
The average value of estimate over standard error (i.e., Z) is > 5 for CP. This suggests a column was misinterpreted or arguments were misspecified. Please post on the google group if you are unable to figure out the issue.
2: In .sumstats_main(i, utilfuncs = NULL, filenames[i], trait.names[i], :
The average value of estimate over standard error (i.e., Z) is > 5 for EA. This suggests a column was misinterpreted or arguments were misspecified. Please post on the google group if you are unable to figure out the issue.
>
> save(p_sumstats, file="Sumstats.RData")
>
> model<-'C=~NA*EA +start(0.2)*EA + start(0.5)*CP
+ NC=~NA*EA +start(0.2)*EA
+
+ C~SNP
+ NC~SNP
+
+ NC~~1*NC
+ C~~1*C
+ C~~0*NC
+
+ CP ~~ 0*EA
+ CP~~0*CP
+ EA~~0*EA
+ SNP~~SNP'
>
> outputGWAS<-userGWAS(covstruc=LDSCoutput,SNPs=p_sumstats,estimation="DWLS",model=model,sub =c("C~SNP","NC~SNP"))
[1] "Please note that an update was made to userGWAS on 11/21/19 so that it combines addSNPs and userGWAS."
[1] "Starting GWAS Estimation"
elapsed
8.275
>
> save(outputGWAS, file="outputGWAS.RData")
>
> head(outputGWAS[[2]][,1:16])
SNP CHR BP MAF A1 A2 lhs op rhs free label est SE
5 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04056658 0.008856452
51 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04834538 0.008859751
52 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 -0.04758777 0.008983912
53 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.03221660 0.008188796
54 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.04342814 0.008180632
55 <NA> NA NA NA <NA> <NA> NC ~ SNP 5 0.01936844 0.008223158
Z_Estimate Pval_Estimate chisq
5 -4.580454 4.639668e-06 4.196054e-15
51 -5.456743 4.849476e-08 2.999869e-17
52 -5.296999 1.177212e-07 6.785004e-18
53 3.934230 8.346398e-05 2.859651e-19
54 5.308654 1.104380e-07 3.235203e-16
55 2.355353 1.850513e-02 4.517991e-15
>
> q()
[dlai@i12 gwas_by_subtraction]$
Thanks again and look forward to hearing from you!
Best,
Dongbing