Hi Raymond,
Thanks for your quick reply; it was very helpful. I have a few more questions about using the software.
1) The HapMap3 SNP list is recommended for LD score regression. But if the summary statistics come from genotyped rather than imputed data, and the genotyping quality is fine, then we don't need to set the --merge-alleles flag to filter the SNPs, right? A large fraction of my SNPs are not in the hm3 list, and I would like to use all the SNPs in my data. Likewise, if we have imputation quality information, we don't need the hm3 list either, right?
2) If the mean chi-square is too small (<1.02), the data are considered unsuitable for the regression analysis. What is the reason? Is it a small sample size, low heritability of the phenotype, or something else? The sample size of my data is ~5700. Please see the log files below.
3) Is it possible to calculate a P-value for the total observed-scale h2 under the null hypothesis h2 = 0?
Best,
Jane
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./munge_sumstats.py \
--out result/chd_nonadj \
--sumstats data/chd_nonadj.assoc
Interpreting column names as follows:
info: INFO score (imputation quality; higher --> better imputation)
snpid: Variant ID (e.g., rs number)
N: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
or: Odds ratio (1 --> no effect; above 1 --> A1 is risk increasing)
Reading sumstats from data/chd_nonadj.assoc into memory 5000000 SNPs at a time.
Read 1257031 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= 0.9.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 43338 variants that were not SNPs or were strand-ambiguous.
1213693 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (1213693 SNPs remain).
Removed 0 SNPs with N < 3828.0 (1213693 SNPs remain).
Median value of or was 0.9998, which seems sensible.
Writing summary statistics for 1213693 SNPs (1213693 with nonmissing beta) to result/chd_nonadj.sumstats.gz.
Metadata:
Mean chi^2 = 0.995
WARNING: mean chi^2 may be too small.
Lambda GC = 0.992
Max chi^2 = 22.907
0 Genome-wide significant SNPs (some may have been removed by filtering).
Conversion finished at Wed May 18 10:53:15 2016
Total time elapsed: 18.09s
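As a rough check on the mean chi^2 in the log above: under the LD Score regression model, E[chi^2_j] = N * h2 * l_j / M + N * a + 1, so the excess over 1 grows with the sample size N. The sketch below plugs in N = 5700 from this post. The number of SNPs M, the mean LD score, and the h2 values are illustrative assumptions only (they are not read from the EAS reference files), and the function name is made up for this example.

```python
def expected_mean_chisq(n, h2, mean_ld, m, confounding=0.0):
    """Expected mean chi-square under the LD Score regression model:
    E[chi^2] = N * h2 * mean(l_j) / M + N * a + 1,
    where a (confounding) is the per-sample confounding contribution."""
    return n * h2 * mean_ld / m + n * confounding + 1.0


N = 5700           # sample size quoted in the post
M = 5000000        # ASSUMED number of SNPs behind the LD scores (hypothetical)
MEAN_LD = 100.0    # ASSUMED mean LD score (hypothetical)

# Print the expected mean chi-square for a few hypothetical h2 values.
for h2 in (0.0, 0.05, 0.2):
    value = expected_mean_chisq(N, h2, MEAN_LD, M)
    print("h2 = %.2f -> expected mean chi^2 = %.4f" % (h2, value))
```

With these assumed inputs, even a fairly heritable trait gives an expected mean chi^2 only slightly above 1 at N = 5700, which would be hard to separate from sampling noise.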
*********************************************************************
* LD Score Regression (LDSC)
* Version 1.0.0
* (C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 result/chd_nonadj.sumstats.gz \
--ref-ld-chr ref_data/eas_ldscores/ \
--out result/chd_nonadj.h2 \
--w-ld-chr ref_data/eas_ldscores/
Beginning analysis at Wed May 18 10:54:35 2016
Reading summary statistics from result/chd_nonadj.sumstats.gz ...
Read summary statistics for 1213693 SNPs.
Reading reference panel LD Score from ref_data/eas_ldscores/[1-22] ...
Read reference panel LD Scores for 1208050 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from ref_data/eas_ldscores/[1-22] ...
Read regression weight LD Scores for 1208050 SNPs.
After merging with reference panel LD, 579457 SNPs remain.
After merging with regression SNP LD, 579457 SNPs remain.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.0023 (0.084)
Lambda GC: 0.9927
Mean Chi^2: 0.9951
Intercept: 0.9948 (0.0064)
Ratio: NA (mean chi^2 < 1)
Analysis finished at Wed May 18 10:54:44 2016
Total time elapsed: 9.41s
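The log above reports the h2 estimate with its standard error in parentheses but no P-value. One common approximation is a Wald-type z-test, z = h2 / SE, treating the estimate as roughly normal. The sketch below is only that approximation, not something LDSC prints in this log; the function name is hypothetical, and the inputs 0.0023 and 0.084 are copied from the "Total Observed scale h2" line.

```python
import math


def wald_test(estimate, se):
    """Return (z, one-sided p, two-sided p) for H0: estimate = 0,
    assuming the estimate is approximately normally distributed."""
    z = estimate / se
    # Upper-tail normal probability P(Z >= z), via the complementary error function.
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Two-sided probability P(|Z| >= |z|).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_one_sided, p_two_sided


# Values taken from the ldsc.py log: h2 = 0.0023, SE = 0.084.
z, p_one, p_two = wald_test(0.0023, 0.084)
print("z = %.4f, one-sided p = %.3f, two-sided p = %.3f" % (z, p_one, p_two))
```

The one-sided version tests h2 > 0, which is usually the alternative of interest since heritability cannot be negative. For the values in this log, z is about 0.03, so the estimate is nowhere near significantly different from zero.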
On Wednesday, May 18, 2016 at 1:19:49 AM UTC+8, Raymond Walters wrote: