Hi All, I have been running a skat-o test on a small GWAS (123 cases vs 107 ctrls), using both the original R package and the skat-o command in Epacts, as below
in Epacts:epacts group -test skat -skat-o -skat-adjust -vcf ../GENO.recode.vcf.gz -ped ../PDsamples_TSI_merged.modifiedPED.ped -groupf GENO.grp -out skat-o/skat-o -cov sex -cov C1 -cov C2 -cov C3 -cov C4 -cov C5 -cov C6 -cov C7 -cov C8 -cov C9 -cov C10 -pheno pheno -missing ./. -field GT --min-maf 0.001 --max-maf 0.1 --min-mac 1 --min-callrate 0.9 -run 1
in R:phenocov <- Read_Plink_FAM_Cov(paste(dataset2, ".correct.fam", sep=""), paste(dataset2, ".covariates.txt", sep=""), Is.binary=TRUE, flag1=0, cov_header=TRUE)
Generate_SSD_SetID("GENO.b.bed", "GENO.b.bim", "GENO.b.fam", "GENO.setID", File.SSD="GENO.SSD", File.Info="GENO.SSD.Info")
SSD.INFO <- Open_SSD(File.SSD="GENO.SSD", File.Info="GENO.SSD.Info")
nullModelKin <- SKAT_NULL_emmaX(Phenotype ~ sex + C1 + C2 + C3 + C4 + C5 + C6 + C7 + C8 + C9 + C10, data=phenocov, K=NULL, Kin.File="GENO.kinf")
#4-ii# skat-o (with small sample size adjustment)
SKATresults <- SKAT.SSD.All(SSD.INFO, nullModelKin, method="optimal.adj")
SKATresults2 <- SKATresults$results
What is strange is that I get different results (see below top enriched genes, with p-value < 10-4).
in Epacts:
CHROM BEGIN END MARKER_ID NS FRAC_WITH_RARE NUM_ALL_VARS NUM_PASS_VARS NUM_SING_VARS PVALUE STATRHO
16 22376218 22376218 16:22376218-22376218_CDR2 230 0.03913 1 1 0 3.5503e-05 NA
17 40189847 40189847 17:40189847-40189847_ZNF385C 230 0.021739 1 1 0 4.6054e-05 NA
16 4390358 4390358 16:4390358-4390358_CORO7 230 0.14348 1 1 0 7.221e-05 NA
16 4390358 4390358 16:4390358-4390358_PAM16 230 0.14348 1 1 0 8.2825e-05 NA
9 111947761 112082510 9:111947761-112082510_EPB41L4B 230 0.1913 3 2 0 0.00010607 0
8 52232487 52366300 8:52232487-52366300_PXDNL 230 0.3087 11 7 0 0.00014479 0.09
17 41738823 41738823 17:41738823-41738823_MEOX1 230 0.17826 1 1 0 0.00014628 NA
2 103335441 103335646 2:103335441-103335646_MFSD9 230 0.17391 2 2 0 0.00018377 0.5
12 117977550 117977550 12:117977550-117977550_KSR2 230 0.034783 1 1 0 0.00021868 NA
19 38160383 38160760 19:38160383-38160760_ZNF781 230 0.22174 3 3 0 0.00023177 0
15 91449962 91461545 15:91449962-91461545_MAN2A2 230 0.25652 6 4 0 0.00026198 0
1 151254023 151254036 1:151254023-151254036_RP11-126K1.2 230 0.13478 2 1 0 0.00041281 NA
19 5598801 5598801 19:5598801-5598801_SAFB2 230 0.1 1 1 0 0.00053932 NA
13 52439890 52440130 13:52439890-52440130_CCDC70 230 0.026087 2 1 0 0.00056331 NA
17 27943024 27943024 17:27943024-27943024_CORO6 230 0.043478 1 1 0 0.00067776 NA
2 173337495 173344474 2:173337495-173344474_ITGA6 230 0.11304 3 1 0 0.00070369 NA
18 11639335 11639335 18:11639335-11639335_RP11-677O4.1 230 0.15217 1 1 0 0.00090373 NA
in R:SetID P.value N.Marker.All N.Marker.Test
RNU6-319P 5.3203609018011e-46 1 1
AIP 3.07010287331135e-32 2 2
DDX11L5 5.59063898526521e-12 2 2
XXyac-YRM2039.2 5.59063898526521e-12 2 2
POTEH 1.0850000246997e-08 1 1
ABHD11 2.109200647471e-07 4 4
LINC00035 2.109200647471e-07 4 4
RP11-396K3.1 1.0269648080756e-05 1 1
MIR520E 3.16469441355993e-05 3 3
MIR515-1 3.1695494745354e-05 6 6
MIR519E 3.18533547117861e-05 7 7
MIR520F 3.18533547117861e-05 7 7
PARGP1 4.73016843232088e-05 4 4
TIMM23 4.73016843232088e-05 4 4
ZDHHC19 8.87258318733597e-05 15 15
It seems that in Epacts the skat-o command somehow runs some kind of QC on the variants analyzed, althoug I was not able to find details on this QC in the relevant page. In principle I would rely more on SKAT in R, but this seems to show very inflated p-values in this case, which I did not expect at all given single variant associations obtained in these genes..maybe some of you can tell me whether I did anything wrong? Maybe specifying the weight of variants would help? I used SKAT R package in other GWAS and got results somehow consistent with single variant associations, and not inflated at all..
I am looking forward to hearing from you.
Best,
Alessandro