Hi,
I have a couple of questions about interpreting my results. I have a dataset of ~200 samples (105 tests : 80 ctrls) that has been subset to retain variants with AF_nfe < 0.01 (after the usual QC steps). I am running a SKAT-O analysis as follows:
>
bed="./file.bed"
bim="./file.bim"
[...]
Generate_SSD_SetID(bed, bim, fam, SetID, SSD, info)
SSD.INFO<-Open_SSD(SSD, info)
FAMCOV <- Read_Plink_FAM_Cov(fam, cov, Is.binary = TRUE, cov_header = FALSE)
obj_2 <- SKAT_Null_Model(Phenotype ~ Sex + COV1 + COV2, data = FAMCOV, out_type = "D", n.Resampling = 150)
out.skato_2 <- SKATBinary.SSD.All(SSD.INFO, obj_2, method = "SKATO")
----
I looked at head(out.skato_2$results, n=15) to check my most significant SetIDs. This is what I see:
SetID    P.value             N.Marker.All  N.Marker.Test  MAC  m   Method.bin  MAP
TOLLIP   0.00408482833943312 3             3              5    5   ER          1.41608735833571e-06
TRIM62   0.0114374722142699  5             5              5    5   ER          0.0027745686023805
SLC15A4  0.012238408493282   4             4              6    6   ER          9.13557668861148e-06
C1QC     0.0129281258528486  3             3              4    4   ER          0.00494110777259975
IKBKB    0.0188163336005402  6             6              8    8   ER          3.24372462007309e-06
PYCARD   0.0193904701383181  1             1              1    1   ER          0.0193904701383181
DEFB118  0.024473425966973   3             3              7    7   ER          8.40867535575008e-07
IFIH1    0.0272642338314762  9             9              15   15  ER          3.65401488181125e-16
DEFA5    0.0284552707144266  4             4              8    8   ER          1.09025532341022e-08
IFIT5    0.0318538084173642  6             6              11   11  ER          6.54958554049511e-10
MAP3K5   0.033188763190525   1             1              27   27  ER.A        -1
BPIFB3   0.0334549501143324  5             5              8    8   ER          4.83450946486671e-05
CSF1R    0.0334586889354163  9             9              16   16  ER          1.18361601439823e-12
C4BPA    0.0362967172030714  4             4              16   14  ER          2.24778178244795e-09
CD209    0.0374247404728045  1             1              7    6   ER          0.000134741961247268
While inspecting how the variants in each gene are distributed between cases and controls, I stopped at the gene "MAP3K5", where the p-value is 0.033. When I tabulate the variant distribution I get this:
             test controls   Gene       pval
rs1346689102   27        0 MAP3K5 0.03318876
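For reference, the imbalance in this table can be checked with an unadjusted Fisher's exact test. This is only a minimal sketch: it assumes that 27 and 0 are carrier counts (not allele counts) and that the group sizes are the 105 tests and 80 controls mentioned above. It also ignores Sex, COV1 and COV2, so its p-value is not directly comparable to the covariate-adjusted SKAT-O p-value.

```r
# Assumption: counts are carriers per group; group sizes taken from the post.
carriers <- c(tests = 27, controls = 0)
group_n  <- c(tests = 105, controls = 80)

# 2x2 table: carriers vs non-carriers in each group
tab <- rbind(carrier = carriers, non_carrier = group_n - carriers)
print(tab)

# Unadjusted two-sided Fisher's exact test (no covariates)
ft <- fisher.test(tab)
print(ft$p.value)
```

If this unadjusted p-value is much smaller than the SKAT-O one, it may be worth checking how the 27 carriers are distributed across the covariates in the null model.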
My question is: in this specific case, where a variant is present only in cases and never in controls, how can I dismiss it as non-significant on the basis of its p-value (the Bonferroni-corrected threshold is 0.0001, so 0.03 is not significant) given such an unbalanced distribution? I get even more doubtful because, when trying other combinations of variant filtering, I found a Bonferroni-significant p-value for a set whose three variants were distributed 2 in cases and 1 in controls (data not shown).
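To make the threshold explicit, here is a small sketch of the Bonferroni arithmetic. The number of tested sets (n_sets = 500) is an assumption, chosen only because 0.05 / 500 gives the 0.0001 threshold quoted above; the real value would be the number of rows in out.skato_2$results.

```r
alpha  <- 0.05
n_sets <- 500   # hypothetical: implied by 0.05 / 0.0001, not taken from the data

# Per-test significance threshold after Bonferroni correction
threshold <- alpha / n_sets

# Two of the p-values from the results table above
p_top <- c(TOLLIP = 0.00408482833943312, MAP3K5 = 0.033188763190525)

# Equivalent view: adjust the p-values and compare them against alpha
p_bonf <- p.adjust(p_top, method = "bonferroni", n = n_sets)

print(threshold)
print(p_top < threshold)
print(p_bonf)
```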
I know the SKAT-O p-value doesn't have a direction, but I am struggling to understand these differences.
Thank you!!