Interpreting results


Andrea Gaudio

Jan 31, 2024, 10:13:18 AM

to SKAT and MetaSKAT user group

Hi,

I have a couple of questions about interpreting results. I have a dataset of ~200 samples (105 cases, 80 controls) that has been subset to retain variants with AF_nfe < 0.01 (after the usual QC steps). I am running a SKAT-O analysis as follows:

bed="./file.bed"

bim="./file.bim"

[...]

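# Build the SSD and info files from the PLINK files and the SetID file
Generate_SSD_SetID(bed, bim, fam, SetID, SSD, info)
SSD.INFO <- Open_SSD(SSD, info)

# Read phenotype and covariates (binary phenotype, covariate file without a header)
FAMCOV <- Read_Plink_FAM_Cov(fam, cov, Is.binary = TRUE, cov_header = FALSE)

# Fit the null model for a dichotomous outcome, then run SKAT-O on every set
obj_2 <- SKAT_Null_Model(Phenotype ~ Sex + COV1 + COV2, data = FAMCOV, out_type = "D", n.Resampling = 150)
out.skato_2 <- SKATBinary.SSD.All(SSD.INFO, obj_2, method = "SKATO")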

----

I looked at head(out.skato_2$results, n=15) to check my most significant SetIDs. This is what I see:

SetID    P.value              N.Marker.All  N.Marker.Test  MAC  m   Method.bin  MAP
TOLLIP   0.00408482833943312  3             3              5    5   ER          1.41608735833571e-06
TRIM62   0.0114374722142699   5             5              5    5   ER          0.0027745686023805
SLC15A4  0.012238408493282    4             4              6    6   ER          9.13557668861148e-06
C1QC     0.0129281258528486   3             3              4    4   ER          0.00494110777259975
IKBKB    0.0188163336005402   6             6              8    8   ER          3.24372462007309e-06
PYCARD   0.0193904701383181   1             1              1    1   ER          0.0193904701383181
DEFB118  0.024473425966973    3             3              7    7   ER          8.40867535575008e-07
IFIH1    0.0272642338314762   9             9              15   15  ER          3.65401488181125e-16
DEFA5    0.0284552707144266   4             4              8    8   ER          1.09025532341022e-08
IFIT5    0.0318538084173642   6             6              11   11  ER          6.54958554049511e-10
MAP3K5   0.033188763190525    1             1              27   27  ER.A        -1
BPIFB3   0.0334549501143324   5             5              8    8   ER          4.83450946486671e-05
CSF1R    0.0334586889354163   9             9              16   16  ER          1.18361601439823e-12
C4BPA    0.0362967172030714   4             4              16   14  ER          2.24778178244795e-09
CD209    0.0374247404728045   1             1              7    6   ER          0.000134741961247268

While inspecting how the variants are distributed between cases and controls for each gene, I stopped at the gene "MAP3K5". Here, the p-value is 0.033. When I tabulate the variant distribution I obtain this:

Variant       cases  controls  Gene    pval
rs1346689102  27     0         MAP3K5  0.03318876
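
For comparison, the same table can be put through an unadjusted Fisher's exact test in base R. This is only a rough sketch: it assumes the 27 are carriers (the m column) among the 105 cases and that none of the 80 controls carry the variant, and it ignores the covariates (Sex, COV1, COV2) used in the null model, so it is not expected to reproduce the SKAT-O p-value.

```r
# Assumed group sizes, taken from the sample description above
n_cases    <- 105
n_controls <- 80

# Assumed carrier counts for rs1346689102 (27 in cases, 0 in controls)
carriers_cases    <- 27
carriers_controls <- 0

# 2x2 table: rows = carrier status, columns = group
tab <- matrix(
  c(carriers_cases,           carriers_controls,
    n_cases - carriers_cases, n_controls - carriers_controls),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("carrier", "non_carrier"), c("cases", "controls"))
)

# Unadjusted two-sided Fisher's exact test (no covariates)
ft <- fisher.test(tab)
print(tab)
print(ft$p.value)
```

A large gap between this unadjusted p-value and the SKAT-O one would suggest that the covariate adjustment or the method chosen for this set (ER.A here, rather than ER) is driving the difference.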

My question is: in this specific case, where a variant is present only in cases and absent from controls, with a very unbalanced distribution, how can I dismiss it on the basis of the p-value (the Bonferroni-corrected threshold is 0.0001, so 0.03 is not significant) given that distribution? I became even more doubtful when, trying other combinations of variant filters, I found a Bonferroni-significant p-value in a set whose three variants were distributed 2 in cases and 1 in controls (data not shown).

I know the SKAT-O p-value doesn't have a direction, but I am struggling to understand these differences.