How are Singletons treated by the SKAT/Binary-SKAT method?


jon.kl...@gmail.com

Apr 23, 2021, 12:47:38 PM
to SKAT and MetaSKAT user group

What do the SKAT and Binary-SKAT methods do with singletons? Singletons are not handled well by variance component tests, and I have read that some methods exclude singletons or combine them into a single case/control measure for the given genomic region (i.e., gene), more akin to a burden test.

I would assume that SKAT-O would probably find that the burden procedure would yield the better p-val in a gene with all/most singletons.

I am particularly interested because, in my data, I have a gene that is significant only when using the Binary-Burden method but not the Binary-SKAT method. Looking back at the variants fed to the program, the gene consists of mostly singletons and a single doubleton. Based on this, I would assume the SKAT and Binary-SKAT methods exclude singletons because, if they combined them into a single case/control measure, the result would be almost identical to Burden.

zczhao

Apr 24, 2021, 7:40:44 PM
to SKAT and MetaSKAT user group
Hi,

Happy to help. SKAT and burden are different types of tests. Burden first aggregates all SNPs in a region and then performs the association test on this aggregated vector. SKAT, in contrast, first calculates score statistics at the SNP level and then combines the squared score statistics to perform an association test. By definition, burden assumes all SNPs in a region have the same direction of effect, while SKAT does not require this assumption. On the other hand, burden tests may have greater power when the region contains many ultra-rare variants. SKAT-O is a linear combination of the SKAT and burden tests that tries to incorporate the advantages of both.
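
In the notation commonly used in the SKAT and SKAT-O papers, the distinction above can be written compactly. This is only a sketch, with covariates absorbed into the null-model fit; the symbols $S_j$, $w_j$, $g_{ij}$, $\hat{\mu}_i$, and $\rho$ are introduced here for illustration and do not appear elsewhere in this thread.

```latex
% Score statistic for variant j: genotype g_{ij}, phenotype y_i,
% fitted mean under the null model \hat{\mu}_i
S_j = \sum_{i=1}^{n} g_{ij}\,\bigl(y_i - \hat{\mu}_i\bigr)

% Burden: aggregate the weighted scores first, then square
Q_{\mathrm{burden}} = \Bigl(\sum_{j=1}^{m} w_j S_j\Bigr)^{2}

% SKAT: square each weighted score first, then aggregate
Q_{\mathrm{SKAT}} = \sum_{j=1}^{m} w_j^{2} S_j^{2}

% SKAT-O: linear combination, with \rho chosen from a grid in [0, 1]
Q_{\rho} = (1 - \rho)\,Q_{\mathrm{SKAT}} + \rho\,Q_{\mathrm{burden}}
```

Because the burden statistic sums the scores before squaring, scores with the same sign reinforce each other and scores with opposite signs cancel. SKAT squares first, so the direction of each score does not matter.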

Btw, SKAT doesn't exclude singletons from the model. Also, you can use the robust versions of SKAT, burden, and SKAT-O, which adjust for the inflation of type I error rates when the case-control ratio is unbalanced. The details can be found here:


Thanks,
Zhangchen 

jon.kl...@gmail.com

Apr 28, 2021, 2:46:24 PM
to SKAT and MetaSKAT user group
Zhang Chen,

I was wondering if I could get your thoughts on an approach. I also realized that our past 2 responses were private and not available for the public to see, so I will summarize what we determined earlier.

Recap from earlier:
As alluded to in the paper "Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies", published in AJHG, users should know that if they analyze a gene in which most variants are singletons or ultra-rare using variance-component-based tests such as SKAT, they should not expect a signal, because the SKAT-based methods calculate SNP-specific statistics and then aggregate them. In this case, the Burden test is preferred.

In more detail, the SKAT methods are variance tests that, for a binary trait, test against a random distribution of coin flips. For a singleton, there are only 2 possible outcomes: the carrier is a case or a control. Therefore, for a collection of singletons in a gene, each individual SNP score statistic would be weak and, in turn, the aggregate score used for the p-value calculation would also be weak, regardless of how many singletons are present in the gene. Burden tests, on the other hand, are mean based (they aggregate the total number of SNPs and test whether there is a difference in the mean number of SNPs between cases and controls) and would therefore be a more appropriate test when looking at a gene or genes with an overwhelming majority of ultra-rare or singleton variants.
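
To make the recap above concrete, here is a minimal toy sketch in plain Python. It is not the SKAT package's computation: it assumes a balanced case/control sample, no covariates, flat weights, and every singleton carried by a different sample, and all function names are hypothetical. Under those assumptions each singleton's squared score is the same whether its carrier is a case or a control, so the SKAT-style statistic cannot tell the two configurations apart, while the burden-style statistic can.

```python
def score_statistics(genotypes, phenotype):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)); no covariates (assumption)."""
    mu = sum(phenotype) / len(phenotype)
    residuals = [y - mu for y in phenotype]
    return [sum(g * r for g, r in zip(variant, residuals)) for variant in genotypes]


def burden_statistic(scores):
    """Aggregate the scores first, then square (flat weights assumed)."""
    return sum(scores) ** 2


def skat_statistic(scores):
    """Square each score first, then aggregate (flat weights assumed)."""
    return sum(s ** 2 for s in scores)


def singleton_matrix(carriers, n_samples):
    """One row per variant; each variant is carried by exactly one sample."""
    return [[1 if i == c else 0 for i in range(n_samples)] for c in carriers]


# Toy data: samples 0-9 are cases, samples 10-19 are controls.
phenotype = [1] * 10 + [0] * 10

# Six singletons, all carried by cases.
all_in_cases = singleton_matrix([0, 1, 2, 3, 4, 5], 20)
# Six singletons, three in cases and three in controls.
split_evenly = singleton_matrix([0, 1, 2, 10, 11, 12], 20)

s_cases = score_statistics(all_in_cases, phenotype)
s_split = score_statistics(split_evenly, phenotype)

print("all in cases:", burden_statistic(s_cases), skat_statistic(s_cases))
print("split evenly:", burden_statistic(s_split), skat_statistic(s_split))
```

In this toy setup the burden-style statistic is 9.0 when all carriers are cases and 0.0 when they are split evenly, while the SKAT-style statistic is 1.5 in both configurations. With unbalanced samples, covariates, or frequency-based weights the SKAT statistic is no longer exactly constant, so this should be read as an illustration of the intuition rather than a general result.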

New Question:
Given the behavior I described above, I was wondering what your thoughts are on EXCLUDING singletons (and possibly even doubletons) when running analyses using variance component tests such as SKAT, SKAT-Binary, and SKAT-Robust. The thought is that these singletons could actually artificially DEFLATE gene-level p-values for genes where the input variants are mostly ultra-rare (i.e., singletons, doubletons).


Sincerely,

JFK

zczhao

Apr 28, 2021, 9:55:56 PM
to SKAT and MetaSKAT user group
Hi JFK,

I think you are right that singletons could deflate gene-level p-values for certain genes, but they won't affect the p-value too much. On the other hand, people are still curious about the effect of ultra-rare variants. That is why we keep these variants; burden tests can potentially detect their associations.

Currently, we are updating our algorithm to collapse ultra-rare variants first and then run SKAT and burden. It seems that this updated approach works better in terms of type I error control and new findings. I will let you know when the method is ready to use.
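
As a rough illustration of what "collapsing ultra-rare variants first" could look like, here is a minimal sketch in plain Python. It is not the SKAT developers' implementation, which is not described in this thread. The function name, the minor allele count threshold of 2, and the choice to collapse into a carrier indicator (rather than a sum of allele counts) are all assumptions made for the example, and genotypes are assumed to be coded as minor allele counts.

```python
def collapse_ultra_rare(genotypes, mac_threshold=2):
    """Merge ultra-rare variants into one pseudo-variant (hypothetical helper).

    genotypes: list of variants, each a list of per-sample minor allele counts.
    Variants whose minor allele count is <= mac_threshold (an assumed cutoff)
    are replaced by a single carrier-indicator row appended after the
    variants that are kept as they are.
    """
    kept, rare = [], []
    for variant in genotypes:
        if sum(variant) <= mac_threshold:
            rare.append(variant)
        else:
            kept.append(variant)
    if rare:
        # 1 if the sample carries at least one ultra-rare allele, else 0.
        collapsed = [1 if any(column) else 0 for column in zip(*rare)]
        kept.append(collapsed)
    return kept


# Toy region: two singletons, one doubleton, and one more common variant.
region = [
    [1, 0, 0, 0, 0, 0],  # singleton
    [0, 1, 0, 0, 0, 0],  # singleton
    [0, 0, 1, 1, 0, 0],  # doubleton
    [1, 1, 0, 1, 1, 0],  # minor allele count 4, kept as is
]

collapsed_region = collapse_ultra_rare(region)
print(collapsed_region)
```

The resulting matrix has two rows: the more common variant and one pseudo-variant that marks every carrier of a singleton or doubleton. A variance component test run on this matrix would then treat the ultra-rare variants as a single unit, which is the burden-like behavior discussed earlier in the thread.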

Thanks,
Zhangchen   