SKAT CommonRare


Rebecca

Apr 3, 2014, 12:29:41 PM
to SKAT...@googlegroups.com
Hi, 

I'm running SKAT on ~30,000 individuals and 3000 gene sets. It takes <1 hour to run using the common and rare function with the default options, compared to >12 hours to run SKAT-O with optimal adjust specified. I'm wondering whether this difference in computation time is expected, or whether something could be wrong.

Thanks!

Rebecca
 

Seunggeun (Shawn) Lee

Apr 3, 2014, 2:13:57 PM
to SKAT...@googlegroups.com
Hi Rebecca,

It is possible that it takes longer than 12 hours given the large sample size, but it is also possible that something is wrong. I recommend running this analysis on a small number of genes (around 100) to check how long it takes.

Thanks,
Shawn
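
A pilot run like the one suggested above can be turned into a rough projection for the full analysis. The following is only a minimal sketch: `project_runtime` and its arguments are hypothetical names, `run_one` stands in for whatever call performs a single gene-set test, and the projection assumes run time grows linearly with the number of gene sets, which may not hold if gene-set sizes vary widely.

```python
import time


def project_runtime(run_one, pilot_items, total_items):
    """Time run_one on each pilot item and linearly extrapolate.

    run_one     -- hypothetical callable that runs one gene-set test
    pilot_items -- the small subset to time (e.g. about 100 gene sets)
    total_items -- number of gene sets in the full analysis (e.g. 3000)

    Returns (seconds_per_item, projected_total_seconds).
    """
    if not pilot_items:
        raise ValueError("pilot_items must not be empty")
    start = time.perf_counter()
    for item in pilot_items:
        run_one(item)
    elapsed = time.perf_counter() - start
    per_item = elapsed / len(pilot_items)
    # Assumption: cost per gene set is roughly constant across the data set.
    return per_item, per_item * total_items
```

Timing the same 100 gene sets under both methods gives a per-set ratio that can be compared with the observed gap between <1 hour and >12 hours.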

Rebecca

Apr 10, 2014, 5:03:26 PM
to SKAT...@googlegroups.com
Hi Shawn, 

SKAT-O runs quickly for 100 genes - I guess I was more concerned with the large difference in the time taken to run the two functions and wanted to check whether that was normal. I've also noticed that QQ plots of p-values from my common and rare analysis (using either the combined test or the adaptive test) are very inflated compared to SKAT-O. When I run the common and rare function using just the rare variants, the QQ plot looks good, but when I include common variants or use only common variants, the QQ plot is very inflated. Could this be due to the weighting of common variants (I'm just using the default weighting)?

Thanks for your help, 

Rebecca 
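
One way to put a number on the inflation described above is the genomic inflation factor (lambda), computed from the gene-level p-values. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and it assumes the p-values have already been exported from the SKAT results into a plain list. Lambda near 1 indicates no inflation at the median; values well above 1 indicate inflation.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5 is equivalent to Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2_1df(p):
    """Convert a p-value to the 1-df chi-square statistic with that tail area."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / median(null chi-square).

    p-values outside (0, 1] are skipped, since p = 0 has no finite statistic.
    """
    stats = [p_to_chi2_1df(p) for p in p_values if 0.0 < p <= 1.0]
    if not stats:
        raise ValueError("no usable p-values")
    return median(stats) / CHI2_1DF_MEDIAN
```

Computing lambda separately for the rare-only, common-only, and combined runs would show where the inflation enters. Because lambda only reflects the median, it complements the QQ plot rather than replacing it.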

Seunggeun (Shawn) Lee

Apr 11, 2014, 8:07:08 AM
to SKAT...@googlegroups.com
Hi Rebecca,

Can you send me your code? I just want to know which options you used to run SKAT-O and SKAT-Common-Rare. SKAT-O is probably > 10 times slower than SKAT, since it runs SKAT several times to get the minimum p-value. If possible, could you send me the QQ plot? Is your trait continuous or binary? Did you adjust for population stratification? I am not sure why this inflation happens, but I don't think it is due to the weighting. I need more information to figure it out.

Thanks,
Shawn

Rebecca

Apr 11, 2014, 3:43:29 PM
to SKAT...@googlegroups.com
Hi, 

It's a binary trait and I adjusted for stratification using PCs. I've attached the R code and the QQ plots of the common and rare results and the SKAT-O results for my full dataset.

Also attached are QQ plots from the common and rare function run on rare variants only and on common variants only, using a subset of gene regions.

The plots are all trimmed based on known variants. 

I've been using a MAF cutoff of 0.01. 

Thanks, 

Rebecca
QQplot_common_rare_fulldataset.pdf
QQplot_SKAT0_fulldataset.pdf
SKAT r-code.txt
QQplot_SET1_common_only.pdf
QQplot_SET1_rare_only.pdf

Colm O'Dushlaine

Apr 12, 2014, 8:28:20 AM
to SKAT...@googlegroups.com
Yeah, I'm seeing something similar. I tested 15k genes with 4 PCs and things still look quite inflated. I merged any overlapping genes beforehand, thinking that might be a factor in the inflation, but it doesn't appear to be. For the original dataset, the QQ plot of SNP statistics seemed fine, so I'm just curious what might be inflating this distribution...
SKAT.qq.pdf

Seunggeun (Shawn) Lee

Apr 13, 2014, 8:17:21 PM
to SKAT...@googlegroups.com
Hi Rebecca,

Thanks for your code and QQ plots. I cannot find anything suspicious in them. I also checked the package and cannot find anything suspicious there either. Have you carried out single-variant tests for the common variants? Was there any inflation of the test statistics? Also, how many cases and controls are in your data? If the case-control ratio is very unbalanced, the test statistics can be inflated.

Thanks,
Shawn

Seunggeun (Shawn) Lee

Apr 13, 2014, 8:25:34 PM
to SKAT...@googlegroups.com
Hi Colm,

Thanks for the QQ plot. Do you mean that your single-variant tests are fine? If you have a QQ plot of the single-variant tests, can you share it with me? Which test did you use to generate this QQ plot: SKAT, SKAT-O or Common-Rare? What was the sample size of your data? Is the phenotype binary or continuous? This information would help me figure out why this inflation happens.

Thanks,
Shawn
