Using SKAT-O for gene set analysis - memory allocation problem

Au F

Jul 13, 2017, 6:44:48 AM
to SKAT and MetaSKAT user group
Hi everyone,

I am quite new to SKAT analysis, so apologies in advance if I say anything incorrect.
I am trying to run SKAT-O on a gene set, testing for association with a binary phenotype.
The data seem to load into R correctly. Here is the log output:

Check duplicated SNPs in each SNP set
No duplicate
Warning: 94977 SNPs in the SetID file were not found in Bim file!
Please check GeneSet.SSD_LOG.txt file!
8027 Samples, 1 Sets, 164558 Total SNPs
[1] "SSD and Info files are created!"
8027 Samples, 1 Sets, 164558 Total SNPs
Open the SSD file

  |======================================================================| 100%
Close the opened SSD file: /home/c1034488/SKAT/GeneSet.SSD

However, the SKAT-O analysis fails, and this is the error message I get:

Error : cannot allocate vector of size 146.0 Gb
In addition: Warning messages:
1: 24583 SNPs with either high missing rates or no-variation are excluded! 
2: The missing genotype rate is 0.005433. Imputation is applied. 
Warning message:
Error to run SKAT for NameGeneSet: Error : cannot allocate vector of size 146.0 Gb
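A back-of-envelope check of where "146.0 Gb" could come from, using only the counts in the log above. The assumption (not stated in this thread) is that the failed allocation is a dense matrix of 8-byte doubles, and that R reports sizes in binary (1024-based) gigabytes. The sketch compares two candidate shapes: a SNP-by-SNP matrix and the sample-by-SNP genotype matrix.

```python
# Back-of-envelope check of the "146.0 Gb" figure in the error message.
# Assumption (not confirmed in the thread): the failed allocation is a dense
# matrix of 8-byte doubles; two candidate shapes are compared.
n_samples = 8027     # "8027 Samples" in the log
total_snps = 164558  # "164558 Total SNPs" in the log
excluded = 24583     # SNPs dropped for high missing rate or no variation
p = total_snps - excluded  # SNPs remaining after QC

GIB = 1024 ** 3  # one binary gigabyte, in bytes

# Candidate 1: a p x p (SNP-by-SNP) matrix of doubles
gib_pp = p * p * 8 / GIB
# Candidate 2: the n x p genotype matrix of doubles
gib_np = n_samples * p * 8 / GIB

print(p, round(gib_pp, 1), round(gib_np, 1))  # 139975 146.0 8.4
```

Only the p x p shape reproduces the reported size. If that reading is right, memory grows with the square of the number of SNPs, so the SNP count, not the sample size, is the bottleneck.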

Is there any way to avoid this issue, or does my gene set simply contain too many SNPs? Is there an alternative to SKAT-O that I could use? (I don't know whether my SNPs are causal variants, or whether their associations with the phenotype all go in the same direction.) I have seen that there is a SKATBinary function, but I am not sure whether it can be used as an alternative to SKAT-O for a binary phenotype.

Many thanks

Aura 

Seunggeun (Shawn) Lee

Jul 21, 2017, 10:03:04 AM
to SKAT and MetaSKAT user group
Hi

This is because the number of SNPs is very large (164,558). Currently, SKAT-O cannot handle that many SNPs. Moreover, when there are a large number of SNPs, the burden test, and therefore SKAT-O (which combines SKAT and the burden test), is not a good choice, because the majority of these variants are likely to be non-causal. I recommend using SKAT when the number of variants is very large.
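The following sketch shows why the burden component can lose signal when effects are mixed, which also bears on the uncertainty about effect direction raised in the question. It is a toy sketch of the score-type statistics in the usual SKAT-O formulation: SKAT sums the squared per-variant scores, the burden test squares their sum, and SKAT-O mixes the two with a weight rho. It is not the SKAT R package's implementation. The assumptions are an intercept-only null model, unit weights (the package's default weights depend on minor allele frequency), and no p-value computation. The genotypes and phenotypes are made-up toy values.

```python
# Toy sketch of SKAT, burden and SKAT-O score statistics.
# NOT the SKAT R package: intercept-only null model, unit weights, no p-values.

def score_vector(genotypes, y):
    """Per-variant scores S_j = sum_i g_ij * (y_i - mean(y))."""
    n = len(y)
    mu = sum(y) / n  # fitted value under the intercept-only null model
    resid = [yi - mu for yi in y]
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] * resid[i] for i in range(n))
            for j in range(n_variants)]

def q_skat(S, w):
    """SKAT: weighted sum of squared scores, so opposite signs cannot cancel."""
    return sum((wj * sj) ** 2 for wj, sj in zip(w, S))

def q_burden(S, w):
    """Burden: square of the weighted sum, so opposite signs cancel."""
    return sum(wj * sj for wj, sj in zip(w, S)) ** 2

def q_skat_o(S, w, rho):
    """SKAT-O: (1 - rho) * Q_SKAT + rho * Q_burden, with 0 <= rho <= 1."""
    return (1 - rho) * q_skat(S, w) + rho * q_burden(S, w)

# Hypothetical data: 6 people (3 cases, 3 controls), 2 variants.
# Variant 1 is carried only by cases, variant 2 only by controls.
y = [1, 1, 1, 0, 0, 0]
G = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 0]]
w = [1.0, 1.0]

S = score_vector(G, y)
print(S)                    # [1.0, -1.0]
print(q_skat(S, w))         # 2.0
print(q_burden(S, w))       # 0.0
print(q_skat_o(S, w, 0.5))  # 1.0
```

With rho = 0 the mixed statistic reduces to SKAT, and with rho = 1 it reduces to the burden test. Because SKAT squares each score before summing, opposite-direction or noisy variants do not cancel out the signal. The burden sum, by contrast, can be diluted or cancelled.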

Thanks,
Shawn
