How to obtain the .SetID file

xiangboy...@gmail.com

Aug 18, 2013, 11:24:42 PM
to SKAT...@googlegroups.com
Hello everybody,

I am using the SKAT software to analyze my whole-genome sequencing data for association, and I do not know how to obtain the .SetID file from PLINK-formatted data files. I have a large number of SNPs and do not know how to find the corresponding gene for each of them. I sincerely hope you can help me!

Thank you very much!


Best wishes!

Bo Xiang
State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
The Mental Health Center and the Psychiatric Laboratory, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China

Seunggeun (Shawn) Lee

Aug 19, 2013, 9:44:59 AM
to SKAT...@googlegroups.com
Hi Bo,

The SKAT package doesn't provide functions for annotation, but several software packages do. I have used ANNOVAR (http://www.openbioinformatics.org/annovar/) several times, and it worked quite well.

Thanks,
Shawn

Mary Same

Oct 27, 2014, 1:18:28 PM
to SKAT...@googlegroups.com
Hi,

I'm using a VCF file from 1000 Genomes to run SKAT. I converted my file to PLINK format and got the .bed, .fam, and .bim files, but I also don't see how to make the SetID file. I went to the ANNOVAR website but could not find how it would help me solve this problem. Does the set ID correspond to the ID col (col 3) and the SNP ID correspond to the ref allele col (col 5) in the VCF file? If so, I assume there would be a way to parse the VCF file (although I have been unable to open the VCF file and get a readable format), but I'm not sure if this is what the SKAT manual is referring to when it says snpID and setID.

Thanks!
Mary

Seunggeun (Shawn) Lee

Oct 28, 2014, 4:12:51 PM
to SKAT...@googlegroups.com
Hi Mary,

If you want to do a gene-based analysis (that is, you want to make SNP sets based on genes), you first need to figure out which SNPs belong to which genes. The 1000 Genomes VCF file does not have this information. The .variant_function output file from the ANNOVAR package will give you this information, and you can then create the SetID file from it.
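
As a rough illustration of that last step, here is a minimal Python sketch that turns a .variant_function file into a two-column SetID file (set name, then SNP ID, no header). It rests on several assumptions that are not stated above: the ANNOVAR input carried the SNP ID from the .bim file as an extra column after chr, start, end, ref, alt, so that it shows up as the 8th tab-separated column of .variant_function; only exonic and splicing variants are wanted; and the function names setid_rows and write_setid are made up for this example.

```python
import re
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumption: restrict SNP sets to these ANNOVAR region labels.
DEFAULT_REGIONS = frozenset({"exonic", "splicing", "exonic;splicing"})


def setid_rows(
    lines: Iterable[str],
    snp_col: int = 7,  # assumption: SNP ID sits in the 8th column (0-based 7)
    keep_regions: Optional[Iterable[str]] = DEFAULT_REGIONS,
) -> Iterator[Tuple[str, str]]:
    """Yield unique (gene, snp_id) pairs from .variant_function lines."""
    regions = None if keep_regions is None else set(keep_regions)
    seen: Set[Tuple[str, str]] = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= snp_col:
            continue  # malformed or short line
        region, genes, snp_id = fields[0], fields[1], fields[snp_col].strip()
        if regions is not None and region not in regions:
            continue
        # Drop parenthesised details such as "(dist=123)" or transcript info,
        # then split entries that list several genes.
        genes = re.sub(r"\([^)]*\)", "", genes)
        for gene in re.split(r"[,;]", genes):
            gene = gene.strip()
            pair = (gene, snp_id)
            if gene and gene != "NONE" and snp_id and pair not in seen:
                seen.add(pair)
                yield pair


def write_setid(rows: Iterable[Tuple[str, str]], path: str) -> None:
    """Write one 'SetID<TAB>SNP_ID' line per pair, without a header."""
    with open(path, "w") as out:
        for gene, snp_id in rows:
            out.write(f"{gene}\t{snp_id}\n")
```

With an open file handle fh on the .variant_function file, write_setid(setid_rows(fh), "mydata.SetID") would write the result (the output name is a placeholder). The SNP IDs in the second column need to match the IDs in the .bim file exactly, so it is worth checking a few of them by hand.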

There are also some R packages (e.g., SeqMiner, http://zhanxw.com/seqminer/) that can read a VCF file and build gene-based genotype matrices. You can also use them with the SKAT package.

Thanks,
Shawn