Problem with SetID file

wided boukhalfa

Jul 17, 2025, 10:44:47 AM
to SKAT and MetaSKAT user group
Hello,
I am currently working on gene-set analysis using the SKAT package. I have tried to generate the SetID file several times, but I always get the same error message: Error in Check_ID_Length(File.SetID): Error in SetID file!
The only workaround I have found is to keep one SNP per gene, but that would compromise my analysis, because the single-SNP association analysis showed significant signals at more than one SNP in some genes.
Can you please help me fix this problem?
Here is an example of the SetID file that I used:
"PRDM16 1:3426162          
CASZ1 1:10647844 1:10653964        
MFN2 1:12004078 1:12004874        
PADI4 1:17354648 1:17363724        
PINK1_PINK1-AS 1:20644665          
HSPG2 1:21878194          
PRDX1 1:45519056          
AKR1A1 1:45566639          
PCSK9 1:55056028          
PRKAA2 1:56704281          
PATJ 1:61771558 1:62084608 1:62116563 1:62117175      
SGIP1 1:66682325          
ADGRL2 1:81970395          
DDAH1 1:85351500          
ABCA4 1:94001992 1:94011280        
DPYD_DPYD-AS1 1:97306285  "
Thanks in advance for your help!
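
A quick way to see what is irregular about a file like this is to count the whitespace-separated fields on each line. The sketch below is only an illustration: the helper name `field_counts` is hypothetical, and it rests on the assumption (consistent with the reply further down this thread) that the SetID file should have exactly two fields per line, the set name followed by a single SNP ID.

```python
from collections import Counter


def field_counts(lines):
    """Count how many lines have each number of whitespace-separated fields."""
    counts = Counter()
    for line in lines:
        # Drop surrounding whitespace and any stray quote characters.
        fields = line.strip().strip('"').split()
        if fields:  # ignore blank lines
            counts[len(fields)] += 1
    return dict(counts)


# Three lines taken from the example above.
example = [
    "PRDM16 1:3426162",
    "CASZ1 1:10647844 1:10653964",
    "PATJ 1:61771558 1:62084608 1:62116563 1:62117175",
]
print(field_counts(example))
```

If the result contains any key other than 2, the file mixes row lengths, which is a plausible reason for a table reader to reject it.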

Hande çadır

Dec 20, 2025, 2:04:34 PM
to SKAT and MetaSKAT user group
Hello, this is my first time using SKAT, so I can't vouch for how mathematically sound my analyses are. However, the analysis ran when I wrote the SetID file in the format shown below. I ran the analysis by pathway rather than gene by gene. Have you considered adding the CHR prefix to the variant IDs? Below is the code I used to generate the SetID file, followed by an excerpt of the file itself; I hope it helps.
Code:
import hail as hl


def write_setid_locus(matrix_table, gene_sets_dict, output_file):
    """
    Write a SetID file with one line per variant:
    set name, then chr:pos:ref:alt
    """
    rows = matrix_table.rows()

    with open(output_file, "w") as f:
        for set_name, gene_list in gene_sets_dict.items():
            gene_set = set(gene_list)
            filtered = rows.filter(hl.literal(gene_set).contains(rows.gene_symbol))

            loci_tuples = filtered.aggregate(
                hl.agg.collect((filtered.locus.contig, filtered.locus.position,
                                filtered.alleles[0], filtered.alleles[1]))
            )

            for contig, pos, ref, alt in loci_tuples:
                if None not in (contig, pos, ref, alt):
                    f.write(f"{set_name} {contig}:{pos}:{ref}:{alt}\n")


# vegf, tgf_beta, etc. are collections of gene symbols defined elsewhere
gene_sets = {
    "VEGF": vegf,
    "TGF_BETA": tgf_beta,
    "Cytokine": Cytokine,
    "JAK/STAT": jak_stat,
    "T-Cell": t_cell,
    "Literature_Genes": lit_genes}
write_setid_locus(kegg_genes_mis, gene_sets, "plink/kegg_genes_mis.SetID")

SetID example:
VEGF chr20:47651082:AGC:A
VEGF chr20:47651085:AGCAGCAACAGCAGCAG:A
VEGF chr20:47651122:A:AGCAG
TGF_BETA chr3:184388996:G:A
TGF_BETA chr5:80474703:C:G
TGF_BETA chr6:7727136:C:T
TGF_BETA chr6:26091105:G:A
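
For a file laid out like the one in the original question (one gene per line, followed by all of its SNPs), a minimal sketch of reshaping it into the one-variant-per-line layout shown above might look like this. The function name `wide_to_long_setid` is hypothetical, and the SNP IDs are assumed to already match the IDs in the genotype files exactly.

```python
def wide_to_long_setid(lines):
    """Turn 'SET snp1 snp2 ...' lines into one 'SET snp' line per SNP."""
    out = []
    for line in lines:
        # Drop surrounding whitespace and any stray quote characters.
        fields = line.strip().strip('"').split()
        if len(fields) < 2:
            continue  # blank line, or a set name with no SNPs
        set_name, snps = fields[0], fields[1:]
        for snp in snps:
            out.append(f"{set_name} {snp}")
    return out


# Two lines from the original question's example.
wide = [
    "PRDM16 1:3426162",
    "CASZ1 1:10647844 1:10653964",
]
for row in wide_to_long_setid(wide):
    print(row)
```

Genes with several SNPs keep all of them this way; each SNP simply gets its own line under the same set name.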
On Thursday, July 17, 2025 at 5:44:47 PM UTC+3, wided boukhalfa wrote: