Problem with SetID file


wided boukhalfa

Jul 17, 2025, 10:44:47 AM
to SKAT and MetaSKAT user group
Hello,
I am currently working on gene-set analysis using the SKAT package. I have tried to generate the SetID file several times, but I always get the same error message: Error in Check_ID_Length(File.SetID): Error in SetID file!
The only workaround I have found is to keep one SNP per gene, but that would distort my analysis, because the single-SNP association analysis showed a significant signal at more than one SNP within a gene.
Can you please help me fix this problem?
Here is an example of the SetID file that I used:
"PRDM16 1:3426162          
CASZ1 1:10647844 1:10653964        
MFN2 1:12004078 1:12004874        
PADI4 1:17354648 1:17363724        
PINK1_PINK1-AS 1:20644665          
HSPG2 1:21878194          
PRDX1 1:45519056          
AKR1A1 1:45566639          
PCSK9 1:55056028          
PRKAA2 1:56704281          
PATJ 1:61771558 1:62084608 1:62116563 1:62117175      
SGIP1 1:66682325          
ADGRL2 1:81970395          
DDAH1 1:85351500          
ABCA4 1:94001992 1:94011280        
DPYD_DPYD-AS1 1:97306285  "
Thanks in advance for your help!

Hande çadır

Dec 20, 2025, 2:04:34 PM
to SKAT and MetaSKAT user group
Hello, this is my first time using SKAT, so I can't vouch for how statistically sound my analyses are. However, the analysis ran once I formatted the SetID file as shown below, with one variant per line. I ran the analysis per pathway rather than per gene. Have you considered adding the "chr" prefix to the variant IDs? I'm including the code I used to build the SetID file and an example of the result below; I hope it helps.
Code:
import hail as hl

def write_setid_locus(matrix_table, gene_sets_dict, output_file):
    """
    Write one line per variant in each gene set:
    set_name chr:pos:ref:alt
    """
    rows = matrix_table.rows()

    with open(output_file, "w") as f:
        for set_name, gene_list in gene_sets_dict.items():
            gene_set = set(gene_list)
            filtered = rows.filter(hl.literal(gene_set).contains(rows.gene_symbol))

            loci_tuples = filtered.aggregate(
                hl.agg.collect((filtered.locus.contig, filtered.locus.position,
                                filtered.alleles[0], filtered.alleles[1]))
            )

            for contig, pos, ref, alt in loci_tuples:
                if None not in (contig, pos, ref, alt):
                    f.write(f"{set_name} {contig}:{pos}:{ref}:{alt}\n")


gene_sets = {
    "VEGF": vegf,
    "TGF_BETA": tgf_beta,
    "Cytokine": Cytokine,
    "JAK/STAT": jak_stat,
    "T-Cell": t_cell,
    "Literature_Genes": lit_genes}
write_setid_locus(kegg_genes_mis, gene_sets, "plink/kegg_genes_mis.SetID")

Example of the resulting SetID file:
VEGF chr20:47651082:AGC:A
VEGF chr20:47651085:AGCAGCAACAGCAGCAG:A
VEGF chr20:47651122:A:AGCAG
TGF_BETA chr3:184388996:G:A
TGF_BETA chr5:80474703:C:G
TGF_BETA chr6:7727136:C:T
TGF_BETA chr6:26091105:G:A
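
For the file in the first message, the same one-variant-per-line layout could be produced without Hail. This is only a minimal sketch in plain Python, assuming the error comes from rows that hold several SNP IDs (so rows have different numbers of fields) and that SKAT expects exactly two whitespace-separated columns, set name and SNP ID, with no header. The function name `wide_to_long_setid` is made up for illustration, and the input rows are copied from the example above.

```python
def wide_to_long_setid(lines):
    """Turn 'SET snp1 snp2 ...' rows into one 'SET snp' row per SNP."""
    out = []
    for raw in lines:
        # Drop stray quote characters, then split on any whitespace.
        fields = raw.replace('"', " ").split()
        if len(fields) < 2:
            continue  # skip blank lines and set names without any SNP
        set_name, snp_ids = fields[0], fields[1:]
        for snp_id in snp_ids:
            out.append(f"{set_name} {snp_id}")
    return out


# A few rows from the original post, including the stray quotes
# and trailing spaces.
wide = [
    '"PRDM16 1:3426162          ',
    "CASZ1 1:10647844 1:10653964        ",
    "PATJ 1:61771558 1:62084608 1:62116563 1:62117175      ",
    'DPYD_DPYD-AS1 1:97306285  "',
]

long_rows = wide_to_long_setid(wide)
print("\n".join(long_rows))
```

Each output row then has exactly two fields, for example `CASZ1 1:10647844` followed by `CASZ1 1:10653964`. The SNP IDs still need to match the IDs in the .bim file exactly, so whether a "chr" prefix or the ref/alt alleles belong in them depends on how the variants are named there.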
On Thursday, July 17, 2025 at 17:44:47 UTC+3, wided boukhalfa wrote: