Clarification on Hallmark Gene Set Gene Counts

Joy E

Jul 2, 2025, 1:41:53 PM
to gsea-help

**Apologies if this is a duplicate; I could not see my message after I posted it.**

Good morning GSEA Team,

I hope this message finds you well.

I’m currently using the Hallmark gene sets in my analyses and would like to confirm the most up-to-date gene counts for accuracy. In the v2025 file I downloaded, I found a total of 7322 genes across all Hallmark sets, with 4384 unique genes. However, according to the original 2016 paper, Table 1 lists a total of 7343 genes when summing across the sets.
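For readers who want to reproduce counts like these, here is a minimal sketch in Python. It assumes the collection was downloaded in GMT format (one set per line, tab-separated: set name, description, then member gene symbols). The function name `count_gmt_genes` and the sample lines are hypothetical and are not part of any MSigDB tooling.

```python
def count_gmt_genes(lines):
    """Return (total_memberships, unique_genes) for GMT-formatted lines.

    Assumes each non-empty line is tab-separated:
    set name, description, then one or more gene symbols.
    """
    total = 0
    unique = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        genes = [g for g in fields[2:] if g]
        total += len(genes)   # every set membership counts toward the total
        unique.update(genes)  # a gene shared by several sets counts once here
    return total, len(unique)


# Hypothetical two-set example; real input would come from the downloaded file,
# e.g. count_gmt_genes(open(path_to_gmt)).
sample = [
    "SET_A\tdescription\tG1\tG2\tG3\n",
    "SET_B\tdescription\tG2\tG4\n",
]
print(count_gmt_genes(sample))  # (5, 4)
```

The total sums set sizes, so a gene that belongs to several sets is counted once per set, while the unique count collapses those repeats.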

Could you please confirm the current official total and unique gene counts for the Hallmark collection?

Kind regards,
Joy

David Eby

Jul 3, 2025, 7:21:57 PM
to gsea...@googlegroups.com
Hi Joy,

The counts you give are correct. On each release of MSigDB, we remap all of the source members of the original gene sets using symbol mappings based on the current version of Ensembl. This results in slight variation in the total numbers over time, but the sets will always track the latest (at the time of release) version of a significant public genome assembly.

Since MSigDB 7.0, we also no longer map any source members from UniGene because that resource has been retired, which may also affect the total numbers.
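As a rough illustration of why remapping can shift the totals, the sketch below applies a symbol mapping to a set's source members and drops any member with no current mapping. The identifiers, the mapping table, and the `remap_members` helper are all hypothetical and invented for this example. The collapsing of duplicate symbols is also an assumption, and none of this is MSigDB's actual pipeline.

```python
def remap_members(source_members, symbol_map):
    """Map source identifiers to current symbols (hypothetical helper).

    Members with no entry in symbol_map are dropped, and source members
    that map to the same symbol collapse into a single entry (assumption).
    """
    remapped = []
    seen = set()
    for member in source_members:
        symbol = symbol_map.get(member)
        if symbol is None or symbol in seen:
            continue
        seen.add(symbol)
        remapped.append(symbol)
    return remapped


# Hypothetical data: ID_4 has no current mapping; ID_2 and ID_3 share a symbol.
source = ["ID_1", "ID_2", "ID_3", "ID_4"]
mapping = {"ID_1": "GENE_A", "ID_2": "GENE_B", "ID_3": "GENE_B"}
print(remap_members(source, mapping))  # ['GENE_A', 'GENE_B']
```

In this toy case four source members yield two mapped symbols, so a change in the mapping table between releases would change the set's size.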

Regards,
David


Joy E

Jul 8, 2025, 1:08:19 PM
to gsea-help
Hi David,

Thank you for your message and for clearing this up for me.

Kind regards,
Joy
