MSigDB XML creation from SQL using R

22 views
Skip to first unread message

Aldo Jongejan

unread,
May 20, 2025, 7:42:12 AMMay 20
to gsea-help
Dear all,

I am trying to recreate the XML file used for MSigDB v2024 from the SQL database (see https://docs.gsea-msigdb.org/#MSigDB/MSigDB_SQLite_Database/) using R.
However, I stumbled upon some errors, but also noted using a simple query, that the SQL DB does not seem to hold all columns one expects:

library("RSQLite")
MSigDB = dbConnect(RSQLite::SQLite(), "msigdb_v2024.1.Hs.db")
SQLStr <- "SELECT standard_name, systematic_name, species_name, PMID
FROM gene_set"
a <- as.data.frame(dbGetQuery(MSigDB, SQLStr))

Error: no such column: systematic_name

What am I missing ?

with kind regards,
Aldo Jongejan 

David Eby

unread,
May 20, 2025, 6:21:03 PMMay 20
to gsea...@googlegroups.com
Hi Aldo,

That SQL query does not match the MSigDB schema.  The diagram on that page might help.

The columns that you've indicated are spread across multiple tables.  Here's an alternative query that does what it appears you are trying to do:
SELECT standard_name, systematic_name, source_species_code, pub.PMID 
FROM gene_set gset 
INNER JOIN gene_set_details gsd on gset.id = gsd.gene_set_id
INNER JOIN publication pub on gsd.publication_id = pub.id;

If you only want to recreate the XML, there is a full set of queries near the bottom of that page to do so.  That might be a better starting point for building a custom query.

Regards,
David



--
You received this message because you are subscribed to the Google Groups "gsea-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsea-help+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/gsea-help/2fab48eb-c11c-494e-9541-df1a4600aa2an%40googlegroups.com.

Aldo Jongejan

unread,
May 21, 2025, 9:21:27 AMMay 21
to gsea-help
Dear David,

Thanks for your suggestions!
I have been able to sove my problem and get it working in R as I intended to. I first tried to use the SQL code as provided by MSIgDB (https://docs.gsea-msigdb.org/#MSigDB/MSigDB_SQLite_Database/#building-the-legacy-msigdb-xml-file), but that gave me some errors. Separating the two 'SELECT' statements succesfully gave me the XML header and the 'BODY' (i.e all the geneset information). Using the '.print' statements for header and footer information I was able to get the Legacy XML file for v2024.

Taking the complete SQL string crashed my R session and I also had to be carefull escaping the quotes, but it the end it now works!!

Thanks!
Aldo

Op woensdag 21 mei 2025 om 00:21:03 UTC+2 schreef David Eby:

David Eby

unread,
May 21, 2025, 7:04:25 PMMay 21
to gsea...@googlegroups.com
Hi Aldo,

Thanks for pointing this out.  It turns out that there was an extra stray double-quote around the CONTRIBUTOR field, which would have caused some issues in the resulting XML.

The example was written with the official SQLite client in mind, so maybe the RSQLite client is more sensitive.  But I'm glad you were able to get it to work!


Reply all
Reply to author
Forward
0 new messages