SPARQL query for inchikey -> molecular formula

JoannaW

Oct 17, 2017, 4:54:08 PM
to wikipathways-discuss
Good day! 

First of all, thank you to all contributors for building this extensive wiki that is open source on top of everything :-)

I am building a tool to help interpret untargeted mass spectrometry data, and have various databases set up for compound identification (HMDB, ChEBI, PubChem, KEGG) with the following info:
per compound:
- Compound name
- Description
- Identifier
- Molecular formula
- Formal charge

I am trying to do the same for WikiPathways, to see whether any of your compounds can be identified in the mass spec results.
I am setting up a SPARQL linked data query to fetch this information; my starting point was:

    prefix wp:      <http://vocabularies.wikipathways.org/wp#>
    prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    prefix dcterms: <http://purl.org/dc/terms/>
    prefix xsd:     <http://www.w3.org/2001/XMLSchema#>

    select distinct ?mb (str(?labelLit) as ?label) ?pathway
    where {
      ?mb a wp:Metabolite ;
          rdfs:label ?labelLit ;
          dcterms:isPartOf ?pathway .
      ?pathway a wp:Pathway .
    }

This gets all metabolites involved in a pathway and already gives me most of the information I am looking for. However, getting the molecular formula and formal charge of each metabolite is a challenge.
I've tried integrating an experimental query from your FAQ, which does give me the InChIKeys for Wikidata identifiers, but it times out a lot. In addition, many of the compounds have no Wikidata identifier but do have an InChIKey displayed on your website. Is there a more straightforward way to get any available InChIKeys?

Thank you very much. I am still not completely familiar with this type of database, so any help is welcome.

Kind regards,

Joanna (PhD student Utrecht)


Egon Willighagen

Mar 11, 2018, 10:45:54 AM
to wikipathways-discuss

Dear Joanna,

I'm really sorry, but I only just now saw your email. I'm sure you have figured something out by now, but we do not have the chemical formula and charge in the WPRDF, because this information is available, at least in part, from other resources. Thus, a federated SPARQL query is the way forward.



There does not seem to be a field for the charge of the compound... but I will play around a bit and get back on this...

Egon



Egon Willighagen

Mar 16, 2018, 9:56:06 AM
to wikipathways-discuss

Dear Joanna,


On Sunday, 11 March 2018 15:45:54 UTC+1, Egon Willighagen wrote:
There does not seem to be a field for the charge of the compound... but I will play around a bit and get back on this...

OK, so Wikidata does not have a field for the formal charge...

But this information is available from the InChIKey, so here goes:


The trick is to convert the last character of the InChIKey to a formal charge. I'm cheating a bit there, as it actually represents the number of protons, more or less; see https://www.inchi-trust.org/technical-faq/#13.1

This is what the "bind" part is doing:

  bind(
    if(substr(?inchikey,27) = "N", "0"^^xsd:integer, 
      if(substr(?inchikey,27) = "M", "-1"^^xsd:integer, 
        if(substr(?inchikey,27) = "O", "+1"^^xsd:integer,
          if(substr(?inchikey,27) = "L", "-2"^^xsd:integer,
            if(substr(?inchikey,27) = "P", "+2"^^xsd:integer,
              if(substr(?inchikey,27) = "K", "-3"^^xsd:integer,
                if(substr(?inchikey,27) = "J", "-4"^^xsd:integer,
                  if(substr(?inchikey,27) = "I", "-5"^^xsd:integer,
                    if(substr(?inchikey,27) = "F", "-8"^^xsd:integer,
                      "999"^^xsd:integer # ERROR
                    )
                  )
                )
              )
            )
          )
        )
      )
    )
    as ?charge
  )


It probably works well for most compounds, and I am eager to learn about exceptions...
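If the InChIKeys are post-processed on the client side instead, the lookup in the "bind" expression above can be sketched in a few lines of Python. This is only an illustration, not part of the query: the function name `inchikey_protonation` is made up here, and the sketch assumes that the alphabetical pattern visible in the listed values (N = 0, M = -1, O = +1, ..., F = -8) also holds for the letters B to Z that the "bind" does not list. Anything else, including "A", is treated as unknown.

```python
def inchikey_protonation(inchikey: str) -> int:
    """Decode the last InChIKey character into a (de)protonation count.

    Assumption: the letters B..Z are laid out alphabetically around 'N',
    matching the values listed in the SPARQL bind (N=0, M=-1, O=+1, F=-8).
    """
    # A standard InChIKey has 27 characters: 14 + '-' + 10 + '-' + 1.
    if len(inchikey) != 27 or inchikey[14] != "-" or inchikey[25] != "-":
        raise ValueError(f"not a 27-character InChIKey: {inchikey!r}")

    flag = inchikey[26]
    if not "B" <= flag <= "Z":
        # Mirrors the 999 "ERROR" branch of the bind: unknown flag.
        raise ValueError(f"unexpected protonation flag: {flag!r}")

    # Distance from 'N' in the alphabet gives the signed count.
    return ord(flag) - ord("N")


if __name__ == "__main__":
    # InChIKey of water; the trailing 'N' means neutral.
    print(inchikey_protonation("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```

As with the "bind", the result is the protonation count rather than a true formal charge, so it has the same caveat.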

Greetings,

Egon