Some IRefIndex geneid links are incorrect and should be to the symbol namespace


Peter Ansell

Jul 4, 2012, 9:55:39 PM
to Bio2Rdf Mailing List
Some triples in the irefindex namespace link to the geneid namespace
when they should link to the symbol namespace. The predicates in
question are [1] and [2], for example on the URI [3]. In that example,
http://bio2rdf.org/geneid:MAGEA4 should be
http://bio2rdf.org/symbol:MAGEA4.

The rdfiser script [4] assumes that irefindex always uses the mapping
'entrez gene/locuslink' => 'geneid'. That works for the other
predicates, but for these two the irefindex database does not
distinguish the numeric identifiers from the textual symbols, which we
put in "symbol:" (a namespace based mostly on HGNC).
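
To illustrate the distinction, here is a minimal Python sketch (not the
actual PHP code in [4]). The helper name alias_uri and the rule "purely
numeric means geneid, anything else means symbol" are assumptions made
for illustration only.

```python
import re

BASE_URI = "http://bio2rdf.org/"


def alias_uri(source_db: str, identifier: str) -> str:
    """Return a Bio2RDF URI for an alias value (hypothetical helper)."""
    if source_db == "entrez gene/locuslink":
        # Assumption: purely numeric values are Entrez Gene IDs,
        # anything else is treated as a textual gene symbol.
        is_numeric = re.fullmatch(r"[0-9]+", identifier) is not None
        namespace = "geneid" if is_numeric else "symbol"
    else:
        raise ValueError(f"no namespace mapping for {source_db!r}")
    return f"{BASE_URI}{namespace}:{identifier}"


print(alias_uri("entrez gene/locuslink", "MAGEA4"))
# prints http://bio2rdf.org/symbol:MAGEA4
```

A numeric value such as "12345" would still be sent to geneid under
this rule, so the other predicates would be unaffected.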

I can write a SPARQL rule to fix the error dynamically in the webapp
if necessary, but it would be best to fix it in the rdfiser so that
users of the SPARQL endpoint are not confused by differences between
the endpoint and the webapp.

Peter

[1] http://bio2rdf.org/irefindex_vocabulary:aliasa
[2] http://bio2rdf.org/irefindex_vocabulary:aliasb
[3] http://bio2rdf.org/irefindex:739540
[4] https://github.com/bio2rdf/bio2rdf-scripts/blob/master/irefindex/irefindex.php

Michel Dumontier

Jul 5, 2012, 4:38:55 PM
to bio...@googlegroups.com
so, are we using the "symbol" or "hgnc" namespace?

m.


--
You received this message because you are subscribed to the Google Groups "bio2rdf" group.
To post to this group, send email to bio...@googlegroups.com.
To unsubscribe from this group, send email to bio2rdf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/bio2rdf?hl=en.




--
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group

Peter Ansell

Jul 5, 2012, 4:43:33 PM
to bio...@googlegroups.com
The hgnc namespace contains the numeric identifiers from HGNC, and the
symbol namespace contains the textual gene symbols.

Peter

Michel Dumontier

Jul 5, 2012, 5:10:43 PM
to bio...@googlegroups.com
got it. ok, i will make the change.

m.

Michel Dumontier

Jul 6, 2012, 10:02:33 AM
to bio...@googlegroups.com
Paul, what does MI:1095 correspond to?


m.

On Fri, Jul 6, 2012 at 7:48 AM, Paul Boddie <paul....@biotek.uio.no> wrote:
> On Thursday, 5 July 2012 03:55:39 UTC+2, Peter Ansell wrote:
>> Some triples in the irefindex namespace are being generated to the
>> geneid namespace when they should be pointing at the symbol namespace.
>> The predicates in question are [1] and [2], for example on the uri
>> [3]. In the example, http://bio2rdf.org/geneid:MAGEA4 should be
>> http://bio2rdf.org/symbol:MAGEA4
>
> I have to admit that I'm not following RDF-based work and I'm
> therefore not familiar with the various namespaces in use, but as
> someone currently responsible for the iRefIndex data, should I make
> sure that gene symbols use the hgnc prefix corresponding to the
> MI:1095 controlled vocabulary term in the PSI-MI TAB output?
>
> Paul


Paul Boddie

Jul 6, 2012, 10:12:59 AM
to bio...@googlegroups.com
On 06/07/12 16:02, Michel Dumontier wrote:
> Paul, what does MI:1095 correspond to?
>
> http://bioportal.bioontology.org/ontologies/39508/?p=terms&conceptid=MI:1095

I was thinking of this:

http://www.ebi.ac.uk/ontology-lookup/?termId=MI%3A1095

In the data we produce, that term won't actually get used, but the
symbols would be prefixed with hgnc as the "database name"
(corresponding to the "preferred name" in the ontology term entry) as
described in the PSI-MI TAB specification:

https://code.google.com/p/psicquic/wiki/MITAB26Format
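
As a rough sketch of what a consumer of such prefixed values might do,
the Python below splits entries of the form
databaseName:value(optional text), separated by "|" within a column.
The function names are hypothetical, the sample value hgnc:MAGEA4 is an
assumed illustration of the proposed output, and quoted values that
themselves contain ":" or "|" are deliberately not handled.

```python
import re
from typing import List, Optional, Tuple

# db name, value, and optional free text in parentheses
_ENTRY = re.compile(r"^(?P<db>[^:]+):(?P<value>[^()]+)(?:\((?P<text>.*)\))?$")

Entry = Tuple[str, str, Optional[str]]


def parse_mitab_entry(entry: str) -> Entry:
    """Split one 'db:value(text)' entry into its parts (simplified)."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    return match.group("db"), match.group("value"), match.group("text")


def parse_mitab_column(column: str) -> List[Entry]:
    """Parse a '|'-separated column; a lone '-' marks an empty column."""
    return [parse_mitab_entry(e) for e in column.split("|") if e != "-"]


print(parse_mitab_column("hgnc:MAGEA4|entrez gene/locuslink:12345"))
# prints [('hgnc', 'MAGEA4', None), ('entrez gene/locuslink', '12345', None)]
```

With the database name carried on each value like this, a converter
could map "hgnc" and "entrez gene/locuslink" to different namespaces
without inspecting the identifier itself.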

Paul

Michel Dumontier

Jul 6, 2012, 11:54:59 AM
to bio...@googlegroups.com
i think prefixing these terms with hgnc would be very useful!

m.



