Duplicate DICOM Terms found

Nolan Nichols

Jun 13, 2014, 11:03:19 AM
to neur...@googlegroups.com
Hi All,

I'm working with Karl Helmer to extract DICOM terms from Neurolex and came across a few duplicates.

For example, "Z Offset in Slide Coordinate System" has two URIs - see this Gist for the details: https://gist.github.com/nicholsn/0e9e97f43817dd5d2523

Karl thinks this may have happened during the original import of terms from a spreadsheet. There appear to be 141 of these where the label contains the NeuroLex ID rather than the human-readable label.

You can find a list of the terms here: https://gist.github.com/nicholsn/17861e63fd3432790503

Would it be possible to have these looked at and removed?

Cheers,

Nolan

anita bandrowski

Jun 13, 2014, 1:06:40 PM
to neur...@googlegroups.com
Dear Nolan,
Yes, there is an option to remove the term / page in NeuroLex itself.
Just log in, go to the page and click on "more" (next to the edit drop-down).

If you are logged in you should have a delete page option.
If this does not appear for you then I will need to give you additional permissions, so just send me your user name.

Best,
anita






--
Anita Bandrowski, Ph.D.
NIF Project Lead
UCSD 858-822-3629
http://neuinfo.org
http://orcid.org/0000-0002-5497-0243
9500 Gilman Dr. #0446
La Jolla, CA 92093-0608

Nolan Nichols

Jul 1, 2014, 7:17:17 PM
to neur...@googlegroups.com, aband...@ucsd.edu
Hi Anita,

Is there a way to do this programmatically via the api?

Cheers,

Nolan

anita bandrowski

Jul 1, 2014, 7:36:40 PM
to neur...@googlegroups.com, Nolan Nichols
Hi Nolan,
I am not aware of a bulk delete option, perhaps Stephen can comment.
If such an option existed, I imagine it would be well hidden for obvious reasons.

However, if you want to send me a list of 'bad' terms, I have a few student curators that I could task with this.
Best,
anita

Chris Mungall

Jul 1, 2014, 7:45:22 PM
to neur...@googlegroups.com, aband...@ucsd.edu
Hi Nolan,

I'm no expert on neurolex - I don't understand the mapping to RDF.
However, before deleting anything you might want to check that these are
truly duplicates.

These appear to be aliases for the same underlying database entry:
http://neurolex.org/wiki/Nlx_137244
http://neurolex.org/wiki/Category:Z_offset_in_Slide_Coordinate_System

Click on 'history' to see what I mean.

You are indeed seeing two separate URIs in the RDF dump of neurolex.
This appears to be a 'feature' of neurolex (or SMWs in general?). Two
(or more) URIs are generated for each entry: one with a URI fragment
that starts with "Category-3A", and another that uses a Nlx or Birnlex Id. These
are connected via an owl:sameAs axiom.
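
To make the two URI forms easier to compare, here is a minimal Python sketch that turns a fragment such as "Category-3ACerebellum" back into a wiki page title. It assumes that "-3A" is an ordinary percent-escape (":" is %3A) written with "-" in place of "%"; that reading fits the examples in this thread but is not confirmed here. The function name is hypothetical.

```python
import re
from urllib.parse import unquote


def decode_fragment(fragment: str) -> str:
    """Decode '-XX' hex escapes, ASSUMING they stand for '%XX' percent-escapes."""
    # Rewrite each '-XX' (two uppercase hex digits) as '%XX', then percent-decode.
    return unquote(re.sub(r"-([0-9A-F]{2})", r"%\1", fragment))


print(decode_fragment("Category-3ACerebellum"))
print(decode_fragment("Birnlex_1489"))
```

Under that assumption, the first call yields the page title used in the wiki URL (Category:Cerebellum), while ID-style fragments pass through unchanged.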

I have encountered this when attempting to use the RDF dump or SPARQL
interface to perform operations over the content of neurolex. Personally
I find it easier to first merge over owl:sameAs before attempting
further processing. I would prefer it if neurolex provided a native form
of RDF that only used a single URI for each concept.
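
As an illustration of what "merging over owl:sameAs" can look like, here is a minimal sketch in plain Python. It assumes the triples have already been parsed into (subject, predicate, object) string tuples; the prefixed names and the choice of the lexicographically smallest URI as the representative are assumptions made for the example, not anything NeuroLex prescribes.

```python
SAME_AS = "owl:sameAs"  # assumed prefixed form of the predicate


def merge_same_as(triples):
    """Rewrite every URI to one representative per owl:sameAs group."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving; unseen nodes map to themselves.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Arbitrary but deterministic: the smaller string wins.
                parent[max(a, b)] = min(a, b)

    # Drop the sameAs links and rewrite subjects and objects.
    return {(find(s), p, find(o)) for s, p, o in triples if p != SAME_AS}


triples = [
    ("wiki:Birnlex_1489", "owl:sameAs", "wiki:Category-3ACerebellum"),
    ("wiki:Birnlex_1489", "rdfs:label", "Birnlex 1489"),
    ("wiki:Category-3ACerebellum", "rdfs:label", "Cerebellum"),
    ("wiki:Birnlex_1489", "property:Is_part_of", "wiki:Category-3AHindbrain"),
]
for triple in sorted(merge_same_as(triples)):
    print(triple)
```

After merging, the concept keeps both labels ("Birnlex 1489" and "Cerebellum") under a single URI, so later processing has to decide which label to prefer.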

Neurolex gurus, let me know if I've misunderstood something.

Here's another example, two URIs for the cerebellum concept:

<owl:Class rdf:about="&wiki;Category-3ACerebellum">
  <rdfs:label>Cerebellum</rdfs:label>
  <swivt:page rdf:resource="&wikiurl;Category:Cerebellum"/>
  <rdfs:isDefinedBy rdf:resource="&wikiurl;Special:ExportRDF/Category:Cerebellum"/>
  <property:Authors rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Joseph Altman</property:Authors>
  <property:Authors rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Shirley Ann Bayer</property:Authors>
  <property:Created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-07-15T00:00:00</property:Created>
  <property:CurationStatus rdf:datatype="http://www.w3.org/2001/XMLSchema#string">uncurated</property:CurationStatus>
  <property:Definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Part of the rhombencephalon that lies in the posterior cranial fossa behind the brain stem, consisting of the cerebellar cortex, deep cerebellar nuclei and cerebellar white matter.
A portion of the brain that helps regulate posture, balance, and coordination. (NIDA Media Guide Glossary)</property:Definition>
  <property:ISBN rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0849394902</property:ISBN>
  <property:Id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">birnlex_1489</property:Id>


<swivt:Subject rdf:about="&wiki;Birnlex_1489">
  <rdfs:label>Birnlex 1489</rdfs:label>
  <swivt:page rdf:resource="&wikiurl;Birnlex_1489"/>
  <rdfs:isDefinedBy rdf:resource="&wikiurl;Special:ExportRDF/Birnlex_1489"/>
  <rdf:type rdf:resource="&wiki;Category-3ARegional_part_of_brain"/>
  <property:Authors rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Joseph Altman</property:Authors>
  <property:Authors rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Shirley Ann Bayer</property:Authors>
  <property:Created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2006-07-15T00:00:00</property:Created>
  <property:CurationStatus rdf:datatype="http://www.w3.org/2001/XMLSchema#string">uncurated</property:CurationStatus>
  <property:Definition rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Part of the rhombencephalon that lies in the posterior cranial fossa behind the brain stem, consisting of the cerebellar cortex, deep cerebellar nuclei and cerebellar white matter.
A portion of the brain that helps regulate posture, balance, and coordination. (NIDA Media Guide Glossary)</property:Definition>
  <property:ISBN rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0849394902</property:ISBN>
  <property:Id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">birnlex_1489</property:Id>
  <property:Is_part_of rdf:resource="&wiki;Category-3AHindbrain"/>
  <property:Label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Cerebellum</property:Label>
  <owl:sameAs rdf:resource="&wiki;Category-3ACerebellum"/>


Nolan Nichols

Jul 1, 2014, 8:16:34 PM
to anita bandrowski, neur...@googlegroups.com
Thanks, Anita.

If you have students that are willing to curate, that would be great - and less risky than running a script =)

Here is a list of all the duplicates: https://gist.github.com/nicholsn/17861e63fd3432790503

Let me know if you need anything else.

Cheers,

Nolan

anita bandrowski

Jul 1, 2014, 8:33:47 PM
to neur...@googlegroups.com, Nolan Nichols
So I have looked over several of these and I can't figure out how they are duplicated.

For the first one, I see that the http://neurolex.org/wiki/Nlx_137244 is redirecting into http://neurolex.org/wiki/Nlx_151276. For this case, these are two terms consolidated into one (perhaps as you say one import vs the next). This redirect code also had a bug in it that I fixed, which made an italic id show up on the page. The redirects all need to start with a colon before calling the class name and some got in with this error. I am more than happy to talk a lot more about this, but it is unlikely to be the culprit.


For the others, I have looked at these so far, and I see no other identifier that points to them:
http://neurolex.org/wiki/Nlx_150464
http://neurolex.org/wiki/Nlx_150472
http://neurolex.org/wiki/Nlx_150494
http://neurolex.org/wiki/Nlx_150586
There is also only one search result for the DICOM id:
http://neurolex.org/w/index.php?title=Special%3ASearch&search=DICOM%3A0038_0500&fulltext=Search

Can you point me to which duplicates you are seeing??


Nolan Nichols

Jul 1, 2014, 9:01:50 PM
to anita bandrowski, neur...@googlegroups.com
Ah, I'm starting to see what might be going on here, but haven't quite put my finger on it. 

I'm working off of the RDF export of neurolex, so it might be more of an issue with how I'm querying the dataset rather than how it is handled in the neurolex website, which seems to be correct.


I simply found that in the RDF export there are 141 DICOM terms with two URI forms (https://gist.github.com/nicholsn/0e9e97f43817dd5d2523), and the URI using "Nlx_XXX" has the rdfs:label "Nlx XXX" rather than the actual DICOM label.

Does that make sense? 
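
A minimal Python sketch of that check, assuming the export has already been reduced to (URI, rdfs:label) pairs. The example.org URIs and the function name are placeholders for illustration, not the real NeuroLex namespace or the query actually used.

```python
def id_labelled_uris(pairs):
    """Return URIs whose label is just their own ID with '_' turned into ' '."""
    hits = []
    for uri, label in pairs:
        # Local name = everything after the last '/' in the URI.
        local_name = uri.rsplit("/", 1)[-1]
        if label == local_name.replace("_", " "):
            hits.append(uri)
    return hits


pairs = [
    # Placeholder URIs; only the local names mirror the thread's examples.
    ("http://example.org/wiki/Nlx_137244", "Nlx 137244"),
    ("http://example.org/wiki/Category-3ACerebellum", "Cerebellum"),
    ("http://example.org/wiki/Birnlex_1489", "Birnlex 1489"),
]
print(id_labelled_uris(pairs))
```

Any URI this flags is a candidate for the ID-style alias of a page rather than a separate term, which can then be confirmed by looking for an owl:sameAs link.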

Cheers,

Nolan


Chris Mungall

Jul 1, 2014, 10:35:15 PM
to neur...@googlegroups.com, anita bandrowski
Hi Nolan - this is indeed the case. The RDF export includes (at least)
two URIs for each neurolex page. This appears to be true for all
concepts, not just DICOM. See my previous email for details.

Nolan Nichols

Jul 2, 2014, 12:11:11 AM
to neur...@googlegroups.com, aband...@ucsd.edu
Thanks, Chris. I missed your previous email. Those details were quite helpful.

Let's clearly not delete any of these, as they are not duplicates.

I'm still a bit puzzled why I didn't get two URIs for every dicom term, but perhaps it boils down to understanding how SMW produces RDF.

Cheers,

Nolan

anita bandrowski

Jul 2, 2014, 12:16:28 AM
to neur...@googlegroups.com

That is odd, I will dig into this in the morning. It must be some weird glitch with how the data went in.

Trish Whetzel

Jul 2, 2014, 12:52:34 PM
to neur...@googlegroups.com
Anita, is there documentation and/or a pointer to the code used to enter this data?

Trish

anita bandrowski

Jul 2, 2014, 1:24:42 PM
to neur...@googlegroups.com, Zaid Aziz
When it was first uploaded, we used the off-the-shelf uploader extension from SMW.
Zaid can comment on the extension documentation.
