How about a could_be relation?

Sridevi Polavaram

Jul 27, 2010, 5:05:32 PM

to neur...@googlegroups.com

Hi Everybody,
I am Sridevi Polavaram from George Mason University, Fairfax VA, one
of the subcontracting sites of the NIF project. I am a PhD student in
Dr. Ascoli's lab, working on ontologies related to the NeuroMorpho.Org,
NIF, and Neuron Registry projects. For NeuroMorpho.Org, we are trying to
link the metadata to ontologies, and we started with species. For the
species ontology that we are customizing for NeuroMorpho.Org, I want to
use a "could be" relationship to link all possible child nodes to the
"Not Reported mice strains" class. So my question for the group is: do
we have a "could be" relationship in any of the existing
neuroscience-related ontologies, like "is a" and "has a"? If not,
defining such a relation would be very useful for mining data that we
cannot explicitly instantiate under a specific class in the ontology but
would still like to return as a probable match for a search. For
example, if a user queries for "C57BL/6J", we would like to return the
data for "Not Reported mice strains" as "possible hits", because it
"could be" C57BL/6J. I would appreciate any thoughts you have about
this.
Thanks
Sridevi
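
The query behaviour described in this message can be sketched in plain Python. This is only an illustration under stated assumptions: the `COULD_BE` table, the record ids, and the "BALB/c" strain are hypothetical and do not come from NeuroMorpho.Org or NIFSTD.

```python
# Hypothetical could_be links: NotReported class -> strains it might be.
COULD_BE = {
    "Not Reported mice strains": {"C57BL/6J", "BALB/c"},
}

# Hypothetical records: record id -> the class it was annotated with.
RECORDS = {
    "neuron-001": "C57BL/6J",
    "neuron-002": "Not Reported mice strains",
    "neuron-003": "BALB/c",
}


def search(strain):
    """Return (exact hits, possible hits) for a strain query."""
    # Exact hits: records annotated with the queried strain itself.
    exact = sorted(rec for rec, cls in RECORDS.items() if cls == strain)
    # Possible hits: records whose class has a could_be link to the strain.
    possible = sorted(
        rec for rec, cls in RECORDS.items()
        if strain in COULD_BE.get(cls, set())
    )
    return exact, possible


exact, possible = search("C57BL/6J")
print("hits:", exact)
print("possible hits:", possible)
```

Here a query for "C57BL/6J" returns the explicitly annotated record as a hit and the record filed under "Not Reported mice strains" as a possible hit.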

Maryann Martone

Jul 29, 2010, 9:01:48 PM

to neur...@googlegroups.com

Hi Sridevi:

I don't think that "could be" is going to be very useful, and it is not likely to be in an ontology. There are over 800 strains of rats, for example, so that's a lot of "could be" statements. I also don't think that you really want to mine data that "could be" a Wistar rat, because that's just going to add a lot of noise. But if you really want to return all members of the class Wistar or "Rat not otherwise specified", it sounds like you can do that easily in NeuroMorpho.Org by having a category in a pull-down menu, such as "Mouse not specified", because you already have that information in your database. I spoke with Alan Ruttenberg about this issue, and he also said it's possible to construct a SPARQL query that says "Find all instances of rat that are not members of any subtype of rat".

Maryann
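
The logic of the query "Find all instances of rat that are not members of any subtype of rat" can be sketched as follows. This is plain Python over an in-memory table rather than SPARQL against a triple store, and the class names, hierarchy, and animal ids are hypothetical, chosen only for illustration.

```python
# Hypothetical class hierarchy: subclass -> direct superclass.
SUBCLASS_OF = {
    "Wistar rat": "Rat",
    "Sprague Dawley rat": "Rat",
    "Kyoto Wistar rat": "Wistar rat",
}

# Hypothetical instances: individual -> classes asserted for it.
ASSERTED_TYPES = {
    "animal-1": {"Rat"},
    "animal-2": {"Rat", "Wistar rat"},
    "animal-3": {"Sprague Dawley rat"},
    "animal-4": {"Rat"},
}


def is_strict_subclass(cls, ancestor):
    """True if cls lies strictly below ancestor in SUBCLASS_OF."""
    current = SUBCLASS_OF.get(cls)
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def instances_without_subtype(root):
    """Instances of root that are not asserted under any subtype of root."""
    found = []
    for individual, types in ASSERTED_TYPES.items():
        in_root = any(t == root or is_strict_subclass(t, root) for t in types)
        in_subtype = any(is_strict_subclass(t, root) for t in types)
        if in_root and not in_subtype:
            found.append(individual)
    return sorted(found)


print(instances_without_subtype("Rat"))
```

Note that such a query reports what has been asserted in the data: an animal appears in the result because no subtype was stated for it, not because it is known to belong to none.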

--
You received this message because you are subscribed to the Google Groups "neurolex" group.
To post to this group, send email to neur...@googlegroups.com.
To unsubscribe from this group, send email to neurolex+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/neurolex?hl=en.

Sridevi Polavaram

Jul 30, 2010, 3:07:35 PM

to neur...@googlegroups.com, Maryann Martone

Hi Maryann:
Thanks for getting back to me on could_be. Regarding the noise issue, I
think the opposite is true, because could_be can substantially reduce the
technical and mining problems we usually encounter when we import too
many terms from various ontologies. Here is an example that refers to
the attached figure, where I have expanded the rat subtree specifically.
All the highlighted subclasses, which are imported from NIFSTD, are
could_be substrains of their respective NotReported parents. Note that
there are only 'NotReported rats', 'NotReported Wistar rat' and
'NotReported Sprague Dawley rat', but no 'NotReported rattus', which is
where we would usually have the big number, 800. So, according to this
figure, which is based on the 1695 rat neuronal reconstructions
identified from 44 papers, we have a pretty simple hierarchy stating
that Rattus norvegicus is the most commonly used rat in neuroscience,
and within it we have identified at least the Fischer, Long Evans,
Sprague Dawley and Wistar strains, while the NotReported strains could
possibly be DB1X, Munich Wistar and so forth. That is the argument from
the neuroscience perspective. From the neuroinformatics perspective, the
advantage is a clear separation between the knowledge that we have and
the knowledge that we don't have. Once we have this distinction, I think
adding an implementation layer on top (web services, SPARQL, etc.) makes
data mining easier and more structured.
Therefore, I see a two-fold use for could_be, and this is true for
other concepts too. To project a similar example onto brain regions:
CA1, CA2, CA3 and DG are the known subclasses of Hippocampus, and CA4
could_be a 'NotReported Hippocampal regions' subclass according to some
XYZ human hippocampus ontology. The same can be applied to cell types.
Last but not least, adding the could_be subclasses is relatively cheap
from a technical point of view, because we are just importing them
from well-known ontologies like NIFSTD or GO, which have good
coverage of the field.
Please let me know if I have overlooked or missed anything important.
Thanks much,
Sridevi


ratsubtree.png

Alan Ruttenberg

Jul 30, 2010, 10:53:22 PM

to neur...@googlegroups.com, Maryann Martone

A consequence of what you have drawn is that
1) if you ask for NotReported rats you will not get Wistar or
Sprague-Dawley rats of any sort;
2) if you ask for Wistar rat or NotReported Wistar rat you will get
the same set of subclasses;
3) every Kyoto Wistar rat is concluded to be a NotReported Wistar rat.

None of these is correct.
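
A rough sketch of how these consequences arise when could_be links are modelled as ordinary subclass links. The attached figure is not reproduced here, so the hierarchy below is an assumption made for illustration: it places 'NotReported Wistar rat' under 'Wistar rat' and hangs the could_be substrains beneath it.

```python
# Assumed hierarchy (subclass -> direct superclass); not taken from the figure.
SUBCLASS_OF = {
    "Wistar rat": "Rattus norvegicus",
    "Sprague Dawley rat": "Rattus norvegicus",
    "NotReported rats": "Rattus norvegicus",
    "NotReported Wistar rat": "Wistar rat",
    "Kyoto Wistar rat": "NotReported Wistar rat",
    "Munich Wistar rat": "NotReported Wistar rat",
}


def ancestors(cls):
    """All superclasses of cls, nearest first."""
    found = []
    current = SUBCLASS_OF.get(cls)
    while current is not None:
        found.append(current)
        current = SUBCLASS_OF.get(current)
    return found


def descendants(cls):
    """All classes that lie below cls in the hierarchy."""
    return {c for c in SUBCLASS_OF if cls in ancestors(c)}


# 1) Asking for 'NotReported rats' returns no Wistar or Sprague Dawley class.
print(descendants("NotReported rats"))

# 2) 'Wistar rat' and 'NotReported Wistar rat' lead to the same strains.
print(descendants("Wistar rat") - {"NotReported Wistar rat"})
print(descendants("NotReported Wistar rat"))

# 3) Subclass links are transitive, so every Kyoto Wistar rat is inferred
#    to be a NotReported Wistar rat.
print("NotReported Wistar rat" in ancestors("Kyoto Wistar rat"))
```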

In the semantic web world, the languages operate under the "open world
assumption": what we don't state is not known. So if you state
that an animal is a Rattus norvegicus, it *means* that it could be any
subclass of Rattus norvegicus.
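
One way to read this point is that "possible hits" can be derived from the ordinary subclass hierarchy, with no could_be relation or NotReported class. The sketch below is an illustration under assumptions: the hierarchy and record ids are hypothetical, and a record is treated as a possible hit when it was annotated with a superclass of the queried class.

```python
# Hypothetical hierarchy: subclass -> direct superclass.
SUBCLASS_OF = {
    "Wistar rat": "Rattus norvegicus",
    "Sprague Dawley rat": "Rattus norvegicus",
    "Kyoto Wistar rat": "Wistar rat",
}

# Hypothetical records: record id -> most specific class that was stated.
RECORDS = {
    "neuron-101": "Rattus norvegicus",
    "neuron-102": "Wistar rat",
    "neuron-103": "Kyoto Wistar rat",
    "neuron-104": "Sprague Dawley rat",
}


def ancestors(cls):
    """All superclasses of cls, nearest first."""
    found = []
    current = SUBCLASS_OF.get(cls)
    while current is not None:
        found.append(current)
        current = SUBCLASS_OF.get(current)
    return found


def search(query):
    """Return (definite hits, possible hits) for a class query."""
    definite, possible = [], []
    for rec, cls in RECORDS.items():
        if cls == query or query in ancestors(cls):
            # Stated to be the queried class or one of its subclasses.
            definite.append(rec)
        elif cls in ancestors(query):
            # Only a superclass was stated, so the queried class is not ruled out.
            possible.append(rec)
    return sorted(definite), sorted(possible)


print(search("Wistar rat"))
```

A record annotated only as Rattus norvegicus is returned as a possible hit for "Wistar rat", while a record annotated as Sprague Dawley rat is not returned at all.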

On Fri, Jul 30, 2010 at 12:07 PM, Sridevi Polavaram <spol...@gmu.edu> wrote:
> Hi Maryann:
> Thanks for getting back on could_be. Regarding the noise issue.. I think,
> the opposite, because could_be can substantially reduce the technical and
> mining problems we usually encounter when we import too many terms from
> various ontologies. Here is an example in reference to the attached figure
> where I have expanded the rat subtree in specific.

I'm not seeing how this reduces the problems with having too many
terms. Could you elaborate?

> All the highlighted subclasses are could_be sub strains of respective
> NotReported parents that are imported from NIFSTD. If you notice, there is
> only 'NotReported rats', 'NotReported Wistar rat' and 'NotReported Sprague
> dawley rat', but not 'NotReported rattus', which is where we usually have
> the big number 800. So, according to this figure which is based on the 1695
> rat neuronal reconstructions identified from 44 papers, we have a pretty
> simple hierarchy stating that rattus norvegicus is the most commonly used
> rat in neuroscience and within this we have identified at least fischer,
> long evans, sprague dawley and wistar strains while the NotReported strains
> could possibly be from DB1X, munich wistar and so forth. While this argument
> is about the Neuroscience perspective. The Neuroinformatic view sees the
> advantage of clearly separating what is the knowledge that we know and what
> is the knowledge that we don't know.

There is a clear separation already. What we don't state, we don't
know. What we state, we know.

-Alan

Lin Yu

Aug 29, 2010, 10:08:37 PM

to neur...@googlegroups.com

Hi, is anybody in Kobe now for the conference?
I'm working at CDB, RIKEN, which is very close to the venue.

If there is anything I can help with, please let me know.

Best,
Lin

*************************************************
Yu Lin
MB,MSc,PhD

Genome Resource and Analysis Unit,
Genomics Support Unit,
RIKEN Center for Developmental Biology,

TEL&FAX 0081-78-3063048
Mobile: 090-8368-2928
Extension:4232
Email: li...@cdb.riken.jp
*************************************************
