Representing Counts of Entities in Phenotypes

Cartik

Jun 24, 2009, 4:29:18 PM
to obd-dev
Hi,

We have started loading counts of entities into the Phenoscape
project. For example, the statement "humans have 33 vertebral bones"
can be represented in OBD syntax, using post-composed phenotypes, as
shown below:

Homo sapiens exhibits count^inheres_in(vertebral bone)^has_count(33) --[1]

The right-hand side of the 'exhibits' relation is the phenotype, which
is mapped to its components as shown below:

count^inheres_in(vertebral bone)^has_count(33)
is_a count --[2]

count^inheres_in(vertebral bone)^has_count(33)
inheres_in vertebral bone --[3]

count^inheres_in(vertebral bone)^has_count(33)
has_count 33 --[4]
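
To make the decomposition in [2] to [4] concrete, here is a minimal
Python sketch. It assumes the class expression is a flat
genus^relation(target) string with no nested expressions; decompose is
a hypothetical helper for illustration, not part of OBD.

```python
import re

def decompose(expression):
    """Split a flat post-composed expression into (node, relation, target) links.

    Assumption: differentia are never nested, so splitting on '^' is safe.
    """
    genus, *differentia = expression.split("^")
    links = [(expression, "is_a", genus)]
    for part in differentia:
        match = re.fullmatch(r"(\w+)\((.+)\)", part)
        if match is None:
            raise ValueError(f"cannot parse differentium: {part!r}")
        relation, target = match.groups()
        links.append((expression, relation, target))
    return links

phenotype = "count^inheres_in(vertebral bone)^has_count(33)"
links = decompose(phenotype)
for node, relation, target in links:
    print(relation, target)
```

Note that the target of has_count comes out as the node "33", which is
exactly the awkwardness discussed next.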

Currently, we use link statements to model the numerical count,
meaning the integer "33" in [4] is treated as a node, when intuitively
it should be a simple string literal. Let us consider the alternative,
i.e. using a literal statement. This would mean that the statement in
[1] would have to be rewritten as

Homo sapiens exhibits count^inheres_in(vertebral bone) --[5]

Then we would have to attach a literal statement to this statement to
keep track of the count. First, [5] is linked to its reification node:

[5] has_reiflink_node_id <generated-hex-id> --[6]

<generated-hex-id> is the reification node, which points to
publications and other metadata about [5].

We can hang the string literal '33' off this reification node as shown
below.

<generated-hex-id> has_count "33" --[7]

Still, [5] through [7] seem a rather roundabout way to say "humans
have 33 vertebral bones." Is there a better way of doing this? Any
ideas?
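
For comparison, here is a rough Python sketch of statements [5]
through [7]. The Link and Literal classes and annotate_with_count are
hypothetical names for illustration, and uuid4 simply stands in for
however OBD generates the reification node id.

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    """A link statement: both ends are nodes."""
    node: str
    relation: str
    target: str

@dataclass(frozen=True)
class Literal:
    """A literal statement: the value is a plain string, not a node."""
    node: str
    relation: str
    value: str

def annotate_with_count(subject, phenotype, count):
    annotation = Link(subject, "exhibits", phenotype)            # [5]
    reif_id = uuid.uuid4().hex                                    # [6] stand-in id
    count_literal = Literal(reif_id, "has_count", str(count))    # [7]
    return annotation, reif_id, count_literal

annotation, reif_id, count_literal = annotate_with_count(
    "Homo sapiens", "count^inheres_in(vertebral bone)", 33
)
print(annotation)
print(count_literal)
```

The count is now reachable only through the reification node, so a
query for "33" has to join from the annotation to its metadata.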

Cheers,

Cartik


Chris Mungall

Jun 24, 2009, 7:06:57 PM
to obd...@googlegroups.com

Yet another way to do it is to use cardinality restrictions, as in
OWL. OBO also allows this, but the plumbing hasn't been added yet to
store the cardinality in OBD. Also, you can't use cardinality in the
convenient OBO ID class expression syntax.

The schemes below differ in that they put the quantification into the
domain of discourse, using a count quality. This allows you to make
relative statements, such as "fooA has more bones than fooB".

Note that you should use has_number_of; see
http://www.bioontology.org/wiki/index.php/PATO:Revised_2008#Absence_and_counting

The best course of action depends on what kind of statements you want
to make and what kinds of questions you want to ask. Some
considerations:

- direct cardinality restrictions can sometimes cause reasoners to
flounder if the numbers are high
- it's harder to make statements about numbers using direct
cardinality restrictions; e.g. geneX affects the number of vertebrae
- there are of course many kinds of reasoning you can do with direct
cardinality restrictions; e.g. automatically classify humans under
"animals with 10-50 vertebrae"
  * for this kind of reasoning you may want an open world reasoner,
  rather than the OBD one
- using a quality allows you to treat the cardinality like any other
quality, and uniformity often makes things easier
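
As a rough illustration of the two kinds of question above, here is a
small Python sketch. It is a closed-world lookup over a dictionary,
not an OWL reasoner, and the counts for fooA and fooB are invented
placeholders; only the human count of 33 comes from this thread.

```python
# Counts keyed by taxon. fooA and fooB values are made up for illustration.
counts = {"Homo sapiens": 33, "fooA": 40, "fooB": 25}

def in_range(taxon, low, high):
    """Range classification, e.g. 'animals with 10-50 vertebrae'."""
    return low <= counts[taxon] <= high

def has_more(taxon_a, taxon_b):
    """Relative statement, e.g. 'fooA has more bones than fooB'."""
    return counts[taxon_a] > counts[taxon_b]

print(in_range("Homo sapiens", 10, 50))
print(has_more("fooA", "fooB"))
```

Both checks need the count as an integer, whichever way it is stored.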

This isn't directly answering your question. I have to think a bit
about the second scheme you propose. The first scheme has the
advantage of being the simplest to implement, because as you say you
just treat the numbers as nodes. It's slightly awkward. There is
always a syntactic translation to either (a) direct cardinality
restrictions or (b) a more efficient DBMS representation, in which
integer datatypes are used at the relational level. I would make some
recommendations, such as using RDF datatype syntax for the number. But
I need to think a bit more and hear more about other requirements
before I give a specific recommendation.
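
For reference, a minimal Python sketch of what RDF datatype syntax for
the number could look like, written as an xsd:integer typed literal.
The two helper functions are hypothetical, and nothing here reflects
how OBD actually stores values.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def to_typed_literal(count):
    """Render an integer as an RDF typed literal with a full datatype IRI."""
    return f'"{int(count)}"^^<{XSD_INTEGER}>'

def from_typed_literal(text):
    """Recover the integer, e.g. for an integer column at the relational level."""
    match = re.fullmatch(r'"(-?\d+)"\^\^<([^>]+)>', text)
    if match is None or match.group(2) != XSD_INTEGER:
        raise ValueError(f"not an xsd:integer literal: {text!r}")
    return int(match.group(1))

literal = to_typed_literal(33)
print(literal)
print(from_typed_literal(literal))
```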