---------- Forwarded message ----------
From: Alan Ruttenberg <alanrut...@gmail.com>
Date: Wed, Jan 21, 2009 at 2:38 AM
Subject: Re: Still working on performance
To: Melanie Courtot <mcou...@gmail.com>
Cc: Bjoern Peters <bpe...@liai.org>, Ian Horrocks
<ian.ho...@comlab.ox.ac.uk>, Jonathan Rees
<j...@creativecommons.org>
Here's an outline of the debugging process as it exists now. This note
is mostly for the purposes of capturing documentation, enabling
Melanie to do some of this, to see whether Ian has further suggestions
of how to proceed, and for Jonathan's amusement.
Step 1: Load the kb and collect information. The take-sample function
takes a kb, the number of seconds to let the reasoner run, and the sort
key for presenting the results (one of "Time", "depth" (of the tree), or
"size" (number of nodes)).
(setq kb (load-kb-jena :obil))
(take-sample kb 60 "Time")
It runs and prints out a sorted list of the classes that it had time
to check. Here is what it looks like now:
((("http://purl.obofoundry.org/obo/OBI_0000153"
("cell co-culturing"))
("depth" "18") ("size" "11882") ("Time" "7564"))
(("http://purl.obofoundry.org/obo/OBI_9999994"
("chromium release assay"))
("depth" "18") ("size" "10502") ("Time" "5538"))
(("http://purl.obofoundry.org/obo/OBI_0302851" ("fixed"))
("depth" "19") ("size" "8352") ("Time" "4025"))
(("http://purl.obofoundry.org/obo/OBI_0600021" ("cell fixation"))
("depth" "17") ("size" "7890") ("Time" "2344"))
(("http://purl.obofoundry.org/obo/OBI_0000264"
("sample population"))
("depth" "16") ("size" "5321") ("Time" "2080"))
(("http://msi-ontology.sourceforge.net/ontology/CHROM.owl#msi_01082"
("evaporative light scattering detector"))
("depth" "13") ("size" "1296") ("Time" "2051"))
(("http://purl.obofoundry.org/obo/OBI_0100064"
("screening library"))
("depth" "16") ("size" "4587") ("Time" "1603"))
(("http://purl.obofoundry.org/obo/OBI_0400023" ("CYFlow ML"))
("depth" "14") ("size" "6485") ("Time" "1542"))
(("http://purl.obofoundry.org/obo/OBI_0400005" ("A10-Analyzer"))
("depth" "14") ("size" "6485") ("Time" "1536"))
(("http://purl.obofoundry.org/obo/OBI_0400014" ("BioSorter1000"))
("depth" "14") ("size" "6493") ("Time" "1506"))
(("http://purl.obofoundry.org/obo/OBI_0400041" ("FACSvantage"))
("depth" "14") ("size" "6493") ("Time" "1493"))
(("http://purl.obofoundry.org/obo/OBI_0400055" ("inFlux Analyzer"))
("depth" "14") ("size" "6485") ("Time" "1484"))
....
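Since the printed result is itself a Lisp list, it can be re-sorted by a different key without rerunning the reasoner. The following is only a minimal sketch: it assumes the printed structure is available as Lisp data (for example, read back in from the output), and the names sample-value, sort-sample and *sample* are hypothetical helpers for illustration, not functions of the toolkit used above. The three records are copied from the listing.

```lisp
;; Hypothetical helpers; not part of the toolkit described in this note.
(defun sample-value (record key)
  "Return the integer stored under KEY (\"Time\", \"depth\" or \"size\") in RECORD."
  (parse-integer (second (assoc key (rest record) :test #'string=))))

(defun sort-sample (records key)
  "Return a fresh copy of RECORDS sorted by KEY, largest value first."
  (sort (copy-list records) #'> :key (lambda (r) (sample-value r key))))

;; Three records copied from the listing above.
(defparameter *sample*
  '((("http://purl.obofoundry.org/obo/OBI_0000153" ("cell co-culturing"))
     ("depth" "18") ("size" "11882") ("Time" "7564"))
    (("http://purl.obofoundry.org/obo/OBI_0302851" ("fixed"))
     ("depth" "19") ("size" "8352") ("Time" "4025"))
    (("http://msi-ontology.sourceforge.net/ontology/CHROM.owl#msi_01082"
      ("evaporative light scattering detector"))
     ("depth" "13") ("size" "1296") ("Time" "2051"))))

;; Labels of the sampled classes, ordered by tree depth instead of time.
(mapcar (lambda (r) (first (second (first r))))
        (sort-sample *sample* "depth"))
;; => ("fixed" "cell co-culturing" "evaporative light scattering detector")
```

Sorting a copy keeps the original time-ordered list intact, so the same sample can be viewed by time, depth and size side by side.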
You then pick off some culprit near the top, compute the tree, and
have a look at it.
(setq a (check-entity-consistency !obi:OBI_0302851 kb t))
INFO [Thread-4] (ABox.java:1541) - Consistency
http://purl.obofoundry.org/obo/OBI_0302851 for 0 individuals []
INFO [Thread-4] (ABox.java:1628) - Consistent: true Tree depth: 19
Tree size: 8352 Time: 2892
#<completion for consistency check of 'fixed' !obi:OBI_0302851 8352 nodes>
The call
(make-completion-graph a 5)
draws a radial-layout tree to depth 5. I find values between 3 and 6
tolerable, depending on how cluttered the result is. Attached is an
image of what this looks like.
Each node is formatted as
"label:
class
class
class
..."
The label is either "Class:" for the focus class, or the name of the
relation of the parent individual to the node. The classes are a list
of the most specific named classes based on looking only at asserted
superclass relations.
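One way to read "most specific named classes" is: from the named classes on a node, drop any class that is an asserted ancestor of another class on the same node. The sketch below illustrates that reading only; it is not the code that produces the labels. The hierarchy in *parents* is made up for the example, the function names are hypothetical, and the asserted hierarchy is assumed to be acyclic.

```lisp
;; Made-up asserted hierarchy for illustration: (child direct-parent ...).
(defparameter *parents*
  '(("datum" "information entity")
    ("information entity" "entity")
    ("quality" "entity")))

(defun ancestors (class)
  "All classes reachable from CLASS by following asserted parents (assumes no cycles)."
  (let ((direct (cdr (assoc class *parents* :test #'string=))))
    (remove-duplicates
     (append direct (loop for p in direct append (ancestors p)))
     :test #'string=)))

(defun most-specific (classes)
  "Drop every member of CLASSES that is an asserted ancestor of another member."
  (remove-if (lambda (c)
               (some (lambda (other)
                       (member c (ancestors other) :test #'string=))
                     classes))
             classes))

(most-specific '("entity" "datum" "information entity" "quality"))
;; => ("datum" "quality")
```

Because only asserted superclass links are followed, no reasoner call is needed, which matters when the point of the exercise is that reasoning is slow.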
As you mouse over the tree, the node under your mouse turns red and its
direct parents and children turn yellow. Clicking on a node brings up an
inspector that lists all the class expressions assigned to the node.
There's a picture of this attached as well.
You then look around for nonsense. One kind of nonsense is a relation
to a type of individual that doesn't make sense in the context. For
instance, in the image I attached, it doesn't make sense that we are
seeing something about specified output data when we are describing a
quality. The focus class is "fixed" and its definition is
quality_of some ('material entity' and is_specified_output_of some
'cell fixation')
So the first two nodes out from "fixed" are first an instance of
quality, and second that cell fixation process. The fact that from
that cell fixation process there is has_specified_output_data: datum
means that we haven't said that processes such as cell fixation don't
output data.
Another pattern is one where we see a property and then its inverse
immediately asserted (and perhaps the original property again after
that). Picture 21, attached, shows such a case of dueling
is_specified_output_of/has_specified_output.
Poking around, I think this is due to this pair of axioms:
artifact object: is_specified_output_of some 'artifact material creation'
artifact material creation: has_specified_output some 'artifact object'
If we get rid of the axiom on artifact material creation and reload, we get:
(setq a (check-entity-consistency !obi:OBI_0302851 kb t))
#<completion for consistency check of 'fixed' !obi:OBI_0302851 5980 nodes>
The tree is a bit under 1/3 smaller (8352 nodes down to 5980, about
28%). Such a change will typically affect many nodes, so you might see
a surprisingly large increase in reasoning speed from a small change
such as this.
Thus far the sorts of things I have been noticing are missing
disjoints, incorrectly broad ranges for properties, and these uses of
properties/inverse where we might be able to choose one direction and
stick with it.
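The property/inverse pattern could in principle be searched for mechanically rather than by eye. The following is a minimal sketch under stated assumptions: the existential axioms have already been extracted into simple (class property filler) triples, and the inverse table is written by hand. The data layout and the names *axioms*, *inverses*, inverse-of and dueling-pairs are all hypothetical; nothing here is part of the toolkit or of the reasoner.

```lisp
;; Hypothetical layout: (class property filler) stands for
;; "class subClassOf (property some filler)".
(defparameter *axioms*
  '(("artifact object" "is_specified_output_of" "artifact material creation")
    ("artifact material creation" "has_specified_output" "artifact object")
    ;; Simplified from the definition of 'fixed'; here as a non-matching row.
    ("fixed" "quality_of" "material entity")))

;; Hand-written inverse table (an assumption for this example).
(defparameter *inverses*
  '(("is_specified_output_of" . "has_specified_output")))

(defun inverse-of (property)
  "Return the inverse of PROPERTY, looking in both directions, or NIL."
  (or (cdr (assoc property *inverses* :test #'string=))
      (car (rassoc property *inverses* :test #'string=))))

(defun dueling-pairs (axioms)
  "Return pairs (A B) where A points at B's class via P and B points back via P's inverse."
  (loop for (a . others) on axioms
        append (loop for b in others
                     when (and (inverse-of (second a))
                               (string= (inverse-of (second a)) (second b))
                               (string= (first a) (third b))
                               (string= (third a) (first b)))
                       collect (list a b))))

;; Finds the single artifact object / artifact material creation pair.
(dueling-pairs *axioms*)
```

Each pair found this way is a candidate for keeping only one direction. Whether dropping one side is acceptable for the ontology is a modelling decision the code cannot make.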
In the shortest term, I think adding disjoints among the objective
specifications might help. I think I made those changes and saw an
effect, but I've been messing around with the files enough that I don't
want to save them, and I don't have a good diff tool handy.
-Alan