Scalability issues with the OBD reasoner


Cartik

Jul 21, 2009, 10:04:23 AM
to obd-dev
Hi,

I'm glad to announce that the Phenoscape DB has reached 3.5 GB, which
clearly takes it out of the realm of toy databases. However, we are
having serious problems running the OBD reasoner, or more precisely,
the OBD reasoner with our additions to it. Please refer to [1] for a
summary of the inference rules, including those from the original OBD
reasoner.

The SELECT DISTINCT queries issued by each of the inference rules, such
as the transitivity rules and the Balhoff rule, are using up a lot of
memory. At present, we have deployed the reasoner on our development
server with about 100 GB of temporary space assigned. After all the
assertions (about 1.6 GB) are loaded, the reasoner's first sweep adds
about 1.2 GB of inferences. On the second sweep, the reasoner is
working with 2.8 GB of data, and this seems to be where it hits the
wall: it runs for more than 12 hours without adding a single new
inference. We had to terminate it on the dev server during the last
run, because it had created about 40 temporary files of 1 GB each,
which were slowing down the other processes on the server as well.
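
To make the sweep behaviour concrete, here is a minimal Python sketch of forward chaining over a toy in-memory link set. It assumes links are hypothetical (subject, relation, object) triples standing in for rows of a link table, and that each sweep re-runs every rule over all stored links; neither detail is taken from the OBD code. naive_sweeps re-joins the whole fact set on every sweep, so a sweep gets more expensive as the table grows even when it finds nothing new. semi_naive_sweeps is a swapped-in technique, standard semi-naive evaluation from Datalog, which only joins links derived in the previous sweep; it is one possible direction, not a description of how the OBD reasoner works.

```python
from collections import defaultdict

# Hypothetical toy representation: (subject, relation, object) triples,
# standing in for rows of a link table.
Triple = tuple[str, str, str]


def _index_by_subject(facts: set[Triple]) -> dict:
    """Map (subject, relation) -> set of objects."""
    index = defaultdict(set)
    for s, r, o in facts:
        index[(s, r)].add(o)
    return index


def naive_sweeps(links: set[Triple], transitive: set[str]) -> tuple[set[Triple], int]:
    """Re-join the full fact set on every sweep until a sweep adds nothing."""
    facts = set(links)
    sweeps = 0
    while True:
        sweeps += 1
        by_subject = _index_by_subject(facts)
        # xRy, yRz => xRz over *all* current facts (the whole table every time).
        derived = {
            (x, r, z)
            for x, r, y in facts
            if r in transitive
            for z in by_subject[(y, r)]
        }
        new = derived - facts  # like SELECT DISTINCT minus already-stored links
        if not new:
            return facts, sweeps
        facts |= new


def semi_naive_sweeps(links: set[Triple], transitive: set[str]) -> tuple[set[Triple], int]:
    """Only join links that are new since the previous sweep (the 'delta')."""
    facts = set(links)
    delta = set(links)
    sweeps = 0
    while delta:
        sweeps += 1
        # Indexes are rebuilt each sweep for simplicity; the join work is over delta only.
        by_subject = _index_by_subject(facts)
        by_object = defaultdict(set)  # (object, relation) -> subjects
        for s, r, o in facts:
            by_object[(o, r)].add(s)
        derived = set()
        for x, r, y in delta:
            if r not in transitive:
                continue
            # New link on the left: x R y (new), y R z (any).
            for z in by_subject[(y, r)]:
                derived.add((x, r, z))
            # New link on the right: w R x (any), x R y (new).
            for w in by_object[(x, r)]:
                derived.add((w, r, y))
        delta = derived - facts
        facts |= delta
    return facts, sweeps
```

Both functions reach the same closure. The difference is how much already-known data each sweep joins again, which is the part that grows with the database.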

We may have to overhaul the reasoner for better efficiency, because in
its current form it is clearly an unsustainable part of the database
refresh process. Any thoughts on this?

Regards,

Cartik

References

[1] https://www.phenoscape.org/wiki/OBD_Reasoner

Chris Mungall

Jul 21, 2009, 1:51:19 PM
to obd...@googlegroups.com

Can you send your postgresql.conf?

You can specify --verbose to see the exact rule on which it is hanging,
and then run a PostgreSQL EXPLAIN on that rule's query. I suspect it is
either the transitivity rule or the transitive over/under is_a rule.

One option is to do a mix of forward and backward reasoning. I'll take
another look at the Balhoff rule and see if I can make more specific
suggestions (I'm in the airport on the way to ICBO just now).
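
One way to read "a mix of forward and backward reasoning" is sketched below in Python, under the assumption that only asserted links are materialized and that transitive closures are computed lazily, per query, and cached. HybridReasoner and its objects method are hypothetical names for illustration, not part of OBD, and the cache is never invalidated, so the sketch assumes the link set does not change after construction.

```python
from collections import defaultdict


class HybridReasoner:
    """Hypothetical sketch: asserted links are stored up front, while the
    transitive closure of a relation is computed only when a query asks
    for it, then cached. Assumes the link set never changes afterwards."""

    def __init__(self, links, transitive):
        self.transitive = set(transitive)
        self.out = defaultdict(set)  # (subject, relation) -> direct objects
        for s, r, o in links:
            self.out[(s, r)].add(o)
        self._cache = {}

    def objects(self, subject, relation):
        """All o such that (subject, relation, o) holds. Transitive relations
        are followed goal-directed from the queried subject instead of being
        materialized for every node in advance."""
        key = (subject, relation)
        if key in self._cache:
            return self._cache[key]
        if relation not in self.transitive:
            result = set(self.out.get(key, ()))
        else:
            result = set()
            stack = list(self.out.get(key, ()))
            while stack:  # depth-first walk; 'result' doubles as the visited set
                node = stack.pop()
                if node in result:
                    continue
                result.add(node)
                stack.extend(self.out.get((node, relation), ()))
        self._cache[key] = result
        return result
```

The trade-off is that storage and refresh time stay proportional to the asserted links, while each query pays for its own traversal the first time it is asked.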

I can also add something to obd-reasoner.pl to automatically break
down rules on a per-relation basis. E.g., rather than computing xRz from
xRy, yRz in one query for all x, y, z and every transitive R, there
would be n rules, where n is the number of transitive relations.
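
A minimal Python sketch of the per-relation breakdown, assuming links are (subject, relation, object) triples held in memory as a stand-in for the OBD link table. The links of each transitive relation are grouped separately and closed independently, so each join only touches one relation's links. The helper names are hypothetical, and the sketch does not model how obd-reasoner.pl actually generates its rules.

```python
from collections import defaultdict


def split_by_relation(links, transitive):
    """Group (subject, object) pairs by transitive relation; one group per rule."""
    groups = defaultdict(set)
    for s, r, o in links:
        if r in transitive:
            groups[r].add((s, o))
    return groups


def close_one_relation(pairs):
    """Fixpoint of x->y, y->z => x->z over the pairs of a single relation."""
    closure = set(pairs)
    while True:
        succ = defaultdict(set)
        for x, y in closure:
            succ[x].add(y)
        new = {(x, z) for x, y in closure for z in succ[y]} - closure
        if not new:
            return closure
        closure |= new


def per_relation_closure(links, transitive):
    """Run n small transitivity rules, one per transitive relation."""
    result = set(links)
    for r, pairs in split_by_relation(links, transitive).items():
        # Each iteration stands in for one smaller per-relation query.
        result |= {(x, r, z) for x, z in close_one_relation(pairs)}
    return result
```

Because the plain transitivity rule never chains across different relations, closing each relation separately gives the same result as the combined rule while keeping each individual join smaller. Rules that mix relations, such as transitivity over is_a, would need their own decomposition.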

Do you have an open PostgreSQL port (5432) that I can connect or tunnel to?

I noticed a while ago that you made a change to obd-core-views.sql,
adding some Phenoscape-specific stuff to the inheritable_link view
definition. This was causing reasoning to hang for me with other
datasets. I reverted this locally but have not committed the revert
yet. In general I
suggest we keep obd-core-views generic, and if you want to override a
view definition you do this in a phenoscape module. In this particular
case, I think the additional conditions you added did not benefit you,
and you could benefit speed-wise from using the more general rule.

Hilmar Lapp

Jul 22, 2009, 2:48:50 PM
to obd...@googlegroups.com

On Jul 21, 2009, at 1:51 PM, Chris Mungall wrote:

> In general I suggest we keep obd-core-views generic, and if you want
> to override a view definition you do this in a phenoscape module.


I fully agree. -hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
===========================================================


Cartik

Jul 23, 2009, 5:04:27 PM
to obd-dev
Hi Chris,

This turned out to be an issue with the PostgreSQL version: the new
server was running PostgreSQL 8.4. We tested the database refresh with
version 8.3, and it worked very nicely for us. There may be some
configuration parameters from PostgreSQL 8.3, such as 'max_fsm_pages',
that are not available in 8.4.
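
PostgreSQL 8.4 did remove the max_fsm_pages and max_fsm_relations settings, because the free space map is now managed automatically. A quick way to spot such leftovers when carrying a postgresql.conf from 8.3 to 8.4 is sketched below in Python. The function names are hypothetical, the parser is deliberately naive (it ignores include directives and treats every # as the start of a comment, even inside quoted values), and whether these settings played any part in the reasoner slowdown is not established here.

```python
import re

# Settings present in PostgreSQL 8.3 that were removed in 8.4,
# where the free space map became self-managing.
REMOVED_IN_8_4 = {"max_fsm_pages", "max_fsm_relations"}

# "name = value" or "name value"; the equals sign is optional in postgresql.conf.
SETTING = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*?)\s*$")


def active_settings(conf_text):
    """Return {lowercased name: raw value} for uncommented settings."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]  # naive: drops everything after any '#'
        if not line.strip():
            continue
        m = SETTING.match(line)
        if m:
            settings[m.group(1).lower()] = m.group(2)
    return settings


def obsolete_for_8_4(conf_text):
    """Names of active settings that PostgreSQL 8.4 no longer accepts."""
    return sorted(REMOVED_IN_8_4.intersection(active_settings(conf_text)))
```

Commented-out defaults such as "#max_fsm_relations = 1000" are skipped, so only settings that are actually in effect are reported.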

Thanks for your feedback.

Regards,

Cartik


Chris Mungall

Jul 24, 2009, 1:24:43 AM
to obd...@googlegroups.com

Thanks Cartik!

As it happens, I just switched to 8.4 for development, and I too noticed
problems with certain queries used by the OBD reasoner.

In any case, I have implemented some optional switches for breaking
some of the main queries down into smaller queries (about to
commit).