Hi,
I'm glad to announce that the Phenoscape DB has hit the 3.5 GB mark,
which takes it clearly out of the realm of toy databases. However, we
are having serious issues running the OBD reasoner, or more precisely
the OBD reasoner with our additions to it. Please refer to [1] for a
summary of the inferences, which includes those from the original OBD
reasoner.
The SELECT DISTINCT queries issued by each of the inference rules
(transitive relations, the Balhoff rule, etc.) are using up a lot of
memory.
At present, we have deployed the reasoner on our development server
with about 100 GB of temporary space assigned. After all the
assertions (about 1.6 GB) are loaded, the reasoner adds about 1.2 GB
of inferences in its first sweep. On the second sweep, the reasoner is
working with 2.8 GB of data, and this seems to be where it hits the
wall: it runs for more than 12 hours without adding a single new
inference. We had to terminate the last run on the dev server because
the reasoner was using about 40 temporary files of 1 GB each, and
these were slowing down the other processes on the server as well.
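To make the cost of the later sweeps concrete, here is a minimal
Python sketch of the transitive-relation rule only. It is not the OBD
reasoner code, and I am assuming (not asserting) that the reasoner
behaves roughly like the first function, which re-joins every known
fact on every sweep. The second function is the standard "semi-naive"
alternative, which joins only the facts added in the previous sweep.
The function names and the toy data are made up for illustration.

```python
from collections import defaultdict


def closure_full_sweeps(edges):
    """Transitive closure where every sweep re-joins ALL known facts."""
    edges = set(edges)
    facts = set(edges)
    joined_rows = 0  # rows fed into the join: a rough proxy for work done
    while True:
        joined_rows += len(facts)
        derived = {(a, c) for (a, b) in facts for (b2, c) in edges if b == b2}
        new = derived - facts  # de-duplicate against what is already stored
        if not new:
            return facts, joined_rows
        facts |= new


def closure_semi_naive(edges):
    """Transitive closure where each sweep joins only last sweep's new facts."""
    edges = set(edges)
    successors = defaultdict(set)  # index of asserted edges by source node
    for a, b in edges:
        successors[a].add(b)
    facts = set(edges)
    delta = set(edges)  # facts that were new in the previous sweep
    joined_rows = 0
    while delta:
        joined_rows += len(delta)
        derived = {(a, c) for (a, b) in delta for c in successors.get(b, ())}
        delta = derived - facts  # keep only genuinely new facts
        facts |= delta
    return facts, joined_rows


if __name__ == "__main__":
    # Hypothetical is_a chain: t0 -> t1 -> ... -> t49
    chain = {(f"t{i}", f"t{i + 1}") for i in range(49)}
    full, full_rows = closure_full_sweeps(chain)
    semi, semi_rows = closure_semi_naive(chain)
    print("same closure:", full == semi)
    print("rows joined (full sweeps):", full_rows)
    print("rows joined (semi-naive):", semi_rows)
```

Both functions return the same set of inferences. The difference is
that the first one feeds the whole, growing fact set back into the
join on every sweep, while the second feeds in only the previous
sweep's additions. Whether this maps cleanly onto our other rules is
something we would have to check rule by rule.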
We may have to overhaul the reasoner for better efficiency, because
in its current form it is clearly an unsustainable part of the
database refresh process. Any thoughts on this?
Regards,
Cartik
References
[1] https://www.phenoscape.org/wiki/OBD_Reasoner