Elk stability with number of workers

Samuel Croset

Jan 16, 2013, 1:15:28 PM
to elk-reasone...@googlegroups.com
Hi all,

I would like to have some input on a strange behaviour:

The setting: Elk is used to answer a series of queries one by one (subclasses of anonymous class expressions, via a temporary OWL class, or of plain named OWL classes).

The problem: when the number of workers is left at the default (8 on my laptop), the reasoner sometimes gets stuck (about 20% of the time, at random). The workers are all in the wait state (see the attached PDF) and nothing happens any more. This occurs before all queries have run, and it seems to be in the precomputeInferences() method. However, on the same machine, when the number of worker threads is limited to 4, all the queries always execute correctly (in my tests).
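A minimal sketch, assuming (as both reports in this thread suggest: 8 workers on Samuel's laptop and on David's 8-processor server, with nothing configured) that Elk defaults to one worker per available processor. WorkerCount and its methods are hypothetical plain-Java helpers, not Elk's configuration API; they only illustrate that default and the cap-at-4 workaround described above.

```java
/**
 * Hypothetical helper, not part of Elk: chooses a worker-thread count.
 * Default = one worker per processor the JVM reports; an optional cap
 * reproduces the "limit to 4" workaround from the post.
 */
final class WorkerCount {
    private WorkerCount() {}

    /** One worker per processor reported by the JVM. */
    static int defaultWorkers() {
        return Runtime.getRuntime().availableProcessors();
    }

    /**
     * Returns min(available, cap), but never less than 1.
     * A cap of 0 or less means "no cap".
     */
    static int choose(int available, int cap) {
        int n = cap > 0 ? Math.min(available, cap) : available;
        return Math.max(1, n);
    }
}
```

With this helper, the workaround corresponds to WorkerCount.choose(WorkerCount.defaultWorkers(), 4), which yields 4 on an 8-core machine and leaves machines with fewer cores unchanged.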

Does anyone have an explanation for this, or an idea of what could cause the problem? I haven't included any code in this message, but it relies on Brain (which could also be the origin of the problem). Let me know if you want the original code that leads to the error.

Many thanks,

Samuel
thread-bug.pdf

Yevgeny Kazakov

Jan 16, 2013, 1:35:28 PM
to elk-reasone...@googlegroups.com
Hi Samuel,

It would be nice if we could have the stack traces of all waiting workers
when they get stuck. We can then see which monitors they are waiting
for. It could well be some deadlock. In VisualVM (guessing from your
screenshot) you can get the stack trace by pressing the "Thread Dump"
button when in the "Threads" tab.
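Besides VisualVM's "Thread Dump" button, a dump can be taken from inside the JVM with the standard java.lang.management API, which helps when the hang only happens unattended. The sketch below uses only JDK calls (ThreadMXBean.dumpAllThreads and findDeadlockedThreads); the class and method names are illustrative. Note that findDeadlockedThreads only reports cycles of threads blocked on monitors or ownable synchronizers: workers that are all waiting in Object.wait() (or similar) for a notification that never arrives will not show up there, so the full dump is still what is worth sending.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Programmatic counterpart of VisualVM's "Thread Dump" button (JDK API only). */
final class ThreadDumps {
    private ThreadDumps() {}

    /** Returns a text dump of all live threads with their full stacks and awaited locks. */
    static String dumpAll() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also collect the monitors and synchronizers each thread holds
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {          // lock the thread waits for
                sb.append(" on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {     // thread currently holding it
                sb.append(" owned by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // print every frame (ThreadInfo.toString() would truncate long stacks)
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Names of threads in a detected monitor/synchronizer deadlock; empty if none. */
    static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }
}
```

Printing ThreadDumps.dumpAll() to a file when a query seems stuck gives the same information as the VisualVM button.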

Am I right in assuming you use the latest version of ELK (0.3.1)?

Do you observe the same behavior with the latest nightly build?

http://code.google.com/p/elk-reasoner/wiki/GettingElk

Yevgeny

dosumis

Jan 16, 2013, 1:41:54 PM
to elk-reasone...@googlegroups.com, Marta Costa
Hi all,

I may have seen related behaviour in a very different context. We run Elk on a continuous integration server linked to our SVN repository, testing consistency and generating derived versions of our ontology on each commit. This usually runs fine, but Elk reasoning occasionally hangs indefinitely at:

org.semanticweb.elk.reasoner.stages.ClassTaxonomyComputationStage.initComputation(ClassTaxonomyComputationStage.java:91)  - Class Taxonomy Computation using 8 workers

This runs on an 8-processor machine via OWLtools code (https://code.google.com/p/owltools/) with, AFAIK, no explicit number of workers specified. Classification usually takes around a second.

I have the typical OWL-API log output, but it doesn't look particularly informative. If you need extra info, you may need to point me to where to find it.

Cheers,

David
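When Elk is driven from a CI job, a hang like this blocks the build indefinitely. One mitigation, sketched here as an assumption rather than anything Elk or OWLtools provides, is to run the classification call on a separate daemon thread and give up after a deadline, so the build fails with a timeout instead. Watchdog and runWithTimeout are hypothetical names; the Callable would wrap whatever call triggers classification.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical CI guard: runs a task with a deadline instead of waiting forever. */
final class Watchdog {
    private Watchdog() {}

    static <T> T runWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService runner = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watchdog-runner");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        try {
            Future<T> result = runner.submit(task);
            try {
                return result.get(timeout, unit);
            } catch (TimeoutException e) {
                // best effort: interrupts the task, but cannot force a stuck thread to stop
                result.cancel(true);
                throw new TimeoutException("task did not finish within " + timeout + " " + unit);
            } catch (ExecutionException e) {
                // rethrow the task's own exception rather than the wrapper
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            runner.shutdownNow();
        }
    }
}
```

Cancelling only interrupts the task; a thread stuck in a wait that ignores interrupts keeps running, which is why the runner is a daemon thread and the CI job should exit with a failure status after the TimeoutException.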

Samuel Croset

Jan 16, 2013, 1:59:08 PM
to elk-reasone...@googlegroups.com
Hi Yevgeny,

> It would be nice if we could have the stack traces of all waiting workers
> when they get stuck. We can then see which monitors they are waiting
> for. It could well be some deadlock. In VisualVM (guessing from your
> screenshot) you can get the stack trace by pressing the "Thread Dump"
> button when in the "Threads" tab.

I am indeed using VisualVM. The thread dump taken while it was stuck is attached.
 

> Am I right in assuming you use the latest version of ELK (0.3.1)?

Yes, from Maven Central.
 

> Do you observe the same behavior with the latest nightly build?
>
> http://code.google.com/p/elk-reasoner/wiki/GettingElk

I will try.

Cheers,

Samuel
threaddump.txt

Chris Mungall

Jan 16, 2013, 2:06:09 PM
to elk-reasone...@googlegroups.com, Marta Costa

We've run into the same.

We configure our CI server so that there's never more than one Elk job running on one VM, which I think helps.

Calling reasoner.dispose() also helps (David, owltools should do this automatically).

No more detailed info, sorry.
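Chris enforces the one-job limit at the CI level. An in-process analogue, assuming several reasoning jobs could otherwise share one JVM, is a one-permit semaphore around each job, with dispose() in a finally block so the reasoner's resources are freed even when the job throws. Disposable is a hypothetical stand-in for OWL API's OWLReasoner (which does declare dispose()), used so the sketch compiles with the JDK alone; ReasonerJobs and runExclusive are likewise illustrative names.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for OWLReasoner: anything with a dispose() method. */
interface Disposable {
    void dispose();
}

/** Hypothetical sketch: at most one reasoning job per JVM, always disposed. */
final class ReasonerJobs {
    // One permit: a second job blocks until the first has finished and been disposed.
    private static final Semaphore ONE_JOB = new Semaphore(1);

    private ReasonerJobs() {}

    static <R extends Disposable, T> T runExclusive(Supplier<R> create, Function<R, T> job)
            throws InterruptedException {
        ONE_JOB.acquire();
        try {
            R reasoner = create.get();
            try {
                return job.apply(reasoner);
            } finally {
                reasoner.dispose(); // runs even if the job throws
            }
        } finally {
            ONE_JOB.release(); // let the next job start
        }
    }
}
```

The semaphore only serialises jobs inside one JVM; separate JVMs on the same machine still need the CI-level configuration Chris describes.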

Yevgeny Kazakov

Jan 16, 2013, 2:35:52 PM
to elk-reasone...@googlegroups.com
Hi Samuel,

Thanks for your input! This looks like a bug I have seen (and tried to
fix) before.
If you can, could you please check if it shows up in the latest nightly build?
If it is still there, I will need to reproduce it somehow. Will you be
able to send me your ontology and the program that you try to run?

Yevgeny

Yevgeny Kazakov

Jan 17, 2013, 12:04:56 PM
to elk-reasone...@googlegroups.com
Hi,

Just to let you know, I have fixed an issue that could have caused
the problem reported by Samuel. The fix is now released in version
0.3.2, which should also be available in Maven Central.

Samuel, I would be grateful if you could check whether the new version
still causes the same problem.

Best regards,

Yevgeny

Samuel Croset

Jan 17, 2013, 12:07:12 PM
to elk-reasone...@googlegroups.com
Hi Yevgeny,

Excellent! Many thanks. I was just working on the bug report. I will check and let you know!

Cheers,

Samuel

Samuel Croset

Jan 18, 2013, 5:42:43 AM
to elk-reasone...@googlegroups.com
Hi Yevgeny,

I no longer notice the problem when using Elk 0.3.2 + OWL-API 3.4.2. My test includes a continuous series of 1000 requests (anonymous expressions), with reclassification triggered between consecutive queries. Before the fix, this setting provoked the bug pretty much every time.
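A test like this (many queries with reclassification in between) is a good reproduction recipe for an intermittent hang, and the first report suggests comparing worker counts (about 20% of runs stuck at the default of 8, none at 4). The hypothetical harness below counts, for one worker count, how many of N runs miss a deadline. The trial factory, which would build a reasoner with the given number of workers and run the query loop, is assumed and left to the caller; HangRate and countHangs are illustrative names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.IntFunction;

/** Hypothetical reproduction harness: how often does a trial hang for a given worker count? */
final class HangRate {
    private HangRate() {}

    /** Returns how many of {@code runs} trials did not finish within {@code timeoutMillis}. */
    static int countHangs(IntFunction<Callable<?>> trialForWorkers, int workers,
                          int runs, long timeoutMillis) throws InterruptedException {
        // Cached pool: a hung trial keeps its thread, later trials get fresh ones.
        ExecutorService exec = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // hung trials must not keep the JVM alive
            return t;
        });
        int hangs = 0;
        try {
            for (int i = 0; i < runs; i++) {
                Future<?> f = exec.submit(trialForWorkers.apply(workers));
                try {
                    f.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    hangs++;
                    f.cancel(true); // best-effort interrupt of the stuck trial
                } catch (ExecutionException e) {
                    throw new IllegalStateException("run " + i + " failed", e.getCause());
                }
            }
        } finally {
            exec.shutdownNow();
        }
        return hangs;
    }
}
```

Calling countHangs once with the default worker count and once with 4, on the same trial, would give directly comparable hang counts before and after an upgrade.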

Should we expect any theoretical changes in performance? So far I haven't noticed anything suspicious in the test.

Thanks for fixing it quickly,

Samuel

Yevgeny Kazakov

Jan 18, 2013, 6:00:18 AM
to elk-reasone...@googlegroups.com
On Fri, Jan 18, 2013 at 11:42 AM, Samuel Croset <samuel...@gmail.com> wrote:
> Hi Yevgeny,
>
> I don't notice the problem any more when using Elk 0.3.2 + OWL-API 3.4.2. My
> test includes a continuous series of 1000 requests with reclassification
> triggered each time in between the queries (anonymous expressions). This
> setting was provoking the bug pretty much every time before.

Great news, Samuel! Thanks a lot for testing!
Hopefully the issue is now resolved.

> Should we expect any theoretical changes in performance? So far I haven't
> noticed anything suspicious in the test.

No, there should not be any other changes. I just patched this
particular bug and (given its severity) decided to release a backport.
We are working on the next major version 0.4.0 which should include
more changes. It may be released in a month or so.

Cheers,

Yevgeny

Samuel Croset

Jan 18, 2013, 6:24:02 AM
to elk-reasone...@googlegroups.com
> We are working on the next major version 0.4.0 which should include
> more changes. It may be released in a month or so.

Nice! Any spoilers regarding the new features?

Samuel 

Yevgeny Kazakov

Jan 18, 2013, 9:22:33 AM
to elk-reasone...@googlegroups.com
There will be a new inference engine and performance improvements for
frequent re-classifications with few changes (as in your scenario
with lots of queries). There will probably not be many new OWL
features added, maybe something like the DifferentIndividuals
constructor.

Yevgeny
