Re: [opencog/cogutil] Add ThreadLocal utility class (#108)

47 views
Skip to first unread message

Linas Vepstas

unread,
Apr 25, 2018, 2:21:41 PM4/25/18
to Vitaly Bogdanov, opencog, Nil Geisweiller

OK, first we need to settle some organizational issues.

Writing private emails to me and Nil is just counter-productive. The conversation needs to be carried out either on the mailing list, or in github, or both.  The reason for using a mailing list are these:

-- other people can become aware of the conversation, and can participate in it.
-- mailing list archives provide a permanent record of what was said, and how the decisions got made.  This is often very useful, as searching for mailing list archives with google is a lot easier and more effective than searching my email box.  Emailing list archives are a prosthetic extension of my brain - they record my long-term memories.  I have tools for retrieving those long-term memories that work well with the mailing lists, and do not work at all with private email archives.

Github issues provide an equally good, if not better way of recording discussions, since they group related discussions into one location. Its a far superior technology to slack channels, and other chat-based nonsense.

I just opened issue https://github.com/opencog/atomspace/issues/1645 to track this work. This is where future discussion should happen.

I am also concerned about the priority of the work. https://github.com/opencog/atomspace/labels/pattern-matcher lists twenty issues that exist in the current pattern matcher. Its not at all obvious that parallelizing the pattern matcher is all that important or urgent.  I am guessing that Ben put you up to this.  Some advance planning and organization would have been maybe a bit better.

On Wed, Apr 25, 2018 at 12:07 PM, Vitaly Bogdanov <vsb...@gmail.com> wrote:

Two main objects participate in matching: PatternMatchEngine instance and PatternMatchCallback instance.

PatternMatchEngine actively modifies its state during matching so it cannot be used from more than one thread. But we can make one copy for each thread and use copies safely, as PME doesn't reuse its state between different calls of explore_neighborhood() method (1).

Making one copy per thread seems reasonable to me. It would be the "normal" way of doing multi-threading.

PatternMatchCallback is different thing.

Why not make one copy per thread, also? So:
 
- (2) grounding() method modifies the state (saves matching results)

OK, so this is maybe the hardest part.  The class would need to be split into two parts: one, which would operate as one-instance-per-thread, and another part that operates as one-instance-per-search.

However, one work item is to alter the pattern matcher to report results as they are found, rather than waiting to complete, before reporting anything. See https://github.com/opencog/atomspace/isues/1507  If results are reported as they are found, then the need for a class that operates at a one-instance-per-search seems to go away, or, at least, should be handled differently.
 

- (3) evaluate_sentence() method in DefaultPatternMatchCB uses temporary atomspace to evaluate things and cleans up atomspace after evaluation so it should be synchronized

The temporary atomspaces currently do not hold information between each exploration. Thus, we can alloc one temp atomspace per thread.
 
- (4) scope_match()/link_match()/post_link_match()/post_link_mismatch() methods in DefaultPatternMatchCB share state between calls via pair of pointers: _pat_bound_vars and _gnd_bound_vars - in original code
- push()/pop() methods are intended to modify the state but there are no nontrivial implementations

Yes but no. That state is not shared outside of each exploration.  You really need to visualize each exploration as a stack -- there is a huge amount of info cached on the C stack, literally -- this is a very recursive search; and what is not on the C stack is explicitly controlled with the push/pop stack. It would be a major mistake to try to parallelize the stack; you'll never get it to work right.  The parallelization MUST happen in the initiate-search loops, and not somewhere else.  


(1) - to not make unnecessary copies of PatternMatchEngine on each iteration -

What's wrong with making copies? This seems cheap, easy, direct and simple to me.  Simpler and safer than trying to put mutexes everywhere.
 


(3) - I used thread local here to keep implementation as is. At the moment temporary atomspace is created once and reused each time when it is required to evaluate the term. grab_transient_atomspace() method gets it from cache using mutex to be thread safe, so it is not clear why it is kept in DefaultPatternMatchCB state. I agree that it can be removed from state and received each time when it is needed from cache. Performance have to be measured to understand if there is significant loss.

I would rather design this correctly, than try various hacks and measure their performance. The correct design would seem to be a temp atomspace per thread.

FWIW, if you need a resource pool, we already have one: https://github.com/opencog/cogutil/blob/master/opencog/util/pool.h
 

(4) - this is most complex case because there is no simple way using stack: link_match() sets _pat_bound_vars and _gnd_bound_vars and scope_match()/post_link_match()/post_link_mismatch() methods use it. Straightforward solution I see is to move _pat_bound_vars and _gnd_bound_vars initialization code out of DefaultPatternMatchCB into PatternMatchEngine and pass them to scope_match but it requires callback signature changing.

The straightforward solution is to not try to parallelize the stack at all. Instead, do all parallelization in the initiate-search loops.

Again; I doubt that you'll ever be able to fully implement and debug the parallel stack machine.  I mean, sure, maybe you can, but it seems like a waste of time and intellectual effort, and tehre is a high probability that it will be a performance looser.  Just don't even try to parallelize there.

(I hope its clear, what I am trying to talk about here).

----------
OK, next: procedurally: it will be a whole lot easier if you understake this projects with a whole bunch of smaller pull requests, each of which reorganizes teh code in such a way that parallelizing the initiate-search loops becomes easy.

I am very deeply concerned that you will create some huge, complex block of code changes, that you will submit it in some giant, massive pull request, you will insist that its just great and wonderful, and then I will get into a giant and unpleasant argument with Ben as to whether it should be merged or not.  I really really want to avoid this kind of an ugly situation before we get even close to it.   The way to do this is really by making and testing small, incremental stages that get you to where you want to go, and then creating pull requests for each small, incremental change.  Right now, this, for me, is an extremely important way to operate.

-- Linas

 

Thanks,
Vitaly

2018-04-25 18:42 GMT+03:00 Linas Vepštas <notifi...@github.com>:

OK, now some comments re thread-local.

So, one standard idiom for storage that gets automatically cleaned up is to put that storage on the stack. When the stack unwinds, the storage is cleaned up automatically. I don't see why this is not sufficient in this case.

Separately, I kind of would have liked to talk to you about parallelizing the pattern matcher before you started writing code, instead of after. The "natural", simplest, easiest place to parallelize would be to make the initiate-search loops parallel, and then allow each search to run in a distinct thread. This, it would seem to me, to be a natural place to set up the per-thread stack: its located "at the root" of the search, and so is guaranteed to be alive, until that thread finishes.

Doing this would seem to obviate the need for some per-thread storage that shows up "from thin air", in the middle of some random code doing some random thing. I'm rather nervous about the debuggability and correctness of what you are proposing to do.

Anyway, this pull req is the wrong place to have this discussion.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.





--
cassette tapes - analog TV - film cameras - you

Nil Geisweiller

unread,
Apr 26, 2018, 12:55:26 AM4/26/18
to linasv...@gmail.com, Vitaly Bogdanov, opencog, Nil Geisweiller
On 04/25/2018 09:21 PM, Linas Vepstas wrote:
> Writing private emails to me and Nil is just counter-productive. The
> conversation needs to be carried out either on the mailing list, or in
> github, or both.

I totally agree. I did recommend to make discussions public, though
perhaps I did not insist enough (using formulas like "do not hesitate"
instead of "you really should").

> I am also concerned about the priority of the work.
> https://github.com/opencog/atomspace/labels/pattern-matcher
> <https://github.com/opencog/atomspace/labels/pattern-matcher> lists
> twenty issues that exist in the current pattern matcher. Its not at all
> obvious that parallelizing the pattern matcher is all that important or
> urgent.  I am guessing that Ben put you up to this.  Some advance

I was asked some motivational task to get Vitaly started and this popped
up in my head. Vitaly has a strong background on System/Network
programming so I thought it was a good match.

In my opinion it's OK if it's not the highest priority (though I think
it's really cool to have) since it is getting Vitaly familiar with some
very important part of OpenCog.

> planning and organization would have been maybe a bit better.

Yep, I should have been more aggressive on the public thing. I tend not
to like to push people too much, it makes me feel I'm coercing them, but
I think I was out of balance on that one.

> to parallelize the stack; you'll never get it to work right.  The
> parallelization MUST happen in the initiate-search loops, and not
> somewhere else.

Just to be clear, that is precisely what Vitaly is doing, parallelizing
the initiate-search loops.


Also, totally agree on splitting large PRs into small chunks of cohesive
PRs.

Vitaly, that would mean for instance that this commit

https://github.com/vsbogd/atomspace/commit/6c2ab67d026116cebf6d2b388f022ef898e854f6

of your branch

https://github.com/vsbogd/atomspace/commits/parallel-pattern-matcher

should go into another branch, and could probably be pushed ASAP.

Nil

Vitaly Bogdanov

unread,
Apr 27, 2018, 10:27:19 AM4/27/18
to Nil Geisweiller, Linas Vepstas, opencog
I have added comment at https://github.com/opencog/atomspace/issues/1645 as suggested by Linas

Organizational points are clear.
Reply all
Reply to author
Forward
0 new messages