NestLoopIndexExecutor COnversion

14 views
Skip to first unread message

mike alexeev

unread,
Jun 5, 2012, 11:54:18 AM6/5/12
to voltd...@googlegroups.com
Hi Paul, I am almost done with the nestloopindexexecutor conversion but
there are couple of design decisions I would like to clarify. Both of them
are related to the fact that index scan node is currently inlined within the
nestloopindex node and won't be initialized via the 'normal'
pre_execute_pull initialization. To properly initialize index scan before
the execute call we can
1. Make it not inlined but a regular child of the nestloopindex executor
similar to the nestloop one. This is probably the right way to go in the
long run but it will brake current push process
2. Keep it inlined for now until all executors are converted. It allows for
both approaches to co-exists. The only draw back is that indexscan node
needs to be explicitly initialized from within
nestloopindexexecutor::p_pre_execute_pull call. As soon as all executors
support the pull this initialization needs to be removed.
3. A combination of the above two - keep it as inline node and at the same
time add it as a child. The push process continues using the former and the
pull uses the latter.

So far, I settled for the second one.

The second question is about resetting inner node executor state while
iterating over the outer node. Unfortunately, it's not that straightforward
as in case of the nestloop. The main differences is that it must take a
parameters - outer tuple. The question is how to pass it to the executor. I
can add one more pair of public/protected reset_state_pull methods which
take tuple as an input parameter in addition to already existing
parameter-less pair, but it doesn't look pretty. The other option is to call
IndexScanExecutor directly from the loop and reset its state via a new
specific method. The obvious draw back is that we use two different
approaches for resetting the state for nestloops.

Thanks,
Mike

Paul Martel

unread,
Jun 6, 2012, 10:01:43 PM6/6/12
to voltd...@googlegroups.com
Hi Mike.

On Tue, Jun 5, 2012 at 11:54 AM, mike alexeev <michael...@gmail.com> wrote:
Hi Paul, I am almost done with the nestloopindexexecutor conversion but there are couple of design decisions I would like to clarify. Both of them are related to the fact that index scan node is currently inlined within the nestloopindex node and won't be initialized via the 'normal' pre_execute_pull initialization. To properly initialize index scan before the execute call we can
1. Make it not inlined but a regular child of the nestloopindex executor similar to the nestloop one. This is probably the right way to go in the long run but it will brake current push process
2. Keep it inlined for now until all executors are converted. It allows for both approaches to co-exists. The only draw back is that indexscan node needs to be explicitly initialized from within nestloopindexexecutor::p_pre_execute_pull call. As soon as all executors support the pull this initialization needs to be removed.
3. A combination of the above two - keep it as inline node and at the same time add it as a child. The push process continues using the former and the pull uses the latter.

So far, I settled for the second one.
Yes. This sounds good for the short-to-medium term, though I'm not following you about how later "this initialization needs to be removed" -- what would replace it?
The second question is about resetting inner node executor state while iterating over the outer node. Unfortunately, it's not that straightforward as in case of the nestloop. The main differences is that it must take a parameters - outer tuple.  The question is how to pass it to the executor. I can add one more pair of public/protected reset_state_pull methods which take tuple as an input parameter in addition to already existing parameter-less pair, but it doesn't look pretty. The other option is to call IndexScanExecutor directly from the loop and reset its state via a new specific method. The obvious draw back is that we use two different approaches for resetting the state for nestloops.
Postgres has a model for this that is based on a generalization of parameters. In VoltDB, parameters are assumed constant for the life of a query and so are dealt with once in the p_init method. In Postgres, parameters can also be things like specific values from an outer tuple that are used to init/reset inner table executor nodes (like, in our case, the start/end keys/criteria of the inner index scan) for shorter time periods than the query lifetime. This idea of dynamically reset parameters has other applications (like in correlated sub-queries) as the complexity of supported queries grows. So, I don't think that it's completely over-the-top to consider an AbstractExecutorState API for reinit with changed "parameters". It's true that an outer tuple does not look all that much like the vector of parameters we pass to p_init, but there are definite similarities in how they are used. Maybe it even makes sense for query parameters to be packed into a dummy pre-set "outer tuple" and for expression eval to extract them as "columns", so we get some greater consistency. I think that it SEEMS uglier now than it actually some day will be to have this generalized reset/reinit functionality, due to its currently limited applicability to this one case. Even just considering different kinds of nest loops, I can imagine a future where correct "outer join" support forces us to allow joins to nest both on their outer sides (as now) AND on their inner sides so that you could even be joining two joins, not just a join and a scan. We might have to propagate outer tuple values into inner child joins in that case.

That's all intended as food for thought. Keeping this for now as inline code specific to an inlined indexscan node is also OK.

Also, it would be a little nicer if we could undo a little of the cut&paste job from indexscanexecutor and find some way to refactor out the shared code into indexscanexecutor methods for setting up and executing each scan "pass". Maybe that's a little off-topic for this project and better to do as a follow-on? An idea even further along these lines would be to take (at least some) join predicate processing out of nestloopexecutor and push it down to the scan executor (or maybe a new generic predicate executor that would sit just above the scan). The idea would be to use the same API that nestloopindexexecutor uses to drive indexscanexecutor so that nestloopexecutor and nestloopindexexecutor could be unified into a single class that just happens to have different children.
 

Thanks,
Mike

In other news, I've gotten some time (and qa help) scheduled in the new development cycle to do some benchmarking on this code.
I really didn't get that far with it in the last cycle.
I've also gotten a few questions from the team about how the project is proceeding, whether it's keeping your interest, and whether you'd be interested in a side-project or prefer to stay focused on this. They're really excited about bringing this in-house for some testing.  Let me know your thoughts.

--paul

Michael Alexeev

unread,
Jun 6, 2012, 11:22:20 PM6/6/12
to voltd...@googlegroups.com
Hi Paul,
I meant the explicit IndexScanExecutor::pre_execute_pull call from the
NestLoopIndexExecutor::p_pre_execute_pull. In the long run the
indexscan node doesn't need to be inlined and will be initialized via
regular chain of the Pre_execute_pull calls.
I was actually thinking about NValueArray object as a generic way to
path parameters similar to the p_init method. The problem I ran into
was that you actually need to pass two tuples to reset the inner state
- the search key and the outer tuple. How would you know which values
are for the first one and which ones belong to the second tuple? Well,
in this particular case, the search key reset logic could be moved
from nestloop to indexscan, but in general the question remains.
>
> Also, it would be a little nicer if we could undo a little of the cut&paste
> job from indexscanexecutor and find some way to refactor out the shared code
> into indexscanexecutor methods for setting up and executing each scan
> "pass". Maybe that's a little off-topic for this project and better to do as
> a follow-on? An idea even further along these lines would be to take (at
> least some) join predicate processing out of nestloopexecutor and push it
> down to the scan executor (or maybe a new generic predicate executor that
> would sit just above the scan). The idea would be to use the same API that
> nestloopindexexecutor uses to drive indexscanexecutor so that
> nestloopexecutor and nestloopindexexecutor could be unified into a single
> class that just happens to have different children.

I did some refactoring but not to the full extent. I will post my
latest code shortly so you could take a look and provide me feedback.
It seems to work in some simple cases but I see crashes running the
test suite. Something is not initialized properly. So, please treat it
as a work-in-progress.

>
>>
>>
>> Thanks,
>> Mike
>
>
> In other news, I've gotten some time (and qa help) scheduled in the new
> development cycle to do some benchmarking on this code.
> I really didn't get that far with it in the last cycle.
> I've also gotten a few questions from the team about how the project is
> proceeding, whether it's keeping your interest, and whether you'd be
> interested in a side-project or prefer to stay focused on this. They're
> really excited about bringing this in-house for some testing.  Let me know
> your thoughts.
That's a great news! It would be really nice to do some real life
testing and compare the results with the current code.
As far as the project goes, I definitely want to finish this one
before moving on on something else. At the beginning I was a little
bit scared by the magnitude of changes and wasn't sure whether I would
be able to pull it off or not, but now I can see the light at the end
of the tunnel, so let keep going!

Mike
>
> --paul
>
Reply all
Reply to author
Forward
0 new messages