On Mon, Apr 9, 2012 at 9:50 PM, Michael Alexeev
<michael...@gmail.com> wrote:
Agreed. Either we need to
preserve the state between the calls or pass it as additional parameters,
or both. At the moment I don't have a feel for what the right solution is,
but I hope to get a better idea after some prototyping.
I'm leaning towards a minimum of parameter passing, but I expect that prototyping will make this clear.
Yes, it's the simplest example.
For example, the execute_one signature should be updated to operate on
multiple input nodes to be able to handle joins. Are you saying that it
is also possible to have multiple output tuples as well? My plan was to
start with something simple (like seqscan) and then increase the
complexity of the executors (for example, distinct, nested join).
I didn't mean to imply that a node had multiple output tuples in the sense of supporting multiple "output" iterators -- I have had some thoughts and discussions about such an architecture, but was fortunate never to have had to implement it. My meaning was just that the number of iterations of the output iterator generally has a many-to-many relationship with the number of iterations of the underlying input iterator(s).

On the other hand, join executors (and I think I MIGHT one day argue aggregate executors, as well) DO operate on multiple input iterators, but their relationship to their input iterators tends not to be very symmetric. As I see it, even a join node has one "main" input iterator just like other executor nodes. As an "implementation detail", it internally uses another iterator in its specialized processing (deciding, just like any other executor, if/how results from its "main" input iterator translate into one or more immediate or later results to its output). But it tends to use this other iterator very differently from its main one, requiring a broader API not unlike the API to the top-level executor or (some day?) to the top-level executor of a subquery.
My
immediate reaction is to chain the existing p_init calls to initialize the
executors and add a new p_cleanup method to do the tear-down, but there may be
a better solution.
This is the right general idea. There are some subtleties here around keeping the nodes loosely coupled, deciding when to use iteration versus recursion, and when to use generic programming versus object specialization, but I think that will work itself out as we go forward.
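To make that shape concrete, here is one possible sketch of the lifecycle trio on AbstractExecutor. This is purely illustrative -- the names and signatures are placeholders that prototyping should be expected to change:

#include <vector>

struct TableTuple { /* placeholder for whatever row representation we settle on */ };

class AbstractExecutor {
public:
    virtual ~AbstractExecutor() {}

    // One-time setup: resolve children, bind "?" parameters, allocate state.
    virtual bool p_init() = 0;

    // Pull protocol: hand back the next output tuple (or a sentinel when
    // exhausted); each node decides how many pulls on its children that costs.
    virtual TableTuple p_next() = 0;

    // Mirror of p_init: per-query tear-down, so a plan can be cleanly destroyed
    // (or, with the rebind/reset split discussed below, partially re-run).
    virtual void p_cleanup() = 0;

protected:
    std::vector<AbstractExecutor*> m_children;  // input executors, pulled via p_next
};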
I want to be a little forward-looking here and attempt to anticipate
support for at least some simple forms of sub-query and possibly some
fancier joins. In my experience, this requires another entry point
which allows an execution loop to be restarted mid-query, possibly with
different "input parameters".
Could you please provide some examples?
This relates to the broader API I was referring to for a join's second "inner loop" input iterator. One of the jobs of the current p_execute function is the "binding of parameters", which means that it must (potentially) substitute actual constant values for any place-holders that might exist in its internal expressions. In the normal case, these place-holders represent the "?" parameters in the original query statement that get bound to actual arguments just before query execution. For example, such "?" parameters might represent the start and end key values in an index scan. Obviously, these substitutions only need to be processed once -- let's say that will happen in p_init (even though that's not where it happens now) -- and not repeated every time we fetch a new record from the index.
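As a tiny illustration of that "bind once" idea (hypothetical names and types, not the current code), the init step could resolve each "?" placeholder against the actual arguments and cache the resulting constants, so the per-fetch path never touches the parameter machinery again:

#include <vector>

struct Value { long v = 0; };  // placeholder for an engine value type

// Hypothetical per-executor state for a parameterized index scan.
struct BoundScanKeys {
    std::vector<int> paramSlots;   // which "?" slot feeds each key expression
    std::vector<Value> boundKeys;  // resolved once, reused on every fetch
};

// Conceptually part of p_init: substitute actual argument values for the
// "?" placeholders exactly once.
void bindOnce(BoundScanKeys& keys, const std::vector<Value>& actualArgs) {
    keys.boundKeys.clear();
    for (int slot : keys.paramSlots) {
        keys.boundKeys.push_back(actualArgs[slot]);
    }
}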
Now consider the case of a NestLoopIndexScan. In this case, the range of keys (often an exact key value) is typically determined not by the value specified for a "?" parameter but rather by each value (in turn) from some matching column pulled from the main "outer loop" input iterator. Note the similarities and differences with a normal IndexScan based on a "?" parameter.
-- similarity - the parameter must be "bound" before the index scan begins.
-- similarity - the parameter remains in effect over potentially several iterations of matching tuples from the index scan
-- similarity - the parameter remains in effect until the index scan is complete -- no more tuples match
-- difference - when no more tuples match, we may not be finished yet. If there is another outer row, the index scan needs to be restarted, but now with a different matching column value.
-- similarity - that's ALMOST just like finishing and restarting a stand-alone index scan query with a potentially different argument value passed in for the "?" parameter
-- difference - but it's NOT QUITE so involved. Most of the setup that happens in p_init (and the tear-down in p_cleanup) does not really need to be repeated, just the final steps of (re)binding the parameter and (re)starting the index iteration.
In practical terms, I am talking about simplifying join executor code by reducing redundant code between NestLoop scans and normal parameterized stand-alone scans. I am also paving the way for "correlated subquery scans" -- in which there really is the potential for a full-blown query executor tree in the middle of some other scan that needs to be re-run on each "outer" iteration with different parameter bindings. The beauty of the approach is that it practically comes for free simply from splitting later "init" steps and earlier "cleanup" steps into their own functions so they can alternatively be called back-to-back as a kind of "rebind/reset" step.
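In code terms, a rough sketch of that split might look something like the following -- every name here, including p_bind_and_reset, is a hypothetical placeholder for whatever the separated init/cleanup steps end up being called:

struct Value { long v = 0; };               // placeholder key/value type
struct TableTuple { bool valid = false; };  // placeholder row; invalid = "no row"

class HypotheticalIndexScanExecutor {
public:
    // Heavyweight, once-per-query setup: schemas, output table, index handles.
    void p_init() { /* one-time setup only */ }

    // The "later init steps": (re)bind the key and (re)position the index
    // cursor. Cheap enough to call once per outer row in a NestLoopIndexScan,
    // or exactly once with the "?" argument in a stand-alone parameterized scan.
    void p_bind_and_reset(const Value& key) {
        m_currentKey = key;
        m_exhausted = false;                // pretend the cursor was repositioned here
    }

    // Pull the next matching tuple; an invalid tuple means "no more matches
    // for the currently bound key".
    TableTuple p_next() {
        if (m_exhausted) return TableTuple();
        m_exhausted = true;                 // placeholder: a real scan walks the index
        TableTuple t;
        t.valid = true;
        return t;
    }

    // Mirror image: only the truly per-query tear-down lives in p_cleanup.
    void p_cleanup() {}

private:
    Value m_currentKey;
    bool m_exhausted = true;
};

A NestLoopIndexScan would then call p_bind_and_reset once per outer row and drain p_next, while a stand-alone parameterized IndexScan would call the same pair exactly once -- which is the code sharing described above, and the same hook a correlated subquery scan could later use.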
Sorry, I meant to push the projection node all the way down to be a leaf in the access path instead of the root.
This way, it will be the first executor to evaluate the next tuple from the target table.
Yes, this can sometimes be a good thing, but I think that this may want to be decided on more of a case-by-case basis. In the simplest case, projection eliminates columns, narrowing the rows, but until we actually have to transmit those rows, there may not be any great benefit. Also consider that projection can sometimes complicate and even widen the representation of intermediate tuples. It may be lighter/faster to retain a series of tuple addresses and a constant "thunk" that represents "1st and 3rd columns and the sum and difference of columns 9 and 10" for evaluation later on demand rather than copying/storing the result of that processing up front.
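To illustrate the trade-off with a purely made-up sketch (none of these types exist in the tree), the eager form copies the projected values into a new, possibly wider row up front, while the deferred form keeps only the source tuple's address plus one constant "thunk" shared by every row and evaluated on demand:

#include <functional>
#include <vector>

struct Value { long v; };                            // placeholder value type
struct SourceTuple { std::vector<Value> columns; };  // placeholder stored row

// Eager projection: the projected values are computed and copied up front.
struct MaterializedRow { std::vector<Value> projected; };

// Deferred projection: keep only a reference to the source tuple...
struct DeferredRow { const SourceTuple* source; };

// ...plus one shared "thunk" describing the projection, applied only when the
// row is actually consumed (for example, when it is finally transmitted).
using ProjectionThunk = std::function<MaterializedRow(const SourceTuple&)>;

// "1st and 3rd columns and the sum and difference of columns 9 and 10"
ProjectionThunk exampleThunk = [](const SourceTuple& t) {
    return MaterializedRow{{ t.columns[0], t.columns[2],
                             Value{t.columns[8].v + t.columns[9].v},
                             Value{t.columns[8].v - t.columns[9].v} }};
};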
I would gladly start
prototyping. I will definitely need all the support I can get, and more :) To
begin with, I am not sure whether I have the right privileges to create a
branch. Also, what would be a representative set of nodes/executors to
begin with?
I'm not sure whether branching can be part of a pull request. But no problem -- if you want to get started on master, I'll set up a branch tomorrow. I think that once a branch is established, it should be easy enough to pull your changes into it.
I've got kind of a radical idea of how we could support an ever-expanding prototype on the branch without breaking the universe.
My idea is basically that we start by layering the new function protocol ON TOP OF the existing one, being careful to use new function names on the existing classes AND provide a fallback implementation for all nodes by means of shim methods on AbstractExecutor. That is, the top-level code is changed to follow whatever row-level pull protocol we want BUT rather than a pure virtual implementation on AbstractExecutor, we provide, for now, a fully functional worst-case implementation leveraging the existing implementation and ignoring the new implementation (for now, even on other executors) as much as possible.
Something like:
tuple p_next() {
    if ( ! m_forNow_outputTableIteratorIsInitialized) {
        // relies on some forNow method identifying not-yet-inited children of each node,
        // possibly based on some (forNow?) data member(s) on AbstractExecutor
        p_forNow_p_init_and_execute_my_children_then_me(...);
        m_forNow_iterator = getIteratorOver(m_outputTable);
        m_forNow_outputTableIteratorIsInitialized = true;
    }
    return m_forNow_iterator.next();
}
Without any further specialization, that should give a working system that satisfies the new pull protocol (in fact, if not in spirit) and operates only slightly worse than the current one.
From there, things can only get better as you refine p_next and the rest of the new protocol functions for whatever executors you choose, broadening, executor by executor from the top down, the set of use cases that no longer call the old p_init and p_execute or refer to the "forNow" members.
Maybe you'd want to try to get a single-partition seqscan working (without breaking any existing methods) to establish the protocol baseline before setting up this forNow scaffolding and generic backfill implementation on AbstractExecutor.
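For what it's worth, here is a rough sketch -- placeholder types and names, not a proposal for the final signatures -- of what that seqscan might look like once it implements the pull protocol for real, with no forNow fallback and no materialized output table:

// Placeholder types for the sketch.
struct TableTuple { bool valid = false; };
struct Predicate { bool eval(const TableTuple&) const { return true; } };
struct TableIterator { bool next(TableTuple&) { return false; } };

class PullSeqScanExecutor {
public:
    // Each call walks the target table just far enough to find one qualifying
    // row; an invalid tuple signals "no more rows".
    TableTuple p_next() {
        TableTuple candidate;
        while (m_iterator.next(candidate)) {
            candidate.valid = true;
            if (m_predicate == nullptr || m_predicate->eval(candidate)) {
                return candidate;
            }
        }
        return TableTuple();  // sentinel: scan exhausted
    }

private:
    TableIterator m_iterator;
    const Predicate* m_predicate = nullptr;  // optional filter from the plan node
};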
I'm not sure what makes sense for an executor by executor plan of attack, but I like the idea of being able to run regression suites as the reimplementation evolves.
--paul