Top-down executor approach

Michael Alexeev

Mar 29, 2012, 9:11:16 PM
to voltdb-dev
Hi All,

Thanks everyone for the very valuable input! It seems that the consensus is that the top-down (or pull) solution to drive data through the executor chain is preferable to the inline node approach. It eliminates the need for multiple intermediate input/output tables for almost all nodes in the graph, with the exception of the input table(s) for the leaves and the output table for the root. The existing inline nodes can also be eliminated, since their primary purpose is to bypass the intermediate in/out tables, which will be gone anyway. Is that more or less accurate?

Below is my attempt to refactor the existing logic based on Paul's and Ariel's suggestions:

1. Factor out the behavior of the (per input row) loop body in Executor::p_execute; call it Executor::execute_one.
   This is easy for some executors, not so easy for others (the nested loop ones, for example).

2. Modify the p_execute method for each executor to do a depth-first traversal: iterate over the input table(s) and apply node-specific logic to each tuple on its way up to the root. I called it p_next:
   bool Executor::p_next(const NValueArray &params, TableTuple &out)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           // @TODO Somehow need to preserve the input_iterator state between the calls
           hasNext = input_iterator.next(out);
       }
       else
       {
           // Recurse to the children. The actual code could vary from executor to executor,
           // but the executor-specific behavior could be factored out via a virtual or
           // templated call. The default one could be as simple as:
           hasNext = children[0]->getExecutor()->p_next(params, out);
       }

       // Apply node-specific logic to process the tuple.
       if (hasNext)
           out = execute_one(out, params);
       return hasNext;
   }
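For a simple executor the factored-out per-row step could be tiny. For projection (a sketch only; the signature and member names are assumptions to be worked out) it might just apply the output-column expressions:

   TableTuple ProjectionExecutor::execute_one(TableTuple &in, const NValueArray &params)
   {
       // Apply each output-column expression to the incoming tuple.
       int num_of_columns = (int) projection_node->getOutputColumnExpressions().size();
       for (int ctr = 0; ctr < num_of_columns; ctr++)
       {
           NValue value =
               projection_node->getOutputColumnExpressions()[ctr]->eval(&in, NULL);
           temp_tuple.setNValue(ctr, value);
       }
       return temp_tuple;
   }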

3. The Executor::p_execute becomes:
   Executor::p_execute(const NValueArray &params)
   {
       // Do some initial stuff

       TableTuple tuple;
       while (this->p_next(params, tuple))
       {
           output_table->insertTuple(tuple);
       }
       // clean up if necessary
   }

4. Modify int VoltDBEngine::executeQuery: replace the loop over the executor stack with a single execute call to the top executor.

5. Modify the Planner not to inline the Projection node within a parent node (or any other node that currently gets inlined) but to make it a child of that node. That way the tuple will be materialized at the very beginning of the executor chain.

One nagging question is how to preserve the state of the iterator over the input table at the leaf node between p_next calls. One possible solution is to make the iterator a data member of the corresponding node, but that makes the node stateful, which doesn't sound right. The other possibility is to add one more parameter to the p_next call (a map of node-pointer/input-iterator pairs) and pass it up and down the call stack. At the leaf level, the executor would look for its iterator in the map first and, if it's not there (first-time call), insert it. Something like:

   bool Executor::p_next(const NValueArray &params, TableTuple &out, std::map<AbstractPlanNode*, TableIterator>& iterators)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           if (iterators.find(node) == iterators.end())
               iterators.insert(std::make_pair(node, node->getTargetTable()->iterator()));
           TableIterator& iterator = iterators[node]; // a reference, so the state persists
           hasNext = iterator.next(out);
       }
       // the rest is the same
       ....
   }

Of course, this is pseudo-code only and more details need to be figured out. But do you think this is a step in the right direction and worth moving forward? Comments and suggestions are very welcome.

Mike 


Paul Martel

Apr 3, 2012, 9:14:53 PM
to voltd...@googlegroups.com
Hi, Mike.

On Thu, Mar 29, 2012 at 9:11 PM, Michael Alexeev <michael...@gmail.com> wrote:
Hi All,

Thanks everyone for the very valuable input! It seems that the consensus is that the top-down (or pull) solution to drive data through the executor chain is preferable to the inline node approach. It eliminates the need for multiple intermediate input/output tables for almost all nodes in the graph, with the exception of the input table(s) for the leaves and the output table for the root. The existing inline nodes can also be eliminated, since their primary purpose is to bypass the intermediate in/out tables, which will be gone anyway. Is that more or less accurate?

Below is my attempt to refactor the existing logic based on Paul's and Ariel's suggestions:

1. Factor out the behavior of the (per input row) loop body in Executor::p_execute; call it Executor::execute_one.
   This is easy for some executors, not so easy for others (the nested loop ones, for example).

Yes. The degree of difficulty is closely related to how much state p_execute is currently keeping in the call stack when it is ready to "insert the next row". In the simplest pull model, "insert the next row" translates to something like "return the row, BUT remember how to get back here to find the next one". So lots of local variables need to be preserved as Executor data members. More complex Executors that operate in phases, like joins (especially outer joins), will behave like finite state machines.
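To make that concrete, here is a minimal sketch of a nested-loop join p_next as such a state machine. The class and helper names (reset, matchesJoinPredicate, combine) are hypothetical, not our actual executors; the point is that the local variables of today's doubly-nested loop become data members, and each call resumes where the previous one returned.

    class NestLoopExecutor : public Executor {
    public:
        bool p_next(const NValueArray &params, TableTuple &out) {
            while (true) {
                if (!m_haveOuter) {
                    // Advance the outer child; the join is done when it is exhausted.
                    if (!m_outerChild->p_next(params, m_outerTuple))
                        return false;
                    m_haveOuter = true;
                    m_innerChild->reset();  // hypothetical: restart the inner scan
                }
                TableTuple innerTuple;
                if (m_innerChild->p_next(params, innerTuple)) {
                    if (matchesJoinPredicate(m_outerTuple, innerTuple)) {
                        out = combine(m_outerTuple, innerTuple);
                        return true;        // resume from here on the next call
                    }
                } else {
                    m_haveOuter = false;    // inner scan exhausted; pull the next outer tuple
                }
            }
        }
    private:
        Executor* m_outerChild;
        Executor* m_innerChild;
        TableTuple m_outerTuple;
        bool m_haveOuter;  // the "state" of the finite state machine
    };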


2. Modify the p_execute method for each executor to do a depth-first traversal: iterate over the input table(s) and apply node-specific logic to each tuple on its way up to the root. I called it p_next:
   bool Executor::p_next(const NValueArray &params, TableTuple &out)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           // @TODO Somehow need to preserve the input_iterator state between the calls
           hasNext = input_iterator.next(out);
       }
       else
       {
           // Recurse to the children. The actual code could vary from executor to executor,
           // but the executor-specific behavior could be factored out via a virtual or
           // templated call. The default one could be as simple as:
           hasNext = children[0]->getExecutor()->p_next(params, out);
       }

       // Apply node-specific logic to process the tuple.
       if (hasNext)
           out = execute_one(out, params);
       return hasNext;
   }

This is a good start, but unfortunately a little too simple for SOME nodes. Distincts, some joins, and the early stages of orderby or aggregates will need to pull several tuples from the child with (possibly) no result (yet), while other nodes (the later stages of orderby, aggregates, some joins) will be able to produce a multitude of tuples after one or zero consultations with the child.
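For instance (just a sketch; the set type and a suitable TableTuple comparator are assumed), a distinct executor's p_next would loop over its child until something new turns up, keeping the set of already-returned tuples as executor state:

    bool DistinctExecutor::p_next(const NValueArray &params, TableTuple &out) {
        TableTuple candidate;
        while (m_child->p_next(params, candidate)) {
            // m_seen survives across calls; insert().second is true for a new entry
            if (m_seen.insert(candidate).second) {
                out = candidate;
                return true;
            }
        }
        return false;  // child exhausted, no more distinct tuples
    }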

3. The Executor::p_execute becomes:
   Executor::p_execute(const NValueArray &params)
   {
       // Do some initial stuff

       TableTuple tuple;
       while (this->p_next(params, tuple))
       {
           output_table->insertTuple(tuple);
       }
       // clean up if necessary
   }

I think that this slightly blurs a few different functions that need to be separated in a pull system.
There is the "execute" call that, as you point out, can get called once at the top of the stack.
But if p_execute is supposed to be implementing that call, then where do nodes other than the top one get to "// Do some initial stuff" and later "clean up if necessary"? It seems like that code should be factored out and called on each node prior to launching the executor loop at the top.
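Something along these lines, sketched with a hypothetical children() accessor and the p_cleanup we would presumably add:

    // One-time setup runs over the whole tree before the top-level pull loop,
    // and teardown after it, so p_next itself stays free of one-time work.
    void initTree(Executor* exec, const NValueArray &params) {
        const vector<Executor*>& children = exec->children();  // hypothetical accessor
        for (size_t i = 0; i < children.size(); ++i)
            initTree(children[i], params);
        exec->p_init(params);
    }

    void executeTopDown(Executor* root, const NValueArray &params, Table* output_table) {
        initTree(root, params);
        TableTuple tuple;
        while (root->p_next(params, tuple))
            output_table->insertTuple(tuple);
        cleanupTree(root);  // the symmetric recursive pass, calling p_cleanup per node
    }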

I want to be a little forward-looking here and attempt to anticipate support for at least some simple forms of sub-query and possibly some fancier joins.  In my experience, this requires another entry point which allows an execution loop to be restarted mid-query possibly with different "input parameters".

4. Modify int VoltDBEngine::executeQuery: replace the loop over the executor stack with a single execute call to the top executor.

5. Modify the Planner not to inline the Projection node within a parent node (or any other node that currently gets inlined) but to make it a child of that node. That way the tuple will be materialized at the very beginning of the executor chain.
The projection nodes currently start off as parents of scans, etc., that get inlined into the scans. Are you saying "keep the projection nodes there as parents", or are you saying something else I don't understand? I'm not sure what "materialized" means in this context; by my definition (constructing a mutated copy of a table), I wouldn't say that materialization is a natural result of projection.


One nagging question is how to preserve the state of the iterator over the input table at the leaf node between p_next calls. One possible solution is to make the iterator a data member of the corresponding node, but that makes the node stateful, which doesn't sound right. The other possibility is to add one more parameter to the p_next call (a map of node-pointer/input-iterator pairs) and pass it up and down the call stack. At the leaf level, the executor would look for its iterator in the map first and, if it's not there (first-time call), insert it. Something like:

   bool Executor::p_next(const NValueArray &params, TableTuple &out, std::map<AbstractPlanNode*, TableIterator>& iterators)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           if (iterators.find(node) == iterators.end())
               iterators.insert(std::make_pair(node, node->getTargetTable()->iterator()));
           TableIterator& iterator = iterators[node]; // a reference, so the state persists
           hasNext = iterator.next(out);
       }
       // the rest is the same
       ....
   }


In a way, Executors have always been stateful, it's just that the (temp table) format that they use to hold their state is not as dynamic or efficient as we'd like. The dynamic state of the Executors is one of the main things that distinguishes them from their corresponding PlanNodes. Your pseudo-code seems to be confusing these two abstractions. I'm not seeing the downside of stateful executors. They are single-threaded and dedicated to a particular query execution. I could envision the same PlanNode trees being recycled from one query execution to another -- though I suspect (I could be wrong) that we currently always re-generate even these on-the-fly per query execution from their JSON descriptions -- but I don't see the point in also recycling executor trees or keeping their associated dynamic states separated out. It would be better to extend the Executors to contain whatever dynamic state they need than to invent yet another hierarchy of classes to track Executor-specific dynamic state. I would rather that Executors BE abstract iterators than HAVE iterators.

Of course, this is pseudo-code only and more details need to be figured out. But do you think this is a step in the right direction and worth moving forward? Comments and suggestions are very welcome.

I think that this basic approach is sound and worth pursuing, perhaps initially prototyping it on a git branch? There are clearly a number of details that have to be worked out, especially for a completely general solution covering all types of plannodes. I personally find this an interesting project with lots of potential benefits to the product. As I come up to speed here at VoltDB, I'm hoping that I can strike a balance between supporting you in this effort, helping with a few of the identified high priority tasks in the team's backlog, and possibly branching out in some new directions of my own.

BTW, I actually lost track for a bit of the fact that your original non-inlined DISTINCT push-down has not been integrated. Do you have a clean patch with just that fix that might be ready for integration? There's some interest in getting that into the next iteration of development.

WRT the inlined distinct, I can see both sides of the argument for trying to integrate it vs. leaving it out in favor of the more general approach of switching over to a "pull" executor which will only require a small part of that change.

--paul


Michael Alexeev

Apr 9, 2012, 9:50:43 PM
to voltd...@googlegroups.com
Hi Paul,

Sorry, I was out of town for the whole week and couldn't reply to you earlier.
On Tue, Apr 3, 2012 at 9:14 PM, Paul Martel <pma...@voltdb.com> wrote:
HI, Mike.

On Thu, Mar 29, 2012 at 9:11 PM, Michael Alexeev <michael...@gmail.com> wrote:
Hi All,

Thanks everyone for the very valuable input! It seems that the consensus is that the top-down (or pull) solution to drive data through the executor chain is preferable to the inline node approach. It eliminates the need for multiple intermediate input/output tables for almost all nodes in the graph, with the exception of the input table(s) for the leaves and the output table for the root. The existing inline nodes can also be eliminated, since their primary purpose is to bypass the intermediate in/out tables, which will be gone anyway. Is that more or less accurate?

Below is my attempt to refactor the existing logic based on Paul's and Ariel's suggestions:

1. Factor out the behavior of the (per input row) loop body in Executor::p_execute; call it Executor::execute_one.
   This is easy for some executors, not so easy for others (the nested loop ones, for example).

Yes. The degree of difficulty is closely related to how much state p_execute is currently keeping in the call stack when it is ready to "insert the next row". In the simplest pull model, "insert the next row" translates to something like "return the row, BUT remember how to get back here to find the next one". So lots of local variables need to be preserved as Executor data members. More complex Executors that operate in phases, like joins (especially outer joins), will behave like finite state machines.

Agreed. Either we need to preserve the state between the calls, or pass it as additional parameters, or both. At the moment I don't have a feel for what the right solution is, but I hope to get a better idea after some prototyping.


2. Modify the p_execute method for each executor to do a depth-first traversal: iterate over the input table(s) and apply node-specific logic to each tuple on its way up to the root. I called it p_next:
   bool Executor::p_next(const NValueArray &params, TableTuple &out)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           // @TODO Somehow need to preserve the input_iterator state between the calls
           hasNext = input_iterator.next(out);
       }
       else
       {
           // Recurse to the children. The actual code could vary from executor to executor,
           // but the executor-specific behavior could be factored out via a virtual or
           // templated call. The default one could be as simple as:
           hasNext = children[0]->getExecutor()->p_next(params, out);
       }

       // Apply node-specific logic to process the tuple.
       if (hasNext)
           out = execute_one(out, params);
       return hasNext;
   }

This is a good start, but unfortunately a little too simple for SOME nodes. Distincts, some joins, and the early stages of orderby or aggregates will need to pull several tuples from the child with (possibly) no result (yet), while other nodes (the later stages of orderby, aggregates, some joins) will be able to produce a multitude of tuples after one or zero consultations with the child.

Yes, it's the simplest example. For instance, the execute_one signature should be updated to operate on multiple input nodes to be able to handle joins. Are you saying that it is also possible to have multiple output tuples as well? My plan was to start with something simple (like seqscan) and then increase the complexity of the executors (for example, distinct, nested join).


3. The Executor::p_execute becomes:
   Executor::p_execute(const NValueArray &params)
   {
       // Do some initial stuff

       TableTuple tuple;
       while (this->p_next(params, tuple))
       {
           output_table->insertTuple(tuple);
       }
       // clean up if necessary
   }

I think that this slightly blurs a few different functions that need to be separated in a pull system.
There is the "execute" call that, as you point out, can get called once at the top of the stack.
But if p_execute is supposed to be implementing that call, then where do nodes other than the top one get to "// Do some initial stuff" and later "clean up if necessary"? It seems like that code should be factored out and called on each node prior to launching the executor loop at the top.

Good point. I missed that. My immediate reaction is to chain the existing p_init calls to initialize the executors and add a new method p_cleanup to do the teardown, but there may be a better solution.

I want to be a little forward-looking here and attempt to anticipate support for at least some simple forms of sub-query and possibly some fancier joins.  In my experience, this requires another entry point which allows an execution loop to be restarted mid-query possibly with different "input parameters".

Could you please provide some examples?

4. Modify int VoltDBEngine::executeQuery: replace the loop over the executor stack with a single execute call to the top executor.

5. Modify the Planner not to inline the Projection node within a parent node (or any other node that currently gets inlined) but to make it a child of that node. That way the tuple will be materialized at the very beginning of the executor chain.
The projection nodes currently start off as parents of scans, etc., that get inlined into the scans. Are you saying "keep the projection nodes there as parents", or are you saying something else I don't understand?

Sorry, I meant to push the projection node all the way down to be a leaf in the access path instead of the root.
That way, it will be the first executor to evaluate the next tuple from the target table.

I'm not sure what "materialized" means in this context; by my definition (constructing a mutated copy of a table), I wouldn't say that materialization is a natural result of projection.

I probably used the wrong term. I was referring to the process of applying the projection expressions to the original tuple:

    for (int ctr = 0; ctr < num_of_columns; ctr++)
    {
        NValue value =
            projection_node->getOutputColumnExpressions()[ctr]->eval(&tuple, NULL);
        temp_tuple.setNValue(ctr, value);
    }



One nagging question is how to preserve the state of the iterator over the input table at the leaf node between p_next calls. One possible solution is to make the iterator a data member of the corresponding node, but that makes the node stateful, which doesn't sound right. The other possibility is to add one more parameter to the p_next call (a map of node-pointer/input-iterator pairs) and pass it up and down the call stack. At the leaf level, the executor would look for its iterator in the map first and, if it's not there (first-time call), insert it. Something like:

   bool Executor::p_next(const NValueArray &params, TableTuple &out, std::map<AbstractPlanNode*, TableIterator>& iterators)
   {
       bool hasNext = true;
       vector<AbstractPlanNode*>& children = node->getChildren();
       if (children.empty())
       {
           // This is a leaf. Get the next tuple from the target table.
           if (iterators.find(node) == iterators.end())
               iterators.insert(std::make_pair(node, node->getTargetTable()->iterator()));
           TableIterator& iterator = iterators[node]; // a reference, so the state persists
           hasNext = iterator.next(out);
       }
       // the rest is the same
       ....
   }


In a way, Executors have always been stateful, it's just that the (temp table) format that they use to hold their state is not as dynamic or efficient as we'd like. The dynamic state of the Executors is one of the main things that distinguishes them from their corresponding PlanNodes. Your pseudo-code seems to be confusing these two abstractions. I'm not seeing the downside of stateful executors. They are single-threaded and dedicated to a particular query execution.

That is what I suspected but wanted to confirm.
 
I could envision the same PlanNode trees being recycled from one query execution to another -- though I suspect (I could be wrong) that we currently always re-generate even these on-the-fly per query execution from their JSON descriptions -- but I don't see the point in also recycling executor trees or keeping their associated dynamic states separated out. It would be better to extend the Executors to contain whatever dynamic state they need than to invent yet another hierarchy of classes to track Executor-specific dynamic state. I would rather that Executors BE abstract iterators than HAVE iterators.

That's exactly what Executor::p_next is about: providing the means to iterate over the target table(s). There is no need for a parallel structure. The table iterators can be kept as data members of the executors corresponding to the leaf nodes instead of being passed up and down.
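For example (sketch only; the initialization flag and member names are assumptions), the seqscan leaf case from my earlier pseudo-code would become:

    class SeqScanExecutor : public Executor {
    public:
        bool p_next(const NValueArray &params, TableTuple &out) {
            if (!m_started) {
                // Lazily initialized on the first call; the iterator then
                // persists as executor state between calls.
                m_iterator = node->getTargetTable()->iterator();
                m_started = true;
            }
            return m_iterator.next(out);
        }
    private:
        TableIterator m_iterator;
        bool m_started;  // false until the first p_next call
    };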
 

Of course, this is pseudo-code only and more details need to be figured out. But do you think this is a step in the right direction and worth moving forward? Comments and suggestions are very welcome.

I think that this basic approach is sound and worth pursuing, perhaps initially prototyping it on a git branch? There are clearly a number of details that have to be worked out, especially for a completely general solution covering all types of plannodes. I personally find this an interesting project with lots of potential benefits to the product. As I come up to speed here at VoltDB, I'm hoping that I can strike a balance between supporting you in this effort, helping with a few of the identified high priority tasks in the team's backlog, and possibly branching out in some new directions of my own.

I would gladly start prototyping. I will definitely need all the support I can get, and more :) To begin with, I am not sure whether I have the right privileges to create a branch. Also, what would be a representative set of nodes/executors to begin with?


BTW, I actually lost track for a bit of the fact that your original non-inlined DISTINCT push-down has not been integrated. Do you have a clean patch with just that fix that might be ready for integration? There's some interest in getting that into the next iteration of development.

Just sent it in a separate e-mail.

WRT the inlined distinct, I can see both sides of the argument for trying to integrate it vs. leaving it out in favor of the more general approach of switching over to a "pull" executor which will only require a small part of that change.

Distinct alone won't solve all the problems; GROUP BY also needs to be inlined for the same reason. I think the "pull" approach is the superior solution if we can make it work.

Mike

--paul



Paul Martel

Apr 10, 2012, 11:56:31 PM
to voltd...@googlegroups.com
On Mon, Apr 9, 2012 at 9:50 PM, Michael Alexeev <michael...@gmail.com> wrote:
Agreed. Either we need to preserve the state between the calls, or pass it as additional parameters, or both. At the moment I don't have a feel for what the right solution is, but I hope to get a better idea after some prototyping.

I'm leaning towards a minimum of parameter passing, but I expect that prototyping will make this clear.

Yes, it's the simplest example. For instance, the execute_one signature should be updated to operate on multiple input nodes to be able to handle joins. Are you saying that it is also possible to have multiple output tuples as well? My plan was to start with something simple (like seqscan) and then increase the complexity of the executors (for example, distinct, nested join).

I didn't mean to imply that a node had multiple output tuples in the sense of supporting multiple "output" iterators -- I have had some thoughts and discussions about such an architecture, but was fortunate never to have had to implement it. My meaning was just that the number of iterations of the output iterator generally has a many-to-many relationship with the number of iterations of the underlying input iterator(s). On the other hand, join executors (and I think I MIGHT one day argue aggregate executors, as well) DO operate on multiple input iterators, but their relationship to their input iterators tends not to be very symmetric. As I see it, even a join node has one "main" input iterator, just like other executor nodes. As an "implementation detail", it internally uses another iterator in its specialized processing (deciding, just like any other executor, if/how results from its "main" input iterator translate into one or more immediate or later results to its output). But it tends to use this other iterator very differently from its main one, requiring a broader API not unlike the API to the top-level executor or (some day?) to the top-level executor of a subquery.

My immediate reaction is to chain the existing p_init calls to initialize the executors and add a new method p_cleanup to do the teardown, but there may be a better solution.

This is the right general idea.  There are some subtleties here around keeping the nodes loosely coupled, deciding when to use iteration and when recursion, and when to use generic programming, and when to use object specialization, but I think that will work itself out as we go forward.


I want to be a little forward-looking here and attempt to anticipate support for at least some simple forms of sub-query and possibly some fancier joins.  In my experience, this requires another entry point which allows an execution loop to be restarted mid-query possibly with different "input parameters".

Could you please provide some examples?

This relates to the broader API I was referring to for a join's second "inner loop" input iterator. One of the jobs of the current p_execute function is the "binding of parameters", which means that it must (potentially) substitute actual constant values for any place-holders that might exist in its internal expressions. In the normal case, these place-holders represent the "?" parameters in the original query statement that get bound to actual arguments just before query execution. For example, such "?" parameters might represent the start and end key values in an index scan. Obviously, these substitutions only need to be processed once -- let's say that will happen in p_init (even though that's not where it happens now) -- and not repeated every time we fetch a new record from the index.

Now consider the case of a NestLoopIndexScan. In this case, the range of keys (often an exact key value) is typically determined not by the value specified for a "?" parameter but rather by each value (in turn) from some matching column pulled from the main "outer loop" input iterator. Note the similarities and differences with a normal IndexScan based on a "?" parameter.
  -- similarity - the parameter must be "bound" before index scan begins.
  -- similarity - the parameter remains in effect over potentially several iterations of matching tuples from the index scan
  -- similarity - the parameter remains in effect until the index scan is complete -- no more tuples match
  -- difference - when no more tuples match, we may not be finished yet. If there is another row, the index scan needs to be restarted but now with a different matching column value.
  -- similarity - that's ALMOST just like finishing and restarting a stand-alone index scan query with a potentially different argument value passed in for the "?" parameter
  -- difference - but it's NOT QUITE so involved. Most of the setup that happens in p_init (and the breakdown in p_cleanup) does not really need to be repeated, just the final steps of (re)binding the parameter and (re)starting the index iteration.

In practical terms, I am talking about simplifying join executor code by reducing redundant code between NestLoop scans and normal parameterized stand-alone scans. I am also paving the way for "correlated subquery scans" -- in which there really is the potential for a full-blown query executor tree in the middle of some other scan that needs to be re-run on each "outer" iteration with different parameter bindings. The beauty of the approach is that it practically comes for free simply from splitting later "init" steps and earlier "cleanup" steps into their own functions so they can alternatively be called back-to-back as a kind of "rebind/reset" step.
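In sketch form (all names here are hypothetical, not our current IndexScanExecutor API), the split might look like this:

    class IndexScanExecutor : public Executor {
    public:
        void p_init(const NValueArray &params) {
            // One-time setup: resolve the target index, allocate the search key, etc.
            p_rebind(params);  // finish with the separately callable binding step
        }
        // (Re)bind the search key and (re)start the index iteration. A
        // NestLoopIndexScan, or some day a correlated subquery, calls just this,
        // once per outer tuple, with values drawn from the outer row rather than
        // from "?" parameters.
        void p_rebind(const NValueArray &params) {
            bindSearchKey(params);  // hypothetical helper
            m_indexIterator = m_index->rangeIterator(m_searchKey);
        }
    private:
        Index* m_index;             // hypothetical types throughout
        SearchKey m_searchKey;
        IndexIterator m_indexIterator;
    };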
 
Sorry, I meant to push the projection node all the way down to be a leaf in the access path instead of the root.
That way, it will be the first executor to evaluate the next tuple from the target table.

Yes, this can sometimes be a good thing, but I think that this may want to be decided on more of a case-by-case basis. In the simplest case, projection eliminates columns, narrowing the rows, but until we actually have to transmit those rows, there may not be any great benefit. Also consider that projection can sometimes complicate and even widen the representation of intermediate tuples. It may be lighter/faster to retain a series of tuple addresses and a constant "thunk" that represents "1st and 3rd columns and the sum and difference of columns 9 and 10" for evaluation later on demand rather than copying/storing the result of that processing up front.
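For illustration, such a thunk could be as simple as the source tuple's address plus the column expressions, evaluated only on demand (a sketch; the eval call follows the projection loop you quoted, everything else is assumed):

    struct ProjectionThunk {
        TableTuple* source;                              // address of the underlying row
        const vector<AbstractExpression*>* expressions;  // one per output column

        NValue column(int idx) const {
            // Nothing is copied or stored up front; evaluate only when asked.
            return (*expressions)[idx]->eval(source, NULL);
        }
    };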
 
I would gladly start prototyping. I will definitely need all the support I can get, and more :) To begin with, I am not sure whether I have the right privileges to create a branch. Also, what would be a representative set of nodes/executors to begin with?

I'm not sure whether branching can be part of a pull request. But no problem; if you want to get started on master, I'll set up a branch tomorrow. I think that once a branch is established it should be easy enough to pull to it.

I've got kind of a radical idea of how we could support an ever-expanding prototype on the branch without breaking the universe.
My idea is basically that we start by layering the new function protocol ON TOP OF the existing one, being careful to use new function names on the existing classes AND to provide a fallback implementation for all nodes by means of shim methods on AbstractExecutor. That is, the top-level code is changed to follow whatever row-level pull protocol we want, BUT rather than a pure virtual implementation on AbstractExecutor, we provide, for now, a fully functional worst-case implementation that leverages the existing implementation and ignores the new implementation (for now, even on other executors) as much as possible.
Something like:
    bool p_next(const NValueArray &params, TableTuple &out) {
        if (!m_forNow_outputTableIteratorIsInitialized) {
            // relies on some forNow method identifying not-yet-inited children of each node,
            // possibly based on some (forNow?) data member(s) on AbstractExecutor
            p_forNow_p_init_and_execute_my_children_then_me(params);
            m_forNow_iterator = getIteratorOver(m_outputTable);
            m_forNow_outputTableIteratorIsInitialized = true;
        }
        return m_forNow_iterator.next(out);
    }

Without any further specialization, that should give a working system that satisfies the new pull protocol (if only in fact, not in spirit) and operates only slightly worse than the current one.
From there, things can only get better as you refine p_next and the rest of the new protocol functions for whatever executors you choose, broadening, executor by executor from the top down, the set of use cases that are re-implemented so that they no longer call the old p_init and p_execute or refer to the "forNow" members.

Maybe you'd want to try to get a single-partition seqscan working (without breaking any existing methods) to establish the protocol baseline before setting up this forNow scaffolding and generic backfill implementation on AbstractExecutor.

I'm not sure what makes sense for an executor by executor plan of attack, but I like the idea of being able to run regression suites as the reimplementation evolves.
--paul

Michael Alexeev

Apr 11, 2012, 10:45:28 PM
to voltd...@googlegroups.com
Hi Paul,

Thanks for the elaborate answer.

I think I got the general idea but got lost in the naming convention :). I need some time to digest it.
 
Maybe you'd want to try to get a single-partition seqscan working (without breaking any existing methods) to establish the protocol baseline before setting up this forNow scaffolding and generic backfill implementation on AbstractExecutor.
 
I am still fuzzy on your proposal for how to marry the push and pull approaches without breaking anything. Maybe I will start in my local area with seqscan (including the projection and limit executors, to eliminate the inlined nodes), building the pull methods/data members in parallel to the existing ones, and modify VoltDBEngine::executeQuery to call the new pull_execute method and test it on simple selects, just to get a feel for it. If that works, I'll come back and merge the pull/push functionality together the way you've suggested and continue with the other executors.

Does it sound reasonable to you?

Mike 

Paul Martel

Apr 12, 2012, 8:14:01 AM
to voltd...@googlegroups.com
Mike,
We're agreed on the approach. I started a pullexec branch on github (with no commits to it yet) so we can work cooperatively on this.
I'm still not sure exactly how to target your updates to it -- I don't know whether the target branch is encoded into the "pull request" or if the branch must be selected when the pull request is accepted. Either way, you'll want to get in the habit of checking out from the pullexec branch for this project.

Let me know any time if I can be of any further help on this, or if you ever want to take a break, possibly to work on one of the other tasks like the ones we discussed. It seems like we may have similar interests in assignments. With a little coordination, I expect we could accomplish much as a team and avoid duplicating efforts.

Thanks again for the good work. I'm looking forward to the next development.
--paul