For the pull prototype I need to disable the optimization that inlines
the Projection node into the scan node. The access plan for a simple
select like 'select c from t', where t is a replicated table, should
look like
  Send
    Projection
      SeqScan
I commented out the IF branch that makes the Projection node inline (lines
802-805 in PlanAssembler::addProjection) and it seems to be working. I can
see in the debugger that the best plan (QueryPlanner::compilePlan)
consists of three nodes (Send, Projection, and SeqScan) and that SeqScan
doesn't have an inlined Projection node. But what puzzles me is that the
JSON output shows the Projection node in two places (if I read it right):
1. Child of Send node
2. Inline node of SeqScan
SQL: SELECT lgn_name FROM log ;
COST: 3000000.0
PLAN:
{
"EXECUTE_LIST": [
7,
9,
10
],
"PARAMETERS": [],
"PLAN_NODES": [
{
"CHILDREN_IDS": [9],
"ID": 10,
"INLINE_NODES": [],
"OUTPUT_SCHEMA": [{
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_NAME": "LGN_NAME",
"EXPRESSION": {
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_IDX": 0,
"COLUMN_NAME": "LGN_NAME",
"TABLE_NAME": "LOG",
"TYPE": "VALUE_TUPLE",
"VALUE_SIZE": 100,
"VALUE_TYPE": "STRING"
},
"SIZE": 100,
"TABLE_NAME": "LOG",
"TYPE": "STRING"
}],
"PARENT_IDS": [],
"PLAN_NODE_TYPE": "SEND"
},
{
"CHILDREN_IDS": [7],
"ID": 9,
"INLINE_NODES": [],
"OUTPUT_SCHEMA": [{
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_NAME": "LGN_NAME",
"EXPRESSION": {
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_IDX": 0,
"COLUMN_NAME": "LGN_NAME",
"TABLE_NAME": "LOG",
"TYPE": "VALUE_TUPLE",
"VALUE_SIZE": 100,
"VALUE_TYPE": "STRING"
},
"SIZE": 100,
"TABLE_NAME": "LOG",
"TYPE": "STRING"
}],
"PARENT_IDS": [10],
"PLAN_NODE_TYPE": "PROJECTION"
},
{
"CHILDREN_IDS": [],
"ID": 7,
"INLINE_NODES": [{
"CHILDREN_IDS": [],
"ID": 8,
"INLINE_NODES": [],
"OUTPUT_SCHEMA": [{
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_NAME": "LGN_NAME",
"EXPRESSION": {
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_IDX": 1,
"COLUMN_NAME": "LGN_NAME",
"TABLE_NAME": "LOG",
"TYPE": "VALUE_TUPLE",
"VALUE_SIZE": 100,
"VALUE_TYPE": "STRING"
},
"SIZE": 100,
"TABLE_NAME": "LOG",
"TYPE": "STRING"
}],
"PARENT_IDS": [],
"PLAN_NODE_TYPE": "PROJECTION"
}],
"OUTPUT_SCHEMA": [{
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_NAME": "LGN_NAME",
"EXPRESSION": {
"COLUMN_ALIAS": "LGN_NAME",
"COLUMN_IDX": 1,
"COLUMN_NAME": "LGN_NAME",
"TABLE_NAME": "LOG",
"TYPE": "VALUE_TUPLE",
"VALUE_SIZE": 100,
"VALUE_TYPE": "STRING"
},
"SIZE": 100,
"TABLE_NAME": "LOG",
"TYPE": "STRING"
}],
"PARENT_IDS": [9],
"PLAN_NODE_TYPE": "SEQSCAN",
"PREDICATE": null,
"TARGET_TABLE_NAME": "LOG"
}
]
}
Also, WINNER-0.TXT doesn't show the Projection node in the tree:
RETURN RESULTS TO STORED PROCEDURE
  SEQUENTIAL SCAN of "LOG"
Are there other places that need to be disabled to prevent inlining?
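One way to check where Projection nodes show up in a plan like the one
above is to walk PLAN_NODES and look at both the top-level entries and
each node's INLINE_NODES. Here is a minimal C++ sketch, assuming a
simplified in-memory model of the JSON. PlanNode, ProjectionSite,
findProjections and samplePlan are hypothetical names made up for
illustration, not VoltDB classes.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one entry of PLAN_NODES.
struct PlanNode {
    int id;                             // "ID"
    std::string type;                   // "PLAN_NODE_TYPE"
    std::vector<int> childrenIds;       // "CHILDREN_IDS"
    std::vector<PlanNode> inlineNodes;  // "INLINE_NODES"
};

// Where a PROJECTION node was found.
struct ProjectionSite {
    int projectionId;
    int hostId;  // id of the node that inlines it, or -1 if it is standalone
};

// Report every PROJECTION node, whether standalone or inlined in another node.
std::vector<ProjectionSite> findProjections(const std::vector<PlanNode>& nodes) {
    std::vector<ProjectionSite> sites;
    for (const PlanNode& node : nodes) {
        if (node.type == "PROJECTION") {
            sites.push_back({node.id, -1});
        }
        for (const PlanNode& inlined : node.inlineNodes) {
            if (inlined.type == "PROJECTION") {
                sites.push_back({inlined.id, node.id});
            }
        }
    }
    return sites;
}

// The three plan nodes from the JSON dump above, reduced to ids and types.
std::vector<PlanNode> samplePlan() {
    PlanNode inlineProjection{8, "PROJECTION", {}, {}};
    return {
        {10, "SEND", {9}, {}},
        {9, "PROJECTION", {7}, {}},
        {7, "SEQSCAN", {}, {inlineProjection}},
    };
}
```

For the plan above, findProjections(samplePlan()) reports two sites:
node 9 as a standalone node and node 8 inlined in SeqScan node 7.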
The other question I have is related to the EE side. Assuming that the
access plan is indeed
  SEND
    PROJECTION
      SCAN
and the select is 'select lgn_name from log', where lgn_name is the second
column in the LOG table, the output schema in the ProjectionExecutor
should consist of one column (LGN_NAME) and its index should be 1,
right?
bool ProjectionExecutor::p_init(AbstractPlanNode *abstractNode,
                                TempTableLimits* limits)
{
    ......
    //
    // Construct the output table
    //
    TupleSchema* schema = node->generateTupleSchema(true);
    m_columnCount = static_cast<int>(node->getOutputSchema().size());
    ....
    // initialize local variables
    all_tuple_array_ptr =
        expressionutil::convertIfAllTupleValues(node->getOutputColumnExpressions());
    all_tuple_array = all_tuple_array_ptr.get();
    .....
}
After the initialization, all_tuple_array should have a single element
with the value 1. The schema of the output table (and of the tuple
returned from the iterator loop) of the SeqScanExecutor should match the
target (LOG) table schema. Is that more or less accurate?
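To make the expectation concrete, here is a minimal C++ sketch of what
the projection step would do with such an index array. It assumes a
simplified tuple that is just a vector of strings. Tuple and projectTuple
are hypothetical names for illustration and are not part of the EE.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tuple: one string value per column.
using Tuple = std::vector<std::string>;

// Build the output tuple by copying the input columns named in columnIndexes.
// columnIndexes plays the role of all_tuple_array: entry i is the index of
// the input column that feeds output column i.
Tuple projectTuple(const Tuple& input, const std::vector<int>& columnIndexes) {
    Tuple output;
    output.reserve(columnIndexes.size());
    for (int idx : columnIndexes) {
        // at() throws std::out_of_range if idx does not fit the input tuple,
        // which is what a stale index against a narrowed schema would trigger.
        output.push_back(input.at(static_cast<std::size_t>(idx)));
    }
    return output;
}
```

With a full-width scan tuple and the index array {1}, the result is a
one-column tuple holding the value of the second input column.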
Thanks,
Mike
> It seems like you are running into the effects of
> AbstractPlanNode.generateOutputSchema which in the absence of an inline
> projection creates one based on the node's m_tableScanSchema (if not empty).
> I'm not sure what an empty/non-empty m_tableScanSchema signifies.
> In the process, this code also re-sorts the non-empty m_tableScanSchema
> columns into the order of their position in the m_tableSchema (?== in the
> underlying table?).
> It's not immediately clear to me how much of this processing is correct
> and/or essential in the presence of a non-inlined parent projection node if
> we eliminated the inline projection node -- it may be correct but have
> implications (e.g. affect column index number settings) on the parent
> projection node. More below.
This is it! I was running into the problem that projection node
indexes were messed up! The planner resolves them based on the output
schema of the send node without inlined projection (the full target
table) but then the scan schema gets reset and the top projection node
indexes are out of sync. I will probably hack something quick and
dirty just to get around it for now. If you have an idea how it should
be done, please let me know.
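One possible workaround, as a minimal C++ sketch: re-resolve the
projection's column indexes by name against whatever schema the child
actually outputs, instead of keeping the indexes computed earlier. This
is only an illustration of the idea under a simplified model where a
schema is a list of column names. Schema, findColumn and resolveIndexes
are hypothetical names, not planner APIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified schema: just the column names in output order.
using Schema = std::vector<std::string>;

// Return the position of columnName in schema, or -1 if it is absent.
int findColumn(const Schema& schema, const std::string& columnName) {
    for (std::size_t i = 0; i < schema.size(); ++i) {
        if (schema[i] == columnName) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Recompute the index of every projected column against the child's
// current output schema, so the indexes cannot go stale if that schema
// is narrowed or re-sorted after the first resolution.
std::vector<int> resolveIndexes(const Schema& childOutput,
                                const std::vector<std::string>& projected) {
    std::vector<int> indexes;
    indexes.reserve(projected.size());
    for (const std::string& name : projected) {
        indexes.push_back(findColumn(childOutput, name));
    }
    return indexes;
}
```

Against a full table schema whose second column is LGN_NAME this yields
index 1. Against a scan output narrowed to just LGN_NAME it yields 0,
which is the mismatch described above.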
>
>>
>> The other question I have is related to the EE side. Assuming that
>> access plan is indeed
>> SEND
>> PROJECTION
>> SCAN
>>
>> And select is 'select lgn_name from log' where lgn_name is the second
>> column in the LOG table the output schema in the projectionexecutor
>> should consist of one column (LGN_NAME) and its index should be 1,
>> right?
>>
> This SOUNDS reasonable, assuming that index "0" is valid and has no special
> meaning.
"0" is a valid index of the first column.
>
>> bool ProjectionExecutor::p_init(AbstractPlanNode *abstractNode,
>> TempTableLimits* limits)
>> {
>> ......
>> //
>> // Construct the output table
>> //
>> TupleSchema* schema = node->generateTupleSchema(true);
>> m_columnCount = static_cast<int>(node->getOutputSchema().size());
>> ....
>> // initialize local variables
>> all_tuple_array_ptr =
>>
>> expressionutil::convertIfAllTupleValues(node->getOutputColumnExpressions());
>> all_tuple_array = all_tuple_array_ptr.get();
>> .....
>> }
>>
>> After the initialization all_tuple_array should have a single element
>> of 1. The schema of the output table (and tuple returned from the
>> iterator loop) of the seqscanexecutor should match the target (LOG)
>> table schema. Is it more or less accurate?
>
> This is what I would expect, which suggests that the m_tableScanSchema
> should not be allowed to affect the output schema at all when we eliminate
> the inline projection.
Right!
>
>>
>> Thanks,
>> Mike
>
>
> I have been deferring my reply to your previous message, expecting a post to
> your github repo that would allow a more concrete discussion.
> I'm not looking for "finished product" in any sense, just a basis for
> discussion.
> I think a focus on this one simple query is a good starting point.
> Let me know if you have more immediate questions or if you think this is not
> a good plan.
I want to resolve this schema problem (if I can) and then post, so as to
have something presentable. On the other hand, it shouldn't affect the
executor side at all. I will commit the prototype over the weekend,
then.
Mike
>
> --paul
>
I just pushed the prototype. I accidentally used 'git commit -a' and
a few 'extra' files were added:
bin/voltcompiler
src/frontend/org/voltdb/planner/QueryPlanner.java
Don't pay attention to them.
I haven't figured out the SeqScanPlanNode schema problem yet.
Looking forward to your comments.
Mike