ENG-257

10 views
Skip to first unread message

Michael Alexeev

unread,
Jan 31, 2012, 11:11:21 PM1/31/12
to voltd...@googlegroups.com
Hi All,

I took a brief look at the ENG-257 stating that the following select fails to compile.

select T_D1, 1 from T;

The corresponding XML is
<statement>
    <select>
        <scan_columns>
            <columnref id="1927624031" table="T" column="T_D1" alias="T_D1" />
        </scan_columns>
        <columns>
            <columnref id="1927624031" table="T" column="T_D1" alias="T_D1" />
            <value id="638790375" type="INTEGER" value="1" />
        </columns>
        <parameters>
        </parameters>
        <tablescans>
            <tablescan type="sequential" table="T">
            </tablescan>
        </tablescans>
    </select>
</statement>

The reason for that is that constant gets associated with a temp table "VOLT_TEMP_TABLE" because corresponding node type is "value" and  not "columnref" (org.voltdb.planner.ParsedSelectStmt::parseDisplayColumns). Also column name and alias are set to empty string which triggers an exception  in SchemaColumn.matches later on. To get around it I added an extra "else if" branch to handle VOLT_TEMP_TABLE case. In case of TEMP table if column name and alias don't match or empty still return TRUE to handle VALUE node.

--- SchemaColumn.java    2012-01-31 22:57:40.000000000 -0500
+++ SchemaColumn_new.java    2012-01-31 22:57:09.000000000 -0500
@@ -168,6 +168,10 @@ public class SchemaColumn
                     retval = true;
                 }
             }
+            else if (tableName.equals("VOLT_TEMP_TABLE"))
+            {
+                retval = true;
+            }
             else
             {
                 throw new RuntimeException("Attempted to match a SchemaColumn " +

My reasoning here is that there is actually nothing to match in this scenario so match can simply return TRUE.

With this change and a small change in ParsedSelectStmt.java (node "value" doesn't have attribute "alias")
--- ParsedSelectStmt.java    2012-01-31 23:00:09.000000000 -0500
+++ ParsedSelectStmt_new.java    2012-01-31 22:59:37.000000000 -0500
@@ -104,7 +104,11 @@ public class ParsedSelectStmt extends Ab
             ExpressionUtil.assignLiteralConstantTypesRecursively(col.expression);
             ExpressionUtil.assignOutputValueTypesRecursively(col.expression);
             assert(col.expression != null);
-            col.alias = child.getAttributes().getNamedItem("alias").getNodeValue();
+            Node alias = child.getAttributes().getNamedItem("alias");
+            if (alias != null)
+            {
+                col.alias = alias.getNodeValue();
+            }
 
             if (nodeName.equals("columnref")) {
                 col.columnName =

the original select seems to compile without an error. There are still some problems with select involving aggregate functions like

select count(*), 1 from T;

where ConstantValueExpression is cast to TupleValueExpression at at org.voltdb.plannodes.AggregatePlanNode.resolveColumnIndexes(AggregatePlanNode.java:119) but I haven't chance to get to it yet.

i am just curios, am I on the right track?

Mike

Ryan Betts

unread,
Feb 1, 2012, 9:07:26 AM2/1/12
to voltd...@googlegroups.com
Mike,

Interesting. I would expect to see a resulting plan that has a
ConstantValueExpression in a projection. That sounds like what you're
describing.

It is unfortunate that the constant ends up associated with a
non-existent temp table. That happen via this parseDisplayColumns
code?

{
// XXX hacky, assume all non-column refs come from a temp table
col.tableName = "VOLT_TEMP_TABLE";
col.columnName = "";
}

Perhaps someday we will replace .tableName and .columnName references
with accessors on expressions. I think that would be more correct and
more flexible. But a really sprawling change...

Beyond the planner, I wouldn't be surprised if the execution engine
needs a little work to support mapping projection expressions that
don't contain either a tuplevalueexpression or
parametervalueexpression in them. Or maybe that just works...

Watching the progress you're are making on these issues is a lot fun.
I hope you're enjoying Volt.

Thanks,
Ryan.

mike alexeev

unread,
Feb 1, 2012, 1:05:43 PM2/1/12
to voltd...@googlegroups.com
Hi Ryan,

----- Original Message -----
From: "Ryan Betts" <rbe...@voltdb.com>
To: <voltd...@googlegroups.com>
Sent: Wednesday, February 01, 2012 9:07 AM
Subject: Re: [VoltDB-dev] ENG-257


> Mike,
>
> Interesting. I would expect to see a resulting plan that has a
> ConstantValueExpression in a projection. That sounds like what you're
> describing.

Yes, you are absolutely correct. parseExpressionTree builds
ConstantValueExpression for the 'value' node.


>
> It is unfortunate that the constant ends up associated with a
> non-existent temp table. That happen via this parseDisplayColumns
> code?
>
> {
> // XXX hacky, assume all non-column refs come from a temp table
> col.tableName = "VOLT_TEMP_TABLE";
> col.columnName = "";
> }
>

Yes again.

> Perhaps someday we will replace .tableName and .columnName references
> with accessors on expressions. I think that would be more correct and
> more flexible. But a really sprawling change...
>
> Beyond the planner, I wouldn't be surprised if the execution engine
> needs a little work to support mapping projection expressions that
> don't contain either a tuplevalueexpression or
> parametervalueexpression in them. Or maybe that just works...

It actually almost does. I built a small client to compile and execute a
simple select
select 1, 'literal' from some_table;

and it worked :)
But as I already mentioned, there is still a problem if aggreagte
function(s) involved. The modified select fails to compile
select count(*), 1 from some_table;

on the ground that AggreagtePlanNode excpects all expressions to be of
TupleValueExpression type. Is it a must? Can a Constant expresion be simply
skipped during the AggreagtePlanNode::resolveColumnIndexes call? It may
still cause problem(s) down the road...

>
> Watching the progress you're are making on these issues is a lot fun.
> I hope you're enjoying Volt.

Oh yes, I enjoy it big time! Prior to joing Volt I was involved in another
in-memory database - Blackray. It's very intetesting to see a totally
different approach to the same problem. It supported much smaller subset of
SQL capabilities (no views, constraints, limited joins, etc.) but any DDL,
DML were parsed and executed dynamically.

Thanks,
Mike
>
> Thanks,
> Ryan.

Ryan Betts

unread,
Feb 1, 2012, 8:14:57 PM2/1/12
to voltd...@googlegroups.com
on the ground that  AggreagtePlanNode excpects all expressions to be of 
TupleValueExpression type. Is it a must? Can a Constant expresion be simply 
skipped during the AggreagtePlanNode::resolveColumnIndexes call? It may 
still cause problem(s) down the road...

I can't think of any reason that Aggregate would require a TVE. 

Blackray

Did Blackray spin out of Deutsche Telekom?  I think I met a couple of their engineers at a conference awhile ago. They were really friendly - we had a nice chat.

Re: testing / trying some different literal/constant selections, you might look at one of the regressionsuites - for example: ./tests/frontend/org/voltdb/regressionsuites/TestSQLFeaturesSuite.java

All of the regressionsuites tests are blackbox integration style tests that spin up a voltdb database using one OS process per node. They make it easy, in particular, to test or try specific SQL statements with a lot less overhead than writing a client, project file, deployment file, etc.

You can add "statement procedures" programmatically without having to define a java stored procedure. Or submit AdHoc SQL.  (A statement procedure is just a single sql statement that auto-commits w/o any Java.) 

Ryan. 

Michael Alexeev

unread,
Feb 1, 2012, 11:16:45 PM2/1/12
to voltd...@googlegroups.com
Hi Ryan,

On Wed, Feb 1, 2012 at 8:14 PM, Ryan Betts <rbe...@gmail.com> wrote:
on the ground that  AggreagtePlanNode excpects all expressions to be of 
TupleValueExpression type. Is it a must? Can a Constant expresion be simply 
skipped during the AggreagtePlanNode::resolveColumnIndexes call? It may 
still cause problem(s) down the road...

I can't think of any reason that Aggregate would require a TVE. 

consider two selects
select c, 1 from t;
select max(c), 1 from t;

In the former case the query plan is  SeqScanPlanNode. It uses parent's resolveColumnIndexes() implementation from AbstractScanPlanNode which ultimately uses ProjectionPlanNode.resolveColumnIndexesUsingSchema to resolve column indexes. Its m_outputSchema in this case contains two columns - one for c with the TVE and another one for const 1.
During this call not TVE expressions are simply ignored and resultant output schema contains only TVE ones. SeqScanPlanNode.m_outputSchema is then reset from the projection node one.

In the latter case, the query plan is the Aggregate one and it has its own implementation for index resolution. The difference here is that somehow the  projectionPlanNode's m_outputSchema now contains only one column for the aggregate column but the m_outputSchema from the aggregate plan two. The second one is still a Const expression which causes all the trouble. I modified resolveColumnIndexes() to ignore all non-TVE types like in case of SeqScanNode and it was enough to compile the statement.

But... There are many more Plans and I feel it will brake somewhere else as well. If const node should be ignored (and it makes sense since there is no corresponding column) it should be done in one place somewhere upfront rather then in every concrete plan implementation.  What do you think? The problem is I am at the moment overwhelmed by the number of different plans and there interaction so I can't figure out where it should be done.



Blackray

Did Blackray spin out of Deutsche Telekom?  I think I met a couple of their engineers at a conference awhile ago. They were really friendly - we had a nice chat.

Yes, it started as a search engine for Deutsch Telecom yellow pages. Then the code was donated to open source and evolved into in-memory database. Was it Felix or Thomas? I worked with them a lot.

Re: testing / trying some different literal/constant selections, you might look at one of the regressionsuites - for example: ./tests/frontend/org/voltdb/regressionsuites/TestSQLFeaturesSuite.java

All of the regressionsuites tests are blackbox integration style tests that spin up a voltdb database using one OS process per node. They make it easy, in particular, to test or try specific SQL statements with a lot less overhead than writing a client, project file, deployment file, etc.

You can add "statement procedures" programmatically without having to define a java stored procedure. Or submit AdHoc SQL.  (A statement procedure is just a single sql statement that auto-commits w/o any Java.) 

Thanks for the tip!

Thanks,
Mike

Ryan. 

Michael Alexeev

unread,
Feb 13, 2012, 9:40:11 PM2/13/12
to voltd...@googlegroups.com
Hi Ryan,

On Wed, Feb 1, 2012 at 8:14 PM, Ryan Betts <rbe...@gmail.com> wrote:
on the ground that  AggreagtePlanNode excpects all expressions to be of 
TupleValueExpression type. Is it a must? Can a Constant expresion be simply 
skipped during the AggreagtePlanNode::resolveColumnIndexes call? It may 
still cause problem(s) down the road...

I can't think of any reason that Aggregate would require a TVE. 



Since it doesn't make sense to resolve column indexes for Constant nodes  I modified AggregatePlanNode::resolveColumnIndexes to follow ProjectionPlanNode logic:
1. Collect all TVE from the m_outputSchema
2. Convert them to the TVE using ExpressionUtil.getTupleValueExpressions. The non-TVE expressions are simply filtered out than
3. Continue with the existing functionality

After these changes the following select seems to compile

select max(C1), 5 from T;

but I am not sure
 - whether it's the right way to go
 - are there any other cases with other types of plan nodes which also require similar modifications.

Any advice here?

Below is the diff between new and old versions of the AggreagtePlanNode.java
mike@ubuntu:~/voltdb/voltdb/src/frontend/org/voltdb/plannodes$ diff -rup AggregatePlanNode.java AggregatePlanNode_old.java
--- AggregatePlanNode.java    2012-02-13 21:38:16.000000000 -0500
+++ AggregatePlanNode_old.java    2012-02-13 21:19:25.000000000 -0500
@@ -112,16 +112,11 @@ public class AggregatePlanNode extends A
         assert(m_children.size() == 1);
         m_children.get(0).resolveColumnIndexes();
         NodeSchema input_schema = m_children.get(0).getOutputSchema();
-       
-        // get all the TVEs in the output columns
-        List<TupleValueExpression> output_tves =
-            new ArrayList<TupleValueExpression>();
         for (SchemaColumn col : m_outputSchema.getColumns())
         {
-            output_tves.addAll(ExpressionUtil.getTupleValueExpressions(col.getExpression()));
-        }
-        for (TupleValueExpression tve : output_tves)
-        {
+            // At this point, they'd better all be TVEs.
+            assert(col.getExpression() instanceof TupleValueExpression);
+            TupleValueExpression tve = (TupleValueExpression)col.getExpression();
             int index = input_schema.getIndexOfTve(tve);
             if (index == -1)
             {
@@ -129,8 +124,8 @@ public class AggregatePlanNode extends A
                 // XXX SHOULD MODE THIS STRING TO A STATIC DEF SOMEWHERE
                 if (!tve.getTableName().equals("VOLT_TEMP_TABLE"))
                 {
-                    throw new RuntimeException("Unable to find index for column: " +
-                                               tve.getColumnName());
+                    throw new RuntimeException("Unable to find index for column: " +
+                                               col.toString());
                 }
             }
             else


Thanks,
Mike

John Hugg

unread,
Feb 15, 2012, 5:51:28 PM2/15/12
to voltd...@googlegroups.com
Hi Michael,

Sorry for the slow response, but I think at this point, you're a lot more familiar with the problem and code in question than we are. I think you're on the right track, and I'm not too worried about "down the road" issues. We'll make sure we have test coverage on the new stuff and that our old tests don't break. That's good enough for us. When you have something ready, send us a diff file and one of us will poke around.

Thanks again for all your help.
-John

Michael Alexeev

unread,
Feb 16, 2012, 10:08:09 PM2/16/12
to voltd...@googlegroups.com
Hi John,
I prepared two patches (well, three) -one with the source code modifications, the next one with the test driver and the final one to fix a spelling error in the test driver file name. From the first one, 0022-ISSUE-257, please disregard changes to the org/voltdb/planner/ParsedSelectStmt.java. They became obsolete because of the previously committed changes.

I am a little bit concern with Ryan's comment about execution engine not being able to support generated plans. I did some testing with simple selects and got expected results but I am sure I haven't covered all the scenarios. Also I couldn't find an example of unit test for that to create something similar for this case.

Mike
0025-ISSUE-257.patch
0022-ISSUE-257.patch
0026-rename-TestConstants.patch

Michael Alexeev

unread,
Feb 16, 2012, 10:16:07 PM2/16/12
to voltd...@googlegroups.com
Forgot one more file
0027-Added-missing-sql-file.patch
Reply all
Reply to author
Forward
0 new messages