Is there any clue why Optiq defaults to Object[] storage of intermediate rows?

Vladimir Sitnikov

Apr 21, 2014, 7:01:59 AM

to opti...@googlegroups.com

Hi,

Optiq always defaults to Object[] storage of intermediate rows (see [1]), which causes excessive boxing of primitives even when the value is known to be NOT NULL.

This results in weird code like the following (e.g. JdbcTest.testWinAgg2):

_list2.add(new Object[] {
    net.hydromatic.optiq.runtime.SqlFunctions.toInt(row[0]), // unbox-box here
    net.hydromatic.optiq.runtime.SqlFunctions.toInt(row[1]), // and here :(
    net.hydromatic.optiq.runtime.SqlFunctions.toFloat(row[2]),
    net.hydromatic.optiq.runtime.SqlFunctions.toFloat(row[3]),
    net.hydromatic.optiq.runtime.SqlFunctions.toLong(row[4]),
    row[5],
    row[6],
    row[7],
    row[8],
    row[9],
    w3$o0});

The thing is, when we perform RexToLixTranslator.translateProjects, we do not tell the translator that the values will eventually be boxed because the physical row storage is Object[]. The translator finds out that row[0] is always NOT NULL and tries to use unboxed types (hence the toInt, toFloat, and toLong calls).
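To make the unbox-then-box round trip concrete, here is a minimal standalone sketch. It is not Optiq code: the class BoxingRoundTrip and its toInt helper are stand-ins for the generated projection and for SqlFunctions.toInt. It contrasts copying a column through a primitive with passing the boxed reference straight through.

```java
// Illustration only: BoxingRoundTrip and toInt are stand-ins, not Optiq code.
final class BoxingRoundTrip {
  // Stand-in for an unboxing helper applied to a value known to be NOT NULL.
  static int toInt(Object o) {
    return ((Number) o).intValue();
  }

  // What the generated projection effectively does: unbox to int, then let
  // the compiler autobox again because the target slot is an Object.
  static Object[] copyViaPrimitive(Object[] row) {
    return new Object[] {toInt(row[0]), row[1]};
  }

  // What a short-circuited projection could do: reuse the existing reference.
  static Object[] copyByReference(Object[] row) {
    return new Object[] {row[0], row[1]};
  }

  public static void main(String[] args) {
    Object[] row = {1_000_000, "Sales"};
    // Whether the first line prints false depends on the JVM's Integer cache;
    // values outside the cached range are normally re-allocated when boxed.
    System.out.println(copyViaPrimitive(row)[0] == row[0]);
    System.out.println(copyByReference(row)[0] == row[0]);
  }
}
```

Both copies hold equal values; the difference is only that the first one pays for an unboxing call and a fresh boxing step per column per row.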

I see two ways of solving the problem:

1) Use custom classes for intermediate row physical types. As a quick hack, I tried replacing Object[].class with Object.class; that switches to custom class generation and avoids boxing.

I have no idea how much pressure those new classes will put on the permanent generation. However, I believe the number of classes will at most double (Optiq already generates custom Enumerable implementations, so custom row storage should not hurt much).

2) Pass the "output physical rowtype" to translateProjects and short-circuit there when Optiq identifies that the expression is just a localRef and the target physical row type matches the source physical row type.
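A rough sketch of the short-circuit in option 2, written as a toy source-text emitter rather than against the real translator. Everything here is hypothetical: ProjectShortCircuit, translateInputRef, and the outputIsObjectArray flag are invented for illustration and do not correspond to the actual RexToLixTranslator API. The sketch assumes the input row is always an Object[].

```java
// Hypothetical sketch: none of these names exist in Optiq.
final class ProjectShortCircuit {
  /**
   * Emits Java source text for a projected column that is a plain reference
   * to input column {@code index} of an Object[] row.
   *
   * @param unboxFunction       name of the unboxing helper, e.g. "toInt"
   * @param outputIsObjectArray whether the output row is also stored as Object[]
   */
  static String translateInputRef(int index, String unboxFunction,
      boolean outputIsObjectArray) {
    String boxedRef = "row[" + index + "]";
    if (outputIsObjectArray) {
      // Source and target storage match: the value is already boxed,
      // so pass the reference through untouched.
      return boxedRef;
    }
    // The target stores primitives: unboxing is actually useful here.
    return unboxFunction + "(" + boxedRef + ")";
  }

  public static void main(String[] args) {
    System.out.println(translateInputRef(0, "toInt", true));
    System.out.println(translateInputRef(0, "toInt", false));
  }
}
```

The point is only that the decision needs one extra piece of information, the output storage format, at the place where the expression is translated.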

[1]: https://github.com/julianhyde/optiq/blob/master/core/src/main/java/net/hydromatic/optiq/jdbc/OptiqConnectionImpl.java#L142 , prepare.prepareSql(prepareContext, sql, null, Object[].class, maxRowCount);

Julian Hyde

Apr 21, 2014, 3:29:10 PM

to opti...@googlegroups.com

I’ve been worried about this problem too. I agree with your analysis: the code generation tries to be efficient and convert an Object or Integer to int, but it doesn’t know that that int is going to be part of an object array. Your proposed solution #2 sounds good — or whatever variation of it causes least disruption & confusion to the code generator.

It’s also worth doing #1. If we are doing ‘GROUP BY deptno, gender’ then a ‘class Temp { int deptno; String gender; }’ would probably be a better key than ‘Object[] { Integer.valueOf(deptno), gender }’. But I’m not sure that it’s very much better.
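For illustration, a hand-written version of the kind of key class described above might look like the following. GroupKeyDemo, Key, and countByKey are names made up for this sketch (not generated Optiq code), and gender is assumed to be non-null.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a hand-written stand-in for a generated key class.
final class GroupKeyDemo {
  static final class Key {
    final int deptno;      // stored unboxed
    final String gender;   // assumed non-null in this sketch

    Key(int deptno, String gender) {
      this.deptno = deptno;
      this.gender = gender;
    }

    @Override public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Key)) {
        return false;
      }
      Key that = (Key) o;
      return deptno == that.deptno && gender.equals(that.gender);
    }

    @Override public int hashCode() {
      return 31 * deptno + gender.hashCode();
    }
  }

  // Counts rows per (deptno, gender) pair, i.e. a tiny GROUP BY ... COUNT(*).
  static Map<Key, Integer> countByKey(int[] deptnos, String[] genders) {
    Map<Key, Integer> counts = new LinkedHashMap<>();
    for (int i = 0; i < deptnos.length; i++) {
      counts.merge(new Key(deptnos[i], genders[i]), 1, Integer::sum);
    }
    return counts;
  }
}
```

The key itself holds deptno as a primitive, so one allocation per key replaces the array plus a boxed Integer. The counts in this sketch are still boxed by the map.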

Julian

Vladimir Sitnikov

Apr 24, 2014, 5:22:23 AM

to opti...@googlegroups.com

I've profiled mvn test for optiq-core.

The current state of affairs is that Object[] allocation from .current() is not among the top contributors.

In the meantime, I've filed an issue against janino: https://jira.codehaus.org/browse/JANINO-174

Julian Hyde

Apr 24, 2014, 1:26:43 PM

to opti...@googlegroups.com

Bear in mind that object allocations (e.g. allocating Object[], but also allocating Integer, Double from boxing) may be under-reported by a profiler. They show up as greater GC load, and also slower memory accesses due to fragmentation.

The test suite tends to be small queries over small data sets, therefore dominated by optimization and other preparation costs (e.g. janino). If you are interested in larger queries, try running FoodBench https://github.com/julianhyde/share/tree/master/foodbench.

I run FoodBench, generate .csv files containing a line per query, then use Optiq to analyze those files. (Eating my own dog food!) I’m interested in finding queries whose planning or execution time has gotten significantly better or worse.

$ head ../share/foodbench/optiq.03.csv
ID:int,ROWS:int,TOTAL:long,PREP:long,EXEC:long
1,2,1646939000,1447087000,199852000
2,1,118903000,89740000,29163000
3,1,1013595000,41812000,971783000
4,1,52523000,37076000,15447000
5,1,461499000,408529000,52970000
6,1,66348000,29690000,36658000
7,1,61776000,21744000,40032000
8,236,1079647000,492799000,586848000
9,1,54929000,13741000,41188000

sqlline> !connect jdbc:optiq:model=foodbench.json admin admin
0: jdbc:optiq:model=foodbench.json> select * from (select * from "optiq.03" order by prep desc limit 10) order by id;
+-----+------+------------+------------+-----------+
| ID  | ROWS | TOTAL      | PREP       | EXEC      |
+-----+------+------------+------------+-----------+
| 1   | 2    | 1646939000 | 1447087000 | 199852000 |
| 26  | 23   | 1945489000 | 1480449000 | 465040000 |
| 79  | 143  | 2160628000 | 1819281000 | 341347000 |
| 119 | 39   | 4386637000 | 3488263000 | 898374000 |
| 124 | 3    | 2735263000 | 1906179000 | 829084000 |
| 147 | 12   | 2502544000 | 2007991000 | 494553000 |
| 193 | 1    | 2453153000 | 1994598000 | 458555000 |
| 194 | 13   | 2445132000 | 1968425000 | 476707000 |
| 195 | 3    | 2444807000 | 1983243000 | 461564000 |
| 196 | 13   | 2419266000 | 1980136000 | 439130000 |
+-----+------+------------+------------+-----------+

Julian


Vladimir Sitnikov

Apr 25, 2014, 4:02:12 AM

to opti...@googlegroups.com

Here is a fix for "2) Pass "output physical rowtype" to the translateProjects and short-circuit there": https://github.com/julianhyde/optiq/pull/261

It appears to resolve most of the new Object[] {net.hydromatic.optiq.runtime.SqlFunctions.toInt(row[0]), ...} cases.

>If you are interested in larger queries

I am interested in "mvn test" as well, since slow tests slow down development.

We might want to introduce a "perf-tests" module that runs jmh tests to properly measure prepare/execute time.

Vladimir

Vladimir Sitnikov

May 11, 2014, 3:53:04 AM

to opti...@googlegroups.com

What do you think about using code generation for cursor accessors?

1) The current implementation of accessors for custom row types is based on reflection (RecordEnumeratorGetter).

This overhead can easily be eliminated if List<AbstractGetter> getAccessors is added to the Bindable interface.

Then I guess we can switch to custom generated classes as the primary row storage format.

2) If we do switch to a custom row format, we might generate not only AbstractGetters but also more specific accessors that avoid boxing of primitives (Object[] forces boxing, while a custom format allows unboxed primitives).
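The difference between the two accessor styles can be sketched as follows. This is a self-contained illustration, not Avatica or Optiq code: AccessorDemo, Row, Getter, and IntGetter are hypothetical names, and the "generated" accessor is simply written by hand here.

```java
import java.lang.reflect.Field;

// Illustration only: hypothetical row class and accessor interfaces.
final class AccessorDemo {
  // Stand-in for a generated custom row class with a primitive field.
  public static final class Row {
    public final int deptno;
    public final String name;

    public Row(int deptno, String name) {
      this.deptno = deptno;
      this.name = name;
    }
  }

  interface Getter {
    Object getObject(Object row);
  }

  interface IntGetter {
    int getInt(Object row);
  }

  // Reflection-based accessor: generic, but Field.get boxes the int
  // on every call and goes through the reflection machinery.
  static Getter reflective(String fieldName) {
    final Field field;
    try {
      field = Row.class.getField(fieldName);
    } catch (NoSuchFieldException e) {
      throw new IllegalArgumentException(e);
    }
    return row -> {
      try {
        return field.get(row);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    };
  }

  // What a generated, type-specific accessor could look like:
  // a direct field read that returns the primitive without boxing.
  static final IntGetter DEPTNO = row -> ((Row) row).deptno;
}
```

Usage would be along the lines of AccessorDemo.DEPTNO.getInt(row) on the hot path, with the reflective getter kept as a fallback.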

Vladimir

Julian Hyde

May 12, 2014, 8:12:45 PM

to opti...@googlegroups.com

I’m neutral on the idea. I can see that code generation would be more efficient. But for small, simple queries interpreted code is better than generated code (which needs to be compiled using janino and then optimized using JIT).

Also, this code potentially runs in a JDBC client in a different JVM. I don’t want to do code generation client-side, or ship class files. It all has to work based on what’s in optiq-avatica.jar.

I wouldn't object if you made this change, as long as we retain the option to use static classes.

If you want to return data in a high-performance fashion, it’s better to skip JDBC altogether, and use a callback to something like net.hydromatic.linq4j.expressions.Primitive.Sink or (even better) a bulk interface passing column vectors such as java.nio.IntBuffer for each column.
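As a rough idea of what such a bulk interface could look like, here is a minimal sketch built only on java.nio.IntBuffer. The names ColumnBatchDemo, IntColumnSink, and emitColumn are assumptions made for this example; they are not part of linq4j or Optiq.

```java
import java.nio.IntBuffer;

// Illustration only: a hypothetical column-vector callback.
final class ColumnBatchDemo {
  // Receives a whole column of ints at once instead of one boxed value per row.
  interface IntColumnSink {
    void accept(String columnName, IntBuffer values);
  }

  // Producer side: packs the values into a buffer and hands over a
  // read-only view, so the consumer cannot modify the producer's data.
  static void emitColumn(String columnName, int[] values, IntColumnSink sink) {
    IntBuffer column = IntBuffer.allocate(values.length);
    column.put(values);
    column.flip(); // switch from writing to reading
    sink.accept(columnName, column.asReadOnlyBuffer());
  }

  // Consumer side: a tight loop over primitives, no Integer objects involved.
  static long sum(IntBuffer values) {
    long total = 0;
    while (values.hasRemaining()) {
      total += values.get();
    }
    return total;
  }
}
```

A caller would pass a lambda as the sink, for example (name, values) -> System.out.println(name + ": " + ColumnBatchDemo.sum(values)).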

Julian

Vladimir Sitnikov

May 18, 2014, 8:12:13 AM

to opti...@googlegroups.com

It is hard to avoid code generation: the very first time you need to override a method, you are forced into it.

For instance, the current strategy of new AbstractEnumerable(){ public Enumerator enumerator(){...} } assumes that lots of methods are inherited from BaseEnumerable.

The linq4j interpreter can execute the .enumerator() method (it knows the body); however, calling an inherited method is not that easy.

Calling super(...) from within a method is not easy either, as invokespecial is not available through the reflection API.
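The reflection limitation is easy to demonstrate in isolation: Method.invoke always dispatches virtually, so asking for the base class's Method and invoking it on a subclass instance still runs the override. The classes below (SuperCallDemo, Base, Derived) are made up for this demonstration.

```java
import java.lang.reflect.Method;

// Illustration only: shows that Method.invoke cannot act like a super call.
final class SuperCallDemo {
  public static class Base {
    public String describe() {
      return "base";
    }
  }

  public static class Derived extends Base {
    @Override public String describe() {
      return "derived";
    }
  }

  // Looks up describe() on Base, then invokes it on the given target.
  // Dispatch is still virtual, so an override in the target's class wins.
  static String invokeBaseDescribe(Base target) {
    try {
      Method method = Base.class.getMethod("describe");
      return (String) method.invoke(target);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(invokeBaseDescribe(new Base()));
    System.out.println(invokeBaseDescribe(new Derived()));
  }
}
```

So an interpreter that relies on java.lang.reflect alone has no direct way to run the inherited body once a subclass has overridden it.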

I guess the majority of cases can be covered by hard-coding certain classes while still reusing the current translation to linq4j.Expression. For instance, invent an InterpretableAbstractEnumerable that proxies .enumerator() to the interpreter and inherits all the other Java methods.

However, if the code contains even a single construct that the interpreter does not support, we'll have to fall back to code generation.

>Also, this code potentially runs in a JDBC client in a different JVM. I don’t want to do code generation client-side, or ship class files. It all has to work based on what’s in optiq-avatica.jar.

You know, Proxy.newProxyInstance generates classes. I am afraid you cannot meet a "zero class generation at runtime" goal.
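A small check of that claim, using only the JDK: the object returned by Proxy.newProxyInstance is an instance of a class that does not exist in any jar and is defined at runtime. ProxyClassDemo and countingRunnable are names invented for this sketch.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: shows that a dynamic proxy is backed by a runtime-generated class.
final class ProxyClassDemo {
  // Creates a Runnable proxy that counts how often run() is called.
  static Runnable countingRunnable(AtomicInteger calls) {
    InvocationHandler handler = (proxy, method, args) -> {
      if (method.getName().equals("run")) {
        calls.incrementAndGet();
      }
      return null; // fine for run(); other methods are not used in this sketch
    };
    return (Runnable) Proxy.newProxyInstance(
        ProxyClassDemo.class.getClassLoader(),
        new Class<?>[] {Runnable.class},
        handler);
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    Runnable runnable = countingRunnable(calls);
    runnable.run();
    // The exact generated class name is JVM-specific.
    System.out.println(runnable.getClass().getName());
    System.out.println(Proxy.isProxyClass(runnable.getClass()));
  }
}
```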

If we generate classes, we might want to allow class unloading (i.e. avoid injecting classes into the main classloader).

Vladimir
