Custom Dialect - recommendations needed for handling of Sequences/lastrowid


jank

Jan 25, 2013, 2:25:13 AM1/25/13
to sqlal...@googlegroups.com
Hello,
I have implemented a dialect for a new database (EXASol). I have not done tests using the ORM layer of SA so far as I am primarily interested in the Core layer.
So far things have worked out pretty well; DDL and DML support are basically running.
The EXASol DB does not offer sequences, but it has autoincrement columns that are very similar to Postgres SERIAL types.

Example DDL statement:

CREATE TABLE test_exadialect.test (
        id INTEGER IDENTITY 10 NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        age INTEGER, 
        PRIMARY KEY (id)
)

IDENTITY is the keyword that adds autoincrement behavior to an integer-like column; 10 is the initial value of the autoincrement.

This DDL statement is generated based on this table metadata:

Table('test', self.metadata,
      Column('id', Integer, Sequence('test.id.seq', start=10, optional=True), primary_key=True),
      Column('name', String(40), nullable=False),
      Column('age', Integer)
)

Looking at the Postgres dialect implementation, I came to the conclusion that using Sequences is the only way to get the desired behavior. I have also implemented the get_lastrowid() method of the ExecutionContext class. This all works as expected, albeit at the cost of an additional round trip for each single INSERT, as the DB in question does not support RETURNING.

First question: is this the intended way to implement autoincrement behavior in the absence of support for explicit sequence objects in the DB?

Now to the problem that I could not solve so far: I want to make the cost of fetching the last autoincrement id upon insert/update optional. In our use case we are fine with the DB determining the next id value without knowing that value upon insert. I tried to fiddle around with various configuration switches, namely:
  • postfetch_lastrowid
  • autoincrement
My first attempt was to set the postfetch_lastrowid switch to False. However, this switch seems to have wider implications than just switching off post-fetching of the lastrowid. With the switch set to False, the SQLCompiler generates a different INSERT statement:

for:
test_tab.insert().values(name='foo', age=12).execute()

I do get...
with postfetch_lastrowid=True:
INSERT INTO test_exadialect.test (name, age) VALUES ('foo', 12)
with postfetch_lastrowid=False:
INSERT INTO test_exadialect.test (id, name, age) VALUES (NULL, 'foo', 12)
with this statement obviously being rejected by the DB as NULL is not allowed (and not desired) for the primary key column.

So far my understanding of SA is limited, but I assume that setting postfetch_lastrowid to False is interpreted by SA as "this DB does not support sequences/autoincrement".

I then tried whether setting autoincrement=False on the id column would prevent the SQLCompiler from forcing it into the INSERT statement:

Column('id', Integer, Sequence('test.id.seq', start=10, optional=True), primary_key=True, autoincrement=False),

Running and debugging my test case, the column object had the value True for the autoincrement property. I assume that the combination of Sequence and primary_key somehow overrides the value to True but I am lost in the SA code base.

Second question: Can someone give me a hint or pointer on where to look? Am I doing something wrong or trying to misuse the autoincrement flag?

All I want to achieve is to make the fetching of the lastrowid optional. Do I have to implement my own dialect-specific flag? If so, what is the recommended way of doing this?

Thanks for your time and any hint/advice,

Jan 

Michael Bayer

Jan 25, 2013, 10:22:23 AM1/25/13
to sqlal...@googlegroups.com
On Jan 25, 2013, at 2:25 AM, jank wrote:

Hello,
I have implemented a dialect for a new database (EXASol).

that's great.   I'd like to point you to a new system we have for testing and deploying external dialects, where your dialect can be packaged with a standard layout and make use of a series of "compliance" suites within SQLAlchemy.   Within this test system, you can customize fully those capabilities which your dialect supports.   

If you check out SQLAlchemy-Access and SQLAlchemy-Akiban, you can see the standard forms:


the key files within these packages regarding using the SQLAlchemy compliance suite are:

/run_tests.py - test runner, is a front-end to Nose
/setup.cfg - test runner configuration
/test/requirements.py - a custom SuiteRequirements class which provides rules for those features and behaviors supported by the database
/test/test_suite.py - pulls in the sqlalchemy.testing.suite package which causes the "suite" tests to be present for the Nose runner.

the "compliance suite" is a work in progress and doesn't cover everything yet.   Key areas that it does cover are the whole INSERT/lastrowid mechanics you're concerned with here, database reflection, and basic SQL types.


I have not done tests using the ORM layer of SA so far as I am primarily interested in the Core layer.
So far things have worked out pretty well; DDL and DML support are basically running.
The EXASol DB does not offer sequences, but it has autoincrement columns that are very similar to Postgres SERIAL types.

Example DDL statement:

CREATE TABLE test_exadialect.test (
        id INTEGER IDENTITY 10 NOT NULL, 
        name VARCHAR(40) NOT NULL, 
        age INTEGER, 
        PRIMARY KEY (id)
)

IDENTITY is the keyword that adds autoincrement behavior to an integer-like column; 10 is the initial value of the autoincrement.

This DDL statement is generated based on this table metadata:

Table('test', self.metadata,
      Column('id', Integer, Sequence('test.id.seq', start=10, optional=True), primary_key=True),
      Column('name', String(40), nullable=False),
      Column('age', Integer)
)

Looking at the postgres dialect implementation, I came to the conclusion that using Sequences is the only way to get the desired behavior.

The best dialect for you to look at here would be the MSSQL dialect, lib/sqlalchemy/dialects/mssql/base.py and perhaps the pyodbc implementation of it, lib/sqlalchemy/dialects/mssql/pyodbc.py.    MSSQL's INSERT system resembles this the most, where we use the Sequence to allow configurability of the IDENTITY column, and a "post-fetch" at the cursor level is used to get at the last inserted identity.    The post-fetch is performed right on the same cursor that the INSERT occurred on and bypasses the usual SQLAlchemy mechanics of executing a statement, so to that degree the Python overhead of this "post fetch" is negligible.   
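The cursor-level "post fetch" idea can be illustrated with a rough stdlib analogue using sqlite3 (this stands in for an arbitrary DBAPI driver; it is not the MSSQL dialect's actual code). The extra SELECT is issued on the very same cursor that performed the INSERT, so no new SQLAlchemy statement-execution machinery is involved:

```python
import sqlite3

# The extra SELECT runs on the same cursor as the INSERT, which is what
# keeps the Python-side overhead of the "post fetch" negligible.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO test (name) VALUES (?)", ("foo",))
cur.execute("SELECT last_insert_rowid()")  # dialect-specific query
last_id = cur.fetchone()[0]
print(last_id)  # 1
```

A real dialect would issue whatever identity-fetching statement its database provides in its get_lastrowid() implementation, in the same single-round-trip fashion.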


I have also implemented the get_lastrowid() method of the ExecutionContext class. This all works as expected, albeit at the cost of an additional round trip for each single INSERT, as the DB in question does not support RETURNING.

First question: is this the intended way to implement autoincrement behavior in the absence of support for explicit sequence objects in the DB?

sounds like you're on the right track, the usage of Sequence is optional overall but if you want configurability of the "start" and all that then yes.

Now to the problem that I could not solve so far: I want to make the cost of fetching the last autoincrement id upon insert/update optional.

the last row id mechanics only come into play when an Insert() construct is used.    This construct supports a flag "inline=True" which is intended to indicate an INSERT where you don't need any of the "default" values back.   If you execute table.insert(inline=True), the entire "lastrowid" mechanics are bypassed; you can see this in sqlalchemy/engine/default.py line 663, post_insert().     The functionality of this flag is invoked automatically whenever the Insert() construct is used in an "executemany" context as well.
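A small sketch of the inline-insert path, assuming a recent SQLAlchemy where the flag is spelled as the generative .inline() method (in the 0.8-era API discussed in this thread it was table.insert(inline=True)); SQLite stands in for the target database here:

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

metadata = MetaData()
test_tab = Table(
    "test", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(40), nullable=False),
    Column("age", Integer),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

# Marking the INSERT "inline" declares that no server-generated defaults
# need to be fetched back, so the lastrowid post-fetch is skipped.
stmt = test_tab.insert().values(name="foo", age=12).inline()
with engine.begin() as conn:
    conn.execute(stmt)
    rows = conn.execute(test_tab.select()).fetchall()
```

The row is inserted with a database-assigned id, but no round trip is spent retrieving that id.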

In our use case we are fine with the DB determining the next id value without knowing about the value upon insert. I tried to fiddle around with various configuration switches. Namely:
  • postfetch_lastrowid
postfetch_lastrowid refers to whether or not the method of acquiring the last inserted id, when it is desired, is done via a post-fetch, or whether the last inserted id is provided by some other method, which could be one of: pre-execute and embed in the INSERT; embed in the INSERT and use RETURNING; or use the DBAPI lastrowid accessor.   When this flag is False, in the absence of lastrowid or RETURNING, the system behaves as though a "pre-execute" insert is present, but since that's not implemented either you get a NULL.

The flag does not indicate that the dialect flat out doesn't support returning an inserted PK value - the ability to return the last inserted PK is a requirement for a SQLAlchemy dialect as this is one of the most fundamental features the Core provides.


  • autoincrement

I then tried whether setting autoincrement=False on the id column would prevent the SQLCompiler from forcing it into the INSERT statement:

Column('id', Integer, Sequence('test.id.seq', start=10, optional=True), primary_key=True, autoincrement=False),
Running and debugging my test case, the column object had the value True for the autoincrement property. I assume that the combination of Sequence and primary_key somehow overrides the value to True but I am lost in the SA code base.

autoincrement is a Column-level flag that indicates whether or not a column should be treated as autoincrementing during the DDL process, assuming the column is integer-based and doesn't have a foreign key constraint.  It also has some significance during the "postfetch" process, but at the moment it doesn't indicate that the postfetch process should be skipped unconditionally.  However, you can certainly, if you wanted, check this flag within your own postfetch() routine and then not perform the lastrowid action.
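The "check the flag yourself" suggestion might look something like the following purely illustrative stub. None of these names are real SQLAlchemy API; FakeColumn and maybe_get_lastrowid are hypothetical stand-ins for a dialect's own post-fetch routine:

```python
# Illustrative only: a dialect-side helper that consults the column's
# autoincrement flag before paying for the extra round trip.
class FakeColumn:
    def __init__(self, autoincrement=True):
        self.autoincrement = autoincrement

def maybe_get_lastrowid(autoinc_column, fetch_fn):
    """Run the dialect-specific lastrowid query only when wanted."""
    if autoinc_column is None or not autoinc_column.autoincrement:
        return None  # skip the post-fetch entirely
    return fetch_fn()  # e.g. a SELECT issued on the insert's cursor

print(maybe_get_lastrowid(FakeColumn(True), lambda: 42))   # 42
print(maybe_get_lastrowid(FakeColumn(False), lambda: 42))  # None
```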

As for the autoincrement flag being flipped to True, I can't reproduce your behavior:

from sqlalchemy import Column, Integer, Table, MetaData, Sequence

t = Table('t', MetaData(),
    Column('x', Integer, Sequence('y'), primary_key=True, autoincrement=False))

assert t.c.x.autoincrement is False
assert t._autoincrement_column is None

if you can send me a code example illustrating the autoincrement flag being silently flipped to True that would be helpful.   


jank

Jan 26, 2013, 5:37:09 AM1/26/13
to sqlal...@googlegroups.com
Hi Michael,

thank you for all that input. I will give the test suite a try and also look into all your suggestions. I'll try to provide a minimal test case for the auto_increment behavior. This will take some time...

Jan

jank

Jan 29, 2013, 5:36:30 PM1/29/13
to sqlal...@googlegroups.com
I made a lot of progress. Fine-tuning the dialect and the dialect-specific requirements.py helped a lot.

I am still not ready to provide a minimal test case for the auto_increment behavior. At the current stage I do not trust my dialect implementation.

While working on this, another question came up regarding the use of the identifier 'data' in the test suite. This identifier is used in several tests. How and where did you change it to get correct quoting? I did a checkout via hg clone http://hg.sqlalchemy.org/sqlalchemy and did not find any quoting in the test suite. For now I have locally changed 'data' to 't_data' to get rid of a bunch of failing test cases.

Another question: the database I am implementing does not support indexes. Do you have a good idea what an implementation of visit_create_index(self, create) should look like? Will I have to add a new requirement to requirements.py to indicate the non-existence of indexes?

Michael Bayer

Jan 29, 2013, 6:24:57 PM1/29/13
to sqlal...@googlegroups.com
On Jan 29, 2013, at 5:36 PM, jank wrote:

I made a lot of progress. Fine-tuning the dialect and the dialect-specific requirements.py helped a lot.

I am still not ready to provide a minimal test case for the auto_increment behavior. At the current stage I do not trust my dialect implementation.

While working on this, another question came up regarding the use of the identifier 'data' in the test suite. This identifier is used in several tests. How and where did you change it to get correct quoting? I did a checkout via hg clone http://hg.sqlalchemy.org/sqlalchemy and did not find any quoting in the test suite. For now I have locally changed 'data' to 't_data' to get rid of a bunch of failing test cases.

this changeset removes the use of the column name "data" in a raw SQL statement:


all the rest of the places a column named "data" is used, it's specified as a Column object which should quote properly when rendered.



Another question. The database I am implementing does not support indexes. Do you have a good idea what an implementation of def visit_create_index(self, create) should look like? Will I have to add a new requirement to the requirements.py to indicate non-existence of indexes?

I think for the moment have it raise NotImplementedError(), and then yes we'd need to add "indexes" to the base requirements.py as something that might not be supported.
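A minimal sketch of what "have it raise NotImplementedError()" could look like in the dialect's DDL compiler — the class name here is hypothetical, but DDLCompiler is the real base class such a dialect compiler would subclass:

```python
from sqlalchemy.sql.compiler import DDLCompiler

class EXADDLCompiler(DDLCompiler):
    """Hypothetical DDL compiler for a database without index support."""

    def visit_create_index(self, create, **kw):
        # The database has no CREATE INDEX; fail loudly rather than
        # emitting DDL the server would reject.
        raise NotImplementedError(
            "This database does not support CREATE INDEX")
```

Combined with a requirements.py rule marking index support as unsupported, the compliance suite would then skip the index-related tests rather than fail them.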

jank

Sep 2, 2013, 11:21:07 AM9/2/13
to sqlal...@googlegroups.com


On Wednesday, January 30, 2013 at 00:24:57 UTC+1, Michael Bayer wrote:


this changeset removes the use of the column name "data" in a raw SQL statement:


all the rest of the places a column named "data" is used, it's specified as a Column object which should quote properly when rendered.

Hi Michael,

I looked at this problem again. The use of reserved words for identifiers is OK as long as they are correctly quoted. I had an error in my reserved_keywords array (upper-/lowercase confusion). That is now all good and fine.

Jan  

