Serializable txns not useful on sqlite because do_begin() does nothing

74 views
Skip to first unread message

Randall Nortman

unread,
Sep 18, 2010, 6:52:18 PM9/18/10
to sqlalchemy
In testing my code for concurrency, I discovered that transactions are
not properly isolated on sqlite, even with
isolation_level='SERIALIZABLE'. It turns out that on the sqlite
dialect, do_begin() does nothing. As a result, transactions are not
isolated as expected. Dirty writes are easy to construct; using the
ORM:

(assuming a mapped class Foo with attributes id and bar, and a Foo
with id=1 existing in the database)

Process A: foo = Session.query(Foo).filter_by(id=1)
Process B: foo = Session.query(Foo).filter_by(id=1)
Process A: foo.bar = 'happy happy'
Process B: foo.bar = 'joy joy'
Process A: Session.commit()
Process B: Session.commit()

What ought to happen, with serializable transactions, is that one
process or another should get a locking error. (This is what happens
on postgresql, with isolation_level='SERIALIZABLE'.) What happens
instead is that B overwrites A's update, silently. This is because
the initial SELECT does not begin a transaction in sqlite (because the
pysqlite driver waits for an INSERT/UPDATE/DELETE to begin a
transaction). All that is required is that a BEGIN statement be
issued before the query is executed.

Note that adding with_lockmode('read') to the query seems to work
(though this doesn't seem to be documented anywhere -- the docs
mention the with_lockmode method, but I had to go into the code to see
what it needed to be set to). However, the idea behind serializable
transaction isolation is that you shouldn't need to explicitly lock
anything. Furthermore, with echo=True on the engine, the engine logs
that BEGIN *is* being executed -- except it's not, because
SQLiteDialect does not implement do_begin(). Why not? For
efficiency? How about checking to see if isolation_level was set to
SERIALIZABLE, and at least in that case issue a BEGIN when do_begin()
is called?

And also, in the interest of transparency, perhaps let's not echo that
BEGIN is being executed unless it actually is, eh? That would have
saved me many hours of headache.

Thanks and sorry if I sounded a bit snarky with any of that, but it's
been a long day tracking this down.

Randall

Randall Nortman

unread,
Sep 18, 2010, 7:20:51 PM9/18/10
to sqlalchemy
On Sep 18, 6:52 pm, Randall Nortman <wondercl...@gmail.com> wrote:
[...]
> isolated as expected.  Dirty writes are easy to construct; using the
[...]

Sorry, I meant "lost update" rather than "dirty write". In my head I
think of dirty read and dirty write, but I should translate to
standard terminology when the concept leaves my head.

Michael Bayer

unread,
Sep 18, 2010, 11:59:42 PM9/18/10
to sqlal...@googlegroups.com

On Sep 18, 2010, at 6:52 PM, Randall Nortman wrote:

> In testing my code for concurrency, I discovered that transactions are
> not properly isolated on sqlite, even with
> isolation_level='SERIALIZABLE'. It turns out that on the sqlite
> dialect, do_begin() does nothing. As a result, transactions are not
> isolated as expected.

DBAPI works on an always-transactionalized model. There is no begin() method with DBAPI, only commit() and rollback(). By default, a DBAPI connection is always in a transaction, so it is not appropriate to issue any kind of "BEGIN" - the DBAPI does this transparently. See http://www.python.org/dev/peps/pep-0249/ .

That said, SQLite has connection isolation options which modify how it interprets the DBAPI connection, which affect whether or not there is actually a transaction in progress. The "isolation_level" pysqlite2 argument, which is not the same thing as SQLAlchemy's "isolation_level" parameter to create_engine(), affects this behavior - you can set it via create_engine using connect_args={'isolation_level':<level>}.

Further info for the sqlite3's transactional behavior is at: http://docs.python.org/library/sqlite3.html#controlling-transactions .


Michael Bayer

unread,
Sep 19, 2010, 12:11:36 AM9/19/10
to sqlal...@googlegroups.com

On Sep 18, 2010, at 6:52 PM, Randall Nortman wrote:

> In testing my code for concurrency, I discovered that transactions are
> not properly isolated on sqlite, even with
> isolation_level='SERIALIZABLE'. It turns out that on the sqlite
> dialect, do_begin() does nothing. As a result, transactions are not
> isolated as expected. Dirty writes are easy to construct; using the
> ORM:
>
> (assuming a mapped class Foo with attributes id and bar, and a Foo
> with id=1 existing in the database)
>
> Process A: foo = Session.query(Foo).filter_by(id=1)
> Process B: foo = Session.query(Foo).filter_by(id=1)
> Process A: foo.bar = 'happy happy'
> Process B: foo.bar = 'joy joy'
> Process A: Session.commit()
> Process B: Session.commit()

Oh and also, when you above say "Process A", "Process B", I assume those are separate processes. The default connection pool used with SQLite is the SingletonThreadPool - the same connection is used globally in a thread, so if your test above occurs within a single thread, you should select a different pool implementation. See:

http://www.sqlalchemy.org/trac/wiki/FAQ#IamusingmultipleconnectionswithaSQLitedatabasetypicallytotesttransactionoperationandmytestprogramisnotworking

Further rationale on this is available at : http://www.sqlalchemy.org/docs/dialects/sqlite.html?highlight=sqlite#threading-behavior


Randall Nortman

unread,
Sep 19, 2010, 9:54:37 AM9/19/10
to sqlalchemy
On Sep 18, 11:59 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Sep 18, 2010, at 6:52 PM, Randall Nortman wrote:
>
> > In testing my code for concurrency, I discovered that transactions are
> > not properly isolated on sqlite, even with
> > isolation_level='SERIALIZABLE'.  It turns out that on the sqlite
> > dialect, do_begin() does nothing.  As a result, transactions are not
> > isolated as expected.
>
> DBAPI works on an always-transactionalized model.  There is no begin() method with DBAPI, only commit() and rollback().  By default, a DBAPI connection is always in a transaction, so it is not appropriate to issue any kind of "BEGIN" - the DBAPI does this transparently.   Seehttp://www.python.org/dev/peps/pep-0249/.
>
> That said, SQLite has connection isolation options which modify how it interprets the DBAPI connection, which affect whether or not there is actually a transaction in progress.   The "isolation_level" pysqlite2 argument, which is not the same thing as SQLAlchemy's "isolation_level" parameter to create_engine(), affects this behavior - you can set it via create_engine using connect_args={'isolation_level':<level>}.
>

Unfortunately, the isolation_level parameter to pysqlite2 only
controls the type of BEGIN issued, not when it is issued (I just
tested to be sure). It still waits until there is an INSERT/UPDATE/
DELETE to begin the transaction. I expect the semantics of
serializable transactions to mean the transaction begins also with a
SELECT. The vast majority of database operations, I'd wager, begin
with a SELECT.

Call this a bug in pysqlite2 if you like, though the DBAPI spec seems
to be silent on when transactions are started, so I think the
developers will defend their choice. (I don't see where the spec says
that a DBAPI connection is always in a transaction.) SA can provide
the behavior I want simply by implementing do_begin in SQLiteDialect
as a one-liner: connection.execute("BEGIN"). Some people may not want
that behavior, I recognize, so perhaps a check to see if serializable
transaction isolation was requested in the create_engine call, and if
so, then issue the BEGIN?

At the very least, I beg you, don't have the echo code output BEGIN if
no BEGIN is being issued. That had me barking up the wrong tree for
many hours yesterday. The echo feature is great for troubleshooting,
but only if it doesn't lie!

To answer your question in your second email, yes Process A and
Process B are two different processes in my setup, with nothing shared
other than the database itself.

Michael Bayer

unread,
Sep 19, 2010, 1:52:11 PM9/19/10
to sqlal...@googlegroups.com

On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:

>
> Unfortunately, the isolation_level parameter to pysqlite2 only
> controls the type of BEGIN issued, not when it is issued (I just
> tested to be sure). It still waits until there is an INSERT/UPDATE/
> DELETE to begin the transaction. I expect the semantics of
> serializable transactions to mean the transaction begins also with a
> SELECT. The vast majority of database operations, I'd wager, begin
> with a SELECT.

certainly. The pysqlite connection should be in a transaction immediately. this is what all the other DBAPIs do.

>
> Call this a bug in pysqlite2 if you like,

I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.

> though the DBAPI spec seems
> to be silent on when transactions are started, so I think the
> developers will defend their choice.

I think you should give it a try, the Pysqlite folks are very on top of things and really want to do things correctly, and this is a case of them doing plainly the wrong thing for serializable isolation. SQLA really, really doesn't want to get into correcting DBAPI behavior. I've attached a test case for you to post on their tracker at http://code.google.com/p/pysqlite/issues/list .

test.py

Randall Nortman

unread,
Sep 20, 2010, 3:51:26 PM9/20/10
to sqlalchemy
On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
[...]
> I think this is a bug in Pysqlite2.    I can reproduce it with sqlite3 directly.  
[...]
> > (I don't see where the spec says
> > that a DBAPI connection is always in a transaction.)  
>
> Well, that you can't/shouldn't call "BEGIN" is implied by this paragraph, though weakly, but isn't specific about isolation.    I also bet if you emailed the DB-SIG list, they'd be interested in this detail.

The DB-SIG list agrees with you:

http://mail.python.org/pipermail/db-sig/2010-September/005645.html

And I have filed a bug on pysqlite:

http://code.google.com/p/pysqlite/issues/detail?id=21

I fully expect the pysqlite folks to say "but if we fix this it will
break existing apps which assume they can SELECT all day without
creating transactions", particularly since pysqlite is part of the
python standard library. The current behavior (no transactions begun
for SELECT) is documented, so many apps may depend on it. But I agree
with you now that it's not SQLA's problem.

[...]
> Well we like that people are familiar with DBAPI and that it has no begin(), so with that knowledge it should be clear that SQLA's log message of "BEGIN" only represents when SQLA first starts using a DBAPI connection that has no existing transactional state, and that the DBAPI is expected to have done what it needs to ensure that a BEGIN has occurred.     The "BEGIN" message can say something like "BEGIN (implicit)" or "BEGIN (virtual)", "BEGIN (assumed that the DBAPI is doing this!)"  but you can be sure most people aren't going to know what the heck that means and you'll just have more people complaining.
>

I dunno... perhaps just don't echo BEGIN at all, since it is implicit
and anybody who knows how DB-API is supposed to work will know that a
transaction is beginning? Perhaps instead of echoing the SQL that's
assumed to be executed, echo the DB-API calls that are being made. So
messages would start to look like 'execute("SELECT ...")', 'commit()',
'rollback()', etc. If there were a way to get the driver to echo what
it's actually sending to the DB, that would be great, but that's not
part of the DB-API spec.

Michael Bayer

unread,
Sep 20, 2010, 4:22:05 PM9/20/10
to sqlal...@googlegroups.com

On Sep 20, 2010, at 3:51 PM, Randall Nortman wrote:

> On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
> [...]
>> I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.
> [...]
>>> (I don't see where the spec says
>>> that a DBAPI connection is always in a transaction.)
>>
>> Well, that you can't/shouldn't call "BEGIN" is implied by this paragraph, though weakly, but isn't specific about isolation. I also bet if you emailed the DB-SIG list, they'd be interested in this detail.
>
> The DB-SIG list agrees with you:
>
> http://mail.python.org/pipermail/db-sig/2010-September/005645.html
>
> And I have filed a bug on pysqlite:
>
> http://code.google.com/p/pysqlite/issues/detail?id=21
>
> I fully expect the pysqlite folks to say "but if we fix this it will
> break existing apps which assume they can SELECT all day without
> creating transactions", particularly since pysqlite is part of the
> python standard library. The current behavior (no transactions begun
> for SELECT) is documented, so many apps may depend on it. But I agree
> with you now that it's not SQLA's problem.


They're going to say that SQLite locks the whole database file aggressively, which is why they dont start the trans until DML. That is fine, but there should be some option to control it. Where is the current behavior documented ? I couldn't find a link for that.

Marc Lemburg seemed to be non-specific in his comments about when the transaction begins for isolation purposes - he just reiterated that its not transaction-per-statement.

>
> [...]
>> Well we like that people are familiar with DBAPI and that it has no begin(), so with that knowledge it should be clear that SQLA's log message of "BEGIN" only represents when SQLA first starts using a DBAPI connection that has no existing transactional state, and that the DBAPI is expected to have done what it needs to ensure that a BEGIN has occurred. The "BEGIN" message can say something like "BEGIN (implicit)" or "BEGIN (virtual)", "BEGIN (assumed that the DBAPI is doing this!)" but you can be sure most people aren't going to know what the heck that means and you'll just have more people complaining.
>>
>
> I dunno... perhaps just don't echo BEGIN at all, since it is implicit
> and anybody who knows how DB-API is supposed to work will know that a
> transaction is beginning? Perhaps instead of echoing the SQL that's
> assumed to be executed, echo the DB-API calls that are being made. So
> messages would start to look like 'execute("SELECT ...")', 'commit()',
> 'rollback()', etc. If there were a way to get the driver to echo what
> it's actually sending to the DB, that would be great, but that's not
> part of the DB-API spec.

I really prefer it say something so I know when the connection is starting to be worked with, and I'd prefer its not just "well turn on connection pool logging". I've always thought BEGIN was clearest but I'm not going to fight hard on that, so just suggest something it could say.


Randall Nortman

unread,
Sep 20, 2010, 5:27:01 PM9/20/10
to sqlalchemy
On Sep 20, 4:22 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Sep 20, 2010, at 3:51 PM, Randall Nortman wrote:
>
>
>
> > On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> >> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
> > The DB-SIG list agrees with you:
>
> >http://mail.python.org/pipermail/db-sig/2010-September/005645.html
>
> > And I have filed a bug on pysqlite:
>
> >http://code.google.com/p/pysqlite/issues/detail?id=21
>
> > I fully expect the pysqlite folks to say "but if we fix this it will
> > break existing apps which assume they can SELECT all day without
> > creating transactions", particularly since pysqlite is part of the
> > python standard library.  The current behavior (no transactions begun
> > for SELECT) is documented, so many apps may depend on it.  But I agree
> > with you now that it's not SQLA's problem.
>
> They're going to say that SQLite locks the whole database file aggressively, which is why they dont start the trans until DML.   That is fine, but there should be some option to control it.   Where is the current behavior documented ?   I couldn't find a link for that.
>

http://docs.python.org/library/sqlite3.html#controlling-transactions

"By default, the sqlite3 module opens transactions implicitly before a
Data Modification Language (DML) statement (i.e. INSERT/UPDATE/DELETE/
REPLACE), and commits transactions implicitly before a non-DML, non-
query statement (i. e. anything other than SELECT or the
aforementioned)."

This doesn't explicitly say "doesn't open transactions before a SELECT
statement", but that's clearly implied.

> Marc Lemburg seemed to be non-specific in his comments about when the transaction begins for isolation purposes - he just reiterated that its not transaction-per-statement.
>

He said: "Transactions start implicitly after you connect and after
you call .commit() or .rollback()". In a follow-up on the thread, he
was questioned about that and responded that logically, the
transaction opens immediately after you connect, though waiting until
the first statement is executed is an optimization. He's right,
strictly speaking, because the logical model of serializable
transaction isolation doesn't care whether the transaction is opened
at the beginning of the connection or before the first query, because
the first query determines when you first were exposed to the state of
the database. The time between opening the connection and the first
query is irrelevant because the client is not exposed to the database
state during that period. To anybody not writing proofs of
algorithmic correctness, though, the transaction should start
immediately before the first statement executes, whether that
statement modifies the database or merely queries it.


> I really prefer it say something so I know when the connection is starting to be worked with, and I'd prefer its not just "well turn on connection pool logging".    I've always thought BEGIN was clearest but I'm not going to fight hard on that, so just suggest something it could say.

"BEGIN (implicit)" would be enough to clue me in that something may
not be as it seems, and shouldn't confuse too many people (for more
than a few moments). Or, leave it up to each dialect to echo
something appropriate. DefaultDialect.do_begin() could echo "BEGIN
(implicit)", and then any dialect which overrides do_begin() should
echo something appropriate.

Peter Hansen

unread,
Sep 21, 2010, 10:23:15 AM9/21/10
to sqlal...@googlegroups.com, Randall Nortman
On 2010-09-20 5:27 PM, Randall Nortman wrote:
>>I've always thought BEGIN was clearest but I'm
not going to fight hard on that, so just suggest something it could say.

> "BEGIN (implicit)" would be enough to clue me in that something may
> not be as it seems, and shouldn't confuse too many people (for more
> than a few moments).

+1 to that, as it would have saved me more than a few hours of
troubleshooting a few weeks ago.

--
Peter Hansen

Michael Bayer

unread,
Sep 21, 2010, 11:13:09 AM9/21/10
to sqlal...@googlegroups.com, Randall Nortman
you got it, re97f096e41aa

> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>

Michael Bayer

unread,
Sep 23, 2010, 12:07:06 AM9/23/10
to sqlal...@googlegroups.com

On Sep 20, 2010, at 3:51 PM, Randall Nortman wrote:

> On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
> [...]
>> I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.
> [...]
>>> (I don't see where the spec says
>>> that a DBAPI connection is always in a transaction.)
>>
>> Well, that you can't/shouldn't call "BEGIN" is implied by this paragraph, though weakly, but isn't specific about isolation. I also bet if you emailed the DB-SIG list, they'd be interested in this detail.
>
> The DB-SIG list agrees with you:
>
> http://mail.python.org/pipermail/db-sig/2010-September/005645.html
>
> And I have filed a bug on pysqlite:
>
> http://code.google.com/p/pysqlite/issues/detail?id=21

it seems a little dead on the pysqlite tracker, so I've reposted to bugs.python.org where things generally get very quick attention:

http://bugs.python.org/issue9924

Reply all
Reply to author
Forward
0 new messages