> In testing my code for concurrency, I discovered that transactions are
> not properly isolated on sqlite, even with
> isolation_level='SERIALIZABLE'. It turns out that on the sqlite
> dialect, do_begin() does nothing. As a result, transactions are not
> isolated as expected.
DBAPI works on an always-transactionalized model. There is no begin() method with DBAPI, only commit() and rollback(). By default, a DBAPI connection is always in a transaction, so it is not appropriate to issue any kind of "BEGIN" - the DBAPI does this transparently. See http://www.python.org/dev/peps/pep-0249/ .
That said, SQLite has connection isolation options which modify how it interprets the DBAPI connection, which affect whether or not there is actually a transaction in progress. The "isolation_level" pysqlite2 argument, which is not the same thing as SQLAlchemy's "isolation_level" parameter to create_engine(), affects this behavior - you can set it via create_engine using connect_args={'isolation_level':<level>}.
Further info for the sqlite3's transactional behavior is at: http://docs.python.org/library/sqlite3.html#controlling-transactions .
> In testing my code for concurrency, I discovered that transactions are
> not properly isolated on sqlite, even with
> isolation_level='SERIALIZABLE'. It turns out that on the sqlite
> dialect, do_begin() does nothing. As a result, transactions are not
> isolated as expected. Dirty writes are easy to construct; using the
> ORM:
>
> (assuming a mapped class Foo with attributes id and bar, and a Foo
> with id=1 existing in the database)
>
> Process A: foo = Session.query(Foo).filter_by(id=1)
> Process B: foo = Session.query(Foo).filter_by(id=1)
> Process A: foo.bar = 'happy happy'
> Process B: foo.bar = 'joy joy'
> Process A: Session.commit()
> Process B: Session.commit()
Oh and also, when you above say "Process A", "Process B", I assume those are separate processes. The default connection pool used with SQLite is the SingletonThreadPool - the same connection is used globally in a thread, so if your test above occurs within a single thread, you should select a different pool implementation. See:
Further rationale on this is available at : http://www.sqlalchemy.org/docs/dialects/sqlite.html?highlight=sqlite#threading-behavior
>
> Unfortunately, the isolation_level parameter to pysqlite2 only
> controls the type of BEGIN issued, not when it is issued (I just
> tested to be sure). It still waits until there is an INSERT/UPDATE/
> DELETE to begin the transaction. I expect the semantics of
> serializable transactions to mean the transaction begins also with a
> SELECT. The vast majority of database operations, I'd wager, begin
> with a SELECT.
certainly. The pysqlite connection should be in a transaction immediately. this is what all the other DBAPIs do.
>
> Call this a bug in pysqlite2 if you like,
I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.
> though the DBAPI spec seems
> to be silent on when transactions are started, so I think the
> developers will defend their choice.
I think you should give it a try, the Pysqlite folks are very on top of things and really want to do things correctly, and this is a case of them doing plainly the wrong thing for serializable isolation. SQLA really, really doesn't want to get into correcting DBAPI behavior. I've attached a test case for you to post on their tracker at http://code.google.com/p/pysqlite/issues/list .
> On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
> [...]
>> I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.
> [...]
>>> (I don't see where the spec says
>>> that a DBAPI connection is always in a transaction.)
>>
>> Well, that you can't/shouldn't call "BEGIN" is implied by this paragraph, though weakly, but isn't specific about isolation. I also bet if you emailed the DB-SIG list, they'd be interested in this detail.
>
> The DB-SIG list agrees with you:
>
> http://mail.python.org/pipermail/db-sig/2010-September/005645.html
>
> And I have filed a bug on pysqlite:
>
> http://code.google.com/p/pysqlite/issues/detail?id=21
>
> I fully expect the pysqlite folks to say "but if we fix this it will
> break existing apps which assume they can SELECT all day without
> creating transactions", particularly since pysqlite is part of the
> python standard library. The current behavior (no transactions begun
> for SELECT) is documented, so many apps may depend on it. But I agree
> with you now that it's not SQLA's problem.
They're going to say that SQLite locks the whole database file aggressively, which is why they dont start the trans until DML. That is fine, but there should be some option to control it. Where is the current behavior documented ? I couldn't find a link for that.
Marc Lemburg seemed to be non-specific in his comments about when the transaction begins for isolation purposes - he just reiterated that its not transaction-per-statement.
>
> [...]
>> Well we like that people are familiar with DBAPI and that it has no begin(), so with that knowledge it should be clear that SQLA's log message of "BEGIN" only represents when SQLA first starts using a DBAPI connection that has no existing transactional state, and that the DBAPI is expected to have done what it needs to ensure that a BEGIN has occurred. The "BEGIN" message can say something like "BEGIN (implicit)" or "BEGIN (virtual)", "BEGIN (assumed that the DBAPI is doing this!)" but you can be sure most people aren't going to know what the heck that means and you'll just have more people complaining.
>>
>
> I dunno... perhaps just don't echo BEGIN at all, since it is implicit
> and anybody who knows how DB-API is supposed to work will know that a
> transaction is beginning? Perhaps instead of echoing the SQL that's
> assumed to be executed, echo the DB-API calls that are being made. So
> messages would start to look like 'execute("SELECT ...")', 'commit()',
> 'rollback()', etc. If there were a way to get the driver to echo what
> it's actually sending to the DB, that would be great, but that's not
> part of the DB-API spec.
I really prefer it say something so I know when the connection is starting to be worked with, and I'd prefer its not just "well turn on connection pool logging". I've always thought BEGIN was clearest but I'm not going to fight hard on that, so just suggest something it could say.
> "BEGIN (implicit)" would be enough to clue me in that something may
> not be as it seems, and shouldn't confuse too many people (for more
> than a few moments).
+1 to that, as it would have saved me more than a few hours of
troubleshooting a few weeks ago.
--
Peter Hansen
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
>
> On Sep 19, 1:52 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
>> On Sep 19, 2010, at 9:54 AM, Randall Nortman wrote:
> [...]
>> I think this is a bug in Pysqlite2. I can reproduce it with sqlite3 directly.
> [...]
>>> (I don't see where the spec says
>>> that a DBAPI connection is always in a transaction.)
>>
>> Well, that you can't/shouldn't call "BEGIN" is implied by this paragraph, though weakly, but isn't specific about isolation. I also bet if you emailed the DB-SIG list, they'd be interested in this detail.
>
> The DB-SIG list agrees with you:
>
> http://mail.python.org/pipermail/db-sig/2010-September/005645.html
>
> And I have filed a bug on pysqlite:
>
> http://code.google.com/p/pysqlite/issues/detail?id=21
it seems a little dead on the pysqlite tracker, so I've reposted to bugs.python.org where things generally get very quick attention: