2.0 Migration: Transaction Scope Performance Pitfalls

Jonathan Brandmeyer

unread,

Mar 4, 2021, 3:51:50 PM3/4/21

to sqlalchemy

We are currently on v1.3, but we've noticed the big announcements about the 2.0 API and are reviewing our practices with future migration in mind.

One of our applications is INSERT-heavy and also uses SQLite. Standard performance guidance in this situation SQLite is to use the write-ahead log. We found that the baseline guidance in 2.0 to use engine.begin() as the transaction's context manager doesn't cooperate well with the WAL. The WAL is getting checkpointed and deleted at every transaction. The behavior is consistent with failing to cache the connection.

Using connection.begin() as the transaction's context manager restores full use of the WAL.

Is this a known pitfall when using 2.0-style transactions in the 1.3 release? Or are we misunderstanding the connection pool?

Thanks,

Jonathan Brandmeyer

Mike Bayer

unread,

Mar 4, 2021, 7:00:19 PM3/4/21

to noreply-spamdigest via sqlalchemy

hey there-

engine.begin() does not do anything to the SQLite connection at all as there is no begin() method in the Python DBAPI. this is why in your logging you will see a line that says "BEGIN (implicit)". nothing happened. the pysqlite driver controls the scope of the actual BEGIN on the sqlite3 library implcitly and there should be no difference at all in 2.0 vs. 1.x in how this occurs. There is also no difference between calling engine.begin() or connection.begin(), assuming "connection" here is the SQLAlchemy connection- engine.begin() simply calls connection.begin() after procuring the connection.

note that pysqlite's implicit control of transactions is often insufficient for some scenarios where fine-grained control of transactions scope is desired, which is where the recipe at https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#serializable-isolation-savepoints-transactional-ddl comes from in order to circumvent pysqlite's assumptions.

if you can please provide specifics that illustrate the difference in behavior as well as how you detect the WAL condition (as I am unfamiliar with this aspect of sqlite) that would be appreciated.

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/bd49f8f6-d930-4726-8c59-6eea6ed06825n%40googlegroups.com.

Jonathan Brandmeyer

unread,

Mar 5, 2021, 12:03:04 AM3/5/21

to sqlal...@googlegroups.com

On Thu, Mar 4, 2021 at 5:00 PM Mike Bayer <mik...@zzzcomputing.com> wrote:

hey there-

engine.begin() does not do anything to the SQLite connection at all as there is no begin() method in the Python DBAPI. this is why in your logging you will see a line that says "BEGIN (implicit)". nothing happened. the pysqlite driver controls the scope of the actual BEGIN on the sqlite3 library implcitly and there should be no difference at all in 2.0 vs. 1.x in how this occurs. There is also no difference between calling engine.begin() or connection.begin(), assuming "connection" here is the SQLAlchemy connection- engine.begin() simply calls connection.begin() after procuring the connection.

I agree that the `echo`ed trace is identical in both cases. But the throughput and filesystem effects certainly look as if something is triggering a checkpoint every time the connection is returned to the pool.

note that pysqlite's implicit control of transactions is often insufficient for some scenarios where fine-grained control of transactions scope is desired, which is where the recipe at https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#serializable-isolation-savepoints-transactional-ddl comes from in order to circumvent pysqlite's assumptions.

Thanks for the heads-up.

if you can please provide specifics that illustrate the difference in behavior as well as how you detect the WAL condition (as I am unfamiliar with this aspect of sqlite) that would be appreciated.

Here is a demo that illustrates what I'm seeing with the WAL. See the commentary inline.

Thanks,

--

Jonathan Brandmeyer

benchmark_sqlite_transactions.py

Mike Bayer

unread,

Mar 5, 2021, 10:29:51 AM3/5/21

to noreply-spamdigest via sqlalchemy

OK this is not hard to understand, in one case you are connecting fresh from the connection pool each time and in the other you are reusing the same Connection which will be much more performant. The second example that states "runtime 3.1 seconds" is perfectly acceptable code for SQLAlhcemy 2.0. Can I ask what documentation led you to believe that engine.connect() was somehow being removed ?

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CA%2BXzfkqB_qr3Lgthda3ySkog9RgdbE8eB5z2Lhx3BcLxmLjiXw%40mail.gmail.com.

Attachments:
benchmark_sqlite_transactions.py

Mike Bayer

unread,

Mar 5, 2021, 10:34:56 AM3/5/21

to noreply-spamdigest via sqlalchemy

yeah in fact that engine.begin() here is using a brand new SQLite connection each time as the default pool for SQlite files is NullPool, as it is a very fast connecting database and it is the safest option for concurrency. So in your first example, you have a brand new file connection and all of your WAL setup code is running each tme. In the second example, it runs only once.

background on pooling:

https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#threading-pooling-behavior

if you use the SingletonThreadPool instead of the default NullPool, you will again get the same performance with engine.begin(). however, there is no need to change existing code that uses engine.connect() / connection.begin().

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/985daffb-abcf-43cb-bd74-47bfba9f0d97%40www.fastmail.com.

Jonathan Brandmeyer

unread,

Mar 5, 2021, 11:00:55 AM3/5/21

to sqlal...@googlegroups.com

On Fri, Mar 5, 2021 at 8:29 AM Mike Bayer <mik...@zzzcomputing.com> wrote:

Can I ask what documentation led you to believe that engine.connect() was somehow being removed ?

The guidance at [1] makes it clear that the connection is available as a transaction context manager. However, the guidance at [2] suggests that the preferred method is to use the engine.

[1]: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-core-connection-transaction

[2]: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#many-choices-becomes-one-choice

Thanks,

--

Jonathan Brandmeyer

unread,

Mar 5, 2021, 11:45:26 AM3/5/21

to sqlal...@googlegroups.com

On Fri, Mar 5, 2021 at 8:34 AM Mike Bayer <mik...@zzzcomputing.com> wrote:

yeah in fact that engine.begin() here is using a brand new SQLite connection each time as the default pool for SQlite files is NullPool, as it is a very fast connecting database and it is the safest option for concurrency. So in your first example, you have a brand new file connection and all of your WAL setup code is running each tme. In the second example, it runs only once.

background on pooling:

https://docs.sqlalchemy.org/en/13/dialects/sqlite.html#threading-pooling-behavior

if you use the SingletonThreadPool instead of the default NullPool, you will again get the same performance with engine.begin(). however, there is no need to change existing code that uses engine.connect() / connection.begin().

Got it, thanks. I had read https://docs.sqlalchemy.org/en/13/core/pooling.html#connection-pool-configuration but managed to miss the fine print that SingletonThreadPool was used for in-memory connections and NullPool was used for file connection.

--

Jonathan Brandmeyer

Mike Bayer

unread,

Mar 5, 2021, 12:39:51 PM3/5/21

to noreply-spamdigest via sqlalchemy

OK, so the term "one choice" is clarified at the end, that there are "one and a half" choices - that is, there are still two syntaxes that are appropriate, but the "one choice" refers to the fact that you are always using an explicit Connection:

# one choice!

with engine.begin() as conn:

result = conn.execute(stmt)

# OK one and a half choices (the commit() is 1.4 / 2.0 using future engine):

with engine.connect() as conn:

result = conn.execute(stmt)

conn.commit()

you're using the second pattern. "with conn.begin()" is also still valid though is not strictly necessary in 2.0.

I will modify the verbiage now.

Thanks,
--
Jonathan Brandmeyer

--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CA%2BXzfkpqV2PDBm9EQYBZT6jiO_ZKHR1uKn%2BwB90t2D23An10ug%40mail.gmail.com.

Reply all

Reply to author

Forward