Re: [ZODB] Sqlite checkpoint issue


Jason Madden

Jun 30, 2020, 4:15:50 PM
to Venugopal Thotakura, zodb
As luck would have it, I've been looking into this very issue over the past few days and can offer some insight.

> On Jun 30, 2020, at 02:13, Venugopal Thotakura <tvg...@gmail.com> wrote:
>
> Hi,
>
> We are using the sqlite storage engine and we are facing issues with checkpointing (WAL size going into GBs).. It looks like by default checkpointing is disabled, so we enabled auto checkpointing..

Auto-checkpoints should be enabled by default, unless your copy of sqlite was compiled in a strange way. RelStorage does not disable auto-checkpoints, but it will log what the default value is when a connection is opened (it's usually 1000):

DEBUG:relstorage.adapters.sqlite.drivers:Connection: <Connection at 0x113348c30 to '...' in_transaction=False>.
Using sqlite3 version: 3.32.3.
Default connection settings: {... 'wal_autocheckpoint': 1000,...}
Changing connection settings: {'synchronous': 2, 'cache_spill': 483, 'foreign_keys': 0}.
Desired connection settings: {'synchronous': 1, 'cache_spill': 0, 'foreign_keys': 1}.
Unapplied connection settings: {}.
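
If you want to double-check what your copy of sqlite reports, you can also ask it directly. A minimal sketch with the stdlib sqlite3 module (the file name is hypothetical; point it at whatever data file RelStorage created in your data-dir):

import sqlite3

conn = sqlite3.connect('/tmp/rstest/your-data-file.sqlite3')
cur = conn.cursor()
cur.execute('PRAGMA journal_mode')        # should report 'wal' for a RelStorage database
print(cur.fetchall())
cur.execute('PRAGMA wal_autocheckpoint')  # the per-connection threshold, usually 1000
print(cur.fetchall())
conn.close()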


> <pragmas>
> wal_autocheckpoint 100
> wal_checkpoint FULL
> </pragmas>

(The `wal_checkpoint` pragma is a one-time operation, not a persistent setting that applies to auto-checkpoints. The pragmas are executed when a connection is opened, so that last line will cause each new connection to run a checkpoint. The function-call syntax needed to actually pass `FULL`, e.g., `PRAGMA wal_checkpoint(FULL)`, is not supported for the pragmas executed at connection open, though, so the default value of `PASSIVE` gets used instead. That's probably lucky, because a FULL checkpoint is a blocking operation and might never successfully complete...)
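
To make the distinction concrete, here's roughly what the two forms look like with the stdlib sqlite3 module (the path is hypothetical); only the function-call form can request a non-default mode:

import sqlite3

conn = sqlite3.connect('/path/to/data.sqlite3')
cur = conn.cursor()

# Bare pragma: runs a checkpoint in the default mode, PASSIVE.
cur.execute('PRAGMA wal_checkpoint')
print(cur.fetchall())        # one row: (busy, wal_frames, checkpointed_frames)

# Function-call form: the only way to ask for FULL (or RESTART/TRUNCATE).
cur.execute('PRAGMA wal_checkpoint(FULL)')
print(cur.fetchall())

conn.close()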

>
> Now, it looks like it's trying to do checkpointing, however due to the connections already open, it couldn't do that..

Auto-checkpoints are always PASSIVE checkpoints (and never block). PASSIVE checkpoints will mark the space in the WAL file that's available for re-use, but if there are open *transactions* viewing the database as-of some time in the past, that part of the WAL won't be available for re-use. Depending on the workload, those transactions could be using data at the end of the WAL, in which case new uses of the WAL will have to grow the file.

The problem is open transactions, not necessarily open connections.
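
Here's a small stand-alone illustration of that point using nothing but the stdlib sqlite3 module (paths and sizes are only for the demo, nothing RelStorage-specific): a reader holding a transaction open pins the WAL, and passive checkpoints can't reclaim it until that transaction ends.

import os
import sqlite3

path = '/tmp/wal-demo.sqlite3'

writer = sqlite3.connect(path)
writer.execute('PRAGMA journal_mode=WAL')
writer.execute('CREATE TABLE IF NOT EXISTS t (v BLOB)')
writer.commit()

reader = sqlite3.connect(path, isolation_level=None)   # manual transaction control
reader.execute('BEGIN')
reader.execute('SELECT COUNT(*) FROM t').fetchall()    # snapshot taken; txn stays open

for _ in range(1000):
    writer.execute('INSERT INTO t VALUES (randomblob(1024))')
    writer.commit()   # auto-checkpoints run, but can't pass the reader's snapshot

print('WAL with reader open:', os.path.getsize(path + '-wal'))

reader.execute('ROLLBACK')                             # end the read transaction
print(writer.execute('PRAGMA wal_checkpoint(TRUNCATE)').fetchall())
print('WAL after reader done:', os.path.getsize(path + '-wal'))

reader.close()
writer.close()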

How do you know if there's an open transaction, and what can you do about controlling them?

> So to give overview of our architecture..
>
> We connect to db from multiple processes
> - 2 read process (async read with our async connection pool to limit the connections)
> - 1 write process (async write)
>
>
> Here is our async pool.. we prefork the connections (we are seeing sometimes db.open is taking longer time to create connections, so we are doing prefork) and use them.


First and most importantly, always use the transaction manager in its explicit mode. There are a number of benefits to that, but most relevant here is that when the transaction manager is in explicit mode, ZODB alters the way it uses RelStorage, and RelStorage is able to manage the underlying database transaction in a much better way.

Second, to be able to use explicit mode (as well as one of the mitigations discussed below), I suspect you may need to drop the idea of "pre-forking" connections. Let the ZODB DB and its connection pool manage that. (If you find it sometimes slow to open connections, you may need to adjust the size of the ZODB connection pool.) Begin a transaction, open a connection, do the work, then commit/rollback the transaction, and finally close the connection. By carefully bounding the transaction and connection lifecycle this way, together with using an explicit transaction manager, you can be sure about the lifetime of the underlying database transaction as well.
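
In code, the pattern I mean looks roughly like this (just a sketch with made-up names; `db` is a ZODB DB, e.g. the one built from `databaseFromString` in the example program further down):

import transaction

def do_one_unit_of_work(db):
    # One explicit transaction manager per unit of work; they are cheap.
    txm = transaction.TransactionManager(explicit=True)
    tx = txm.begin()            # begin first: nothing talks to the database yet
    conn = db.open(txm)         # open a ZODB connection bound to that manager
    try:
        root = conn.root()
        root['counter'] = root.get('counter', 0) + 1
        tx.commit()
    except BaseException:
        tx.abort()
        raise
    finally:
        conn.close()            # return the connection to the ZODB pool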

Lastly (and this is not a problem on your side, it's an issue I need to address in RelStorage) having exactly one concurrent writer actually exacerbates the situation a little bit. That's because of the way connections are internally handled in RelStorage. Basically, one connection can get in its own way (again, depending on the workload) and when a commit triggers an auto-checkpoint, it may not be able to clean up all the WAL pages. But if another connection commits shortly after that and triggers an auto-checkpoint, then it may be able to make more progress. If there's only ever one writer, though, it's likely to keep tripping over its own feet. (This can still happen with multiple writers but I think it's more rare.)

This is something I will change in RelStorage (I have a prototype fix now). But in the meantime, there are two possible mitigations:

1) Use a very low `wal_autocheckpoint` so that connections are cleaning up after each other more often. This only matters, of course, if there is more than one writer. (There's a config sketch just after this list.)
2) Manually invoke a checkpoint *immediately* after committing. This only works if the transaction managers are in explicit mode, and it has to be immediately after committing so that the connection is not in its own way. This works for a single writer. (And yes, this is using non-public APIs so it's likely to break in the future. Hopefully the changes I make to RelStorage alleviate the issue enough, but if not we can look at doing something more sophisticated and permanent.)
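
For the first mitigation, the `<pragmas>` section would then contain just the auto-checkpoint value (100 is only illustrative); leave out the `wal_checkpoint` line, since as noted above it only triggers a one-time PASSIVE checkpoint when a connection is opened:

<pragmas>
wal_autocheckpoint 100
</pragmas>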


Here's an example program demonstrating this. If the transaction managers are in explicit mode, then the WAL never grows beyond two pages (8K); if they are left implicit, then by the time execution finishes, the WAL has grown to 80MB.


import os
import logging

import transaction
from ZODB.config import databaseFromString

db_config = """
%import relstorage
<zodb>
pool-size 1
<relstorage>
keep-history false
<sqlite3>
data-dir /tmp/rstest
</sqlite3>
</relstorage>
</zodb>
"""

logger = logging.getLogger(__name__)

def report_sizes():
    os.system('ls -lh /tmp/rstest')

def run_transaction(db):
    tx = transaction.begin()
    conn = db.open()
    root = conn.root()
    root['key'] = 'abcd' * 1000
    tx.commit()
    conn.close()
    checkpoint(conn)

def checkpoint(conn):
    sc = conn._storage._load_connection.cursor
    sc.execute('pragma main.wal_checkpoint(passive)')
    sc.fetchall()  # Must fetchall! Plus there's actual useful info in the results.


def main():
    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger('txn').setLevel(logging.ERROR)
    # Make sure the global (default) transaction manager is
    # explicit.
    transaction.manager.explicit = True
    db = databaseFromString(db_config)

    # Open an isolated connection. If the transaction
    # manager isn't in explicit mode, it really will talk to
    # the database now, which will cause WAL to start accumulating.
    # Use explicit transaction managers to prevent that from happening
    # and keep a tight rein on transaction duration.
    isolated_txm = transaction.TransactionManager()
    isolated_txm.explicit = True  # If this is commented out, the WAL grows without bound
    c1 = db.open(isolated_txm)

    report_sizes()
    for _ in range(10):
        for _ in range(1000):
            run_transaction(db)
        report_sizes()

    c1.close()
    db.close()
    report_sizes()


if __name__ == '__main__':
    main()

~Jason

Venugopal Thotakura

Jul 1, 2020, 6:31:42 AM
to zodb
What a coincidence, definitely we are lucky here :)

Thank you so much for your prompt reply and for providing much-needed insights for us.


I think I didn't give enough details of our architecture.. I'll put some more details here, so it may help others as well..


We connect to the db from multiple processes:
- 2 read processes (async reads, with our async connection pool to limit the connections)
- 1 write process (async writes, serialized in batches)


Our writer is async; it uses a queue and writes in batches (so all writes are serial).


We have bridged ZODB's sync APIs with async: we run all the ZODB sync APIs in a threadpool executor, except connection handling (hence our own async connection pool and preforked connections, so we don't block the main thread). Also, ZODB's internal (sync) connection pool isn't hard-bounded by its limit, and the number of connections can grow unbounded.. causing memory havoc (we could limit the memory heap, etc., I guess).. so we want to control the pool ourselves. The main thread takes care of ensuring the connections are shared among the coroutines, which in turn run inside a thread.. and when the context ends, the pool gets the connection back..
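
Roughly, the bounding idea is something like this (a much-simplified, synchronous sketch with made-up names; our real code is async, but since ZODB's pool-size is only a soft target that logs a warning when it is exceeded, we put our own cap on top of db.open()):

import threading

class BoundedOpener:
    """Hard cap on concurrently open ZODB connections, layered over db.open()."""

    def __init__(self, db, limit):
        self._db = db
        self._slots = threading.BoundedSemaphore(limit)

    def open(self, txm):
        self._slots.acquire()        # block until a slot is free
        return self._db.open(txm)

    def close(self, conn):
        conn.close()                 # back to ZODB's own pool...
        self._slots.release()        # ...and free a slot for the next coroutine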







> Auto-checkpoints should be enabled by default, unless your copy of sqlite was compiled in a strange way. RelStorage does not disable auto-checkpoints, but it will log what the default value is when a connection is opened (it's usually 1000):

We are using a standard compiled version of sqlite.. Yes, we have checked the 1k-page default value..

https://github.com/zodb/relstorage/blob/master/src/relstorage/adapters/sqlite/drivers.py#L512
But the comment here and the default value got us thinking that way.. Perhaps we should have debugged the connection's default settings..

I have tested this; even with small values, the WAL size is still going up at 50 requests per second. So auto-checkpointing may not help us much, I guess, if we are not mistaken.

> (The `wal_checkpoint` pragma is a one-time operation, not a persistent setting that applies to auto-checkpoints. The pragmas are executed when a connection is opened, so that last line will cause each new connection to run a checkpoint. The function-call syntax needed to actually pass `FULL`, e.g., `PRAGMA wal_checkpoint(FULL)`, is not supported for the pragmas executed at connection open, though, so the default value of `PASSIVE` gets used instead. That's probably lucky, because a FULL checkpoint is a blocking operation and might never successfully complete...)

Oh, thanks for the insight!

Luck here again; however, since writes are all serialized, we can use FULL mode too, if we understand it correctly.. we are not blocking any other writers (single writer in our case) and not blocking readers here, and our backlog would clear up as soon as it's done with the checkpoint.
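
i.e., something like this variant of the checkpoint() helper from your sample program, if we understand it correctly (same non-public attributes, so the same caveat about it breaking in the future applies; we would run it only right after the writer's commit, while no transaction is in progress):

def checkpoint_full(conn):
    # conn is the ZODB connection the single writer just committed with.
    sc = conn._storage._load_connection.cursor
    sc.execute('PRAGMA main.wal_checkpoint(FULL)')
    # Must fetchall; the row is (busy, wal_frames, checkpointed_frames).
    print(sc.fetchall())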


> Auto-checkpoints are always PASSIVE checkpoints (and never block). PASSIVE checkpoints will mark the space in the WAL file that's available for re-use, but if there are open *transactions* viewing the database as-of some time in the past, that part of the WAL won't be available for re-use. Depending on the workload, those transactions could be using data at the end of the WAL, in which case new uses of the WAL will have to grow the file.
> The problem is open transactions, not necessarily open connections.
>
> How do you know if there's an open transaction, and what can you do about controlling them?

> First and most importantly, always use the transaction manager in its explicit mode. There are a number of benefits to that, but most relevant here is that when the transaction manager is in explicit mode, ZODB alters the way it uses RelStorage, and RelStorage is able to manage the underlying database transaction in a much better way.

We use a context manager for every single db operation, so all transactions are committed by the end of the db operation.

You have pointed us in the right direction about transactions being left open. We were also running the manual checkpoint inside the transaction; I think this is one reason for the issue (though with a hack - dropping the load connection before the transaction & dropping the store connection after the transaction - we were able to get it working).


> 2) Manually invoke a checkpoint *immediately* after committing. This only works if the transaction managers are in explicit mode, and it has to be immediately after committing so that the connection is not in its own way. This works for a single writer. (And yes, this is using non-public APIs so it's likely to break in the future. Hopefully the changes I make to RelStorage alleviate the issue enough, but if not we can look at doing something more sophisticated and permanent.)
 
Manual checkpointing, explicit mode, and not running the manual checkpoint inside another transaction did the trick.. It's working fine now without the hacky solution (dropping the load connection, etc.). We were under the wrong impression that the issue was due to connections, but thanks for the hint.

I think we can live with it for now, but we are looking forward to the more sophisticated solution for triggering manual checkpoints..


We appreciate your prompt feedback and are very thankful for that. Thank you for the sample code, it's very useful!

