database locking web2py vs. "external" access...


Doug Philips

Jun 16, 2012, 8:05:15 AM
to web2py
I'm looking to potentially use background task(s) to offload sending
of emails and updating of other offline resources, and I'm trying to
sort out what the implications are.

In the PDF of the book (just released for free, thank you!), page 389,
section 8.5 "Sending messages using a background task" is the
following text:
"Also notice that it is important to commit any change as soon as
possible in order not to lock the database to other concurrent
processes.
As noted in Chapter 4, this type of background process should not be
executed via cron (except perhaps for cron @reboot) because you need
to be sure that no more than one instance is running at the same
time."

So I'm puzzled: if the database is locked, then shouldn't multiple
accessors be fine?

The email task example, spanning pages 388-389, does a db.commit() in
the body of the loop that processes the rows, rather than at the end
of the loop.
What/where are the locks obtained in this sample script?
Since the script never exits, it can't just be the running of the script?
Is the lock obtained on the update call and then released on the
commit? I'm not having much luck figuring out how to avoid locking
issues when I can't seem to figure out when Web2py takes and releases
locks. I've just re-read chapter 6 and am still puzzled. Pointers to
where else to look are appreciated.

Thanks
-Doug

Niphlod

Jun 16, 2012, 2:38:21 PM
to web...@googlegroups.com
if you're familiar with database transactions: web2py instantiates a new transaction for every request and commits automatically at the end of the request (if no exceptions are thrown).

in a module, if you want to roll your own "daemon", a transaction is initiated at the start of the script. You have to manually commit to "save" those changes. Issuing db.commit() every loop ensures that the db is consistent with what your daemon has already processed.

The "blocking" part refers to databases like SQLite, which remain "blocked" if someone writes to the database and the commit comes later, i.e.
- you have a queue of 500 messages
- you start to send emails, updating or removing row by row to mark that email as sent
- you commit at the end of the 500 block.

While the emails are being sent, SQLite will not allow other processes to read/write that db, so if the "normal" web application needs to read/write the db, your daemon will block your application.

With other db's this does not happen: if you update row by row with your daemon and don't commit those changes "line by line", your app will continue to work.
The only caveat is that if some other process (another daemon or the app itself) reads from the table you're using to process email, those changes won't be "seen" by others until your daemon commits them.

To sum things up, db.commit() ensures that all processes looking into that table "see" the same thing.
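To make the per-row commit concrete, here is a minimal sketch of that daemon loop, written against the stdlib sqlite3 module rather than the DAL (the queue table, column names, and send_email stub are all made up for illustration):

```python
import os
import sqlite3
import tempfile

# Hypothetical outbox queue; in the book's example this is a DAL table.
path = os.path.join(tempfile.mkdtemp(), "queue.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, email TEXT, status TEXT)")
db.executemany("INSERT INTO queue (email, status) VALUES (?, 'pending')",
               [("a@example.com",), ("b@example.com",)])
db.commit()

def send_email(addr):
    # Stand-in for the real mail-sending call.
    return True

# One pass of the daemon's inner loop: mark each row as sent and commit
# immediately, so the write lock is held as briefly as possible.
for row_id, addr in db.execute(
        "SELECT id, email FROM queue WHERE status = 'pending'").fetchall():
    if send_email(addr):
        db.execute("UPDATE queue SET status = 'sent' WHERE id = ?", (row_id,))
        db.commit()  # on SQLite this releases the write lock right away

sent = db.execute("SELECT COUNT(*) FROM queue WHERE status = 'sent'").fetchone()[0]
print(sent)  # → 2
```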

The book advises against cron execution because if you schedule this daemon to start, let's say, every minute, it's possible that:
- at 00:00 your daemon starts processing the queue
- at 00:01 another instance of your daemon starts processing the same queue

if the first daemon is still processing the queue, you'll have two processes running the same task, and if you haven't prepared things for multiple "workers", things can get ugly (e.g. two identical emails sent: one from the first process, one from the second)


Doug Philips

Jun 16, 2012, 4:16:42 PM
to web...@googlegroups.com
On Sat, Jun 16, 2012 at 2:38 PM, Niphlod <nip...@gmail.com> wrote:
> if you're familiar with database transactions: web2py instantiates a new
> transaction for every request and commits automatically at the end of the
> request (if no exceptions are thrown).

Yup, that part makes sense. And I assume it is true regardless of
which DB engine(?)
Which is why sending even one email in your controller is likely to
cause a delay for other users of the site.


> in a module, if you want to roll your own "daemon", a transaction is
> initiated at the start of the script. You have to manually commit to "save"
> those changes. Issuing db.commit() every loop assures you that the db is
> consistent with what your daemon has processed already.

Ok, so a transaction is opened when the script starts (actually, when
DAL is called/created)?
But here is where I get confused. First time through the loop, a
record is updated and db.commit() is called.
So far so good, the DB is committed and the transaction ends.
Second time around the loop, a record is updated (???) and db.commit()
is called. What transaction is that in?
When did that transaction start and why isn't the web app talking to
the same DB just as blocked by my daemon as it would be if I had done
the loop in my controller?


> The "blocking" part refers to databases like SQLite, which remain
> "blocked" if someone writes to the database and the commit comes later,
> i.e.
> - you have a queue of 500 messages
> - you start to send emails, updating or removing row by row to mark that
> email as sent
> - you commit at the end of the 500 block.

Yup, I get that you want to commit every loop and release a lock so
that other processes can get access to the DB...


> if the first daemon is still processing the queue, you'll have two processes
> running the same task, and if you haven't prepared things for multiple
> "workers", things can get ugly (e.g. two identical emails sent: one from
> the first process, one from the second)

Right. What I'm trying to find out is how to write a separate daemon
that won't interfere with Web2py's access to the DB while it is
running, and I'm not yet getting a good mental model of how to
structure the main loop (as per the example in The Book, on pages 388
and 389), since I cannot see there where transactions are started, just
where they are committed.

Thanks for trying to explain; I hope I can come to an understanding,
but I'm not quite there yet!

--=Doug

Niphlod

Jun 16, 2012, 5:56:45 PM
to web...@googlegroups.com
No problem at all.
The bit of magic that I forgot to state explicitly is that db.commit() commits the open transaction and starts a new one :D

Just to be precise: "locking" currently applies only to the SQLite engine, and that's because of how SQLite works (it's like that outside the web2py world too). SQLite blocks any attempt to read/write the db while there is one process writing to it.
Note: if your site does mostly reading, then SQLite is fine with that.
Additional note: if you're not using SQLite there are no "locks" at all.

MySQL, MSSQL, Postgres, etc. can "support" many open transactions.....

Take it like that. Transactions are a way to provide consistent data to every reader/writer connected to a database.
When you start a transaction, you make sure that the changes you are making in that transaction get "seen" by others only when you commit it. While you are making updates/deletes "inside" a transaction, no-one will see those changes..... It's like cooking: you start with all the ingredients, but you want to serve the dish only when it's finished and cooked, and you don't want others to peek while you're mixing the ingredients and putting it into the oven.
Before the commit, everyone will just see ingredients; after the commit, everyone will see the cooked "version".

Again with the metaphor: with all the other dbs, a lot of people can concurrently see the ingredients (even once you've started mixing and cooking) with no locks whatsoever. They just see the "version" the database saved with the last commit.

SQLite instead "refuses" to provide that feature (there is currently a way around it, but that's far from this discussion) and allows no-one to access the ingredients while someone is mixing/cooking them.

So, if there is only one daemon running, it's safe to commit every 500 rows or every single row with all the dbs: if you commit every row, then every time your app reads the queue it will see a "fresher" version of it; if you commit every 500, the app will see an "older" version of the queue.
The locking caveat is valid for SQLite only: if you commit every 500, the app won't be allowed to read or write the db while rows 1 to 500 are being processed. If you commit every row, the app will "lock" less.
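The commit-every-1 vs. commit-every-500 choice is really just a batch-size parameter. A hypothetical sketch, again with plain sqlite3 standing in for the DAL (table and column names are made up):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "q.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE q (id INTEGER PRIMARY KEY, done INTEGER)")
db.executemany("INSERT INTO q (done) VALUES (?)", [(0,)] * 500)
db.commit()

BATCH = 50  # 1 = lock released after every row; 500 = lock held for the whole run
pending = db.execute("SELECT id FROM q WHERE done = 0").fetchall()
for n, (row_id,) in enumerate(pending, start=1):
    db.execute("UPDATE q SET done = 1 WHERE id = ?", (row_id,))
    if n % BATCH == 0:
        db.commit()  # other processes now see rows 1..n as done
db.commit()  # flush any final partial batch

done = db.execute("SELECT COUNT(*) FROM q WHERE done = 1").fetchone()[0]
print(done)  # → 500
```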

Doug Philips

Jun 18, 2012, 12:35:01 AM
to web...@googlegroups.com
On Sat, Jun 16, 2012 at 5:56 PM, Niphlod <nip...@gmail.com> wrote:
> The bit of magic that I forgot to tell explicitely is that db.commit()
> commits every opened transaction and starts a new one :D

Ok, just to double check, that is for all DB back-ends or just for SQLite?


> Note: if your site does mostly reading, then SQLite is fine with that.
> Additional note: if you're not using SQLite there are no "locks" at all.

Ah, OK. So it would be easier to write code that uses any DB except
for SQLite... Hmmm

> Take it like that. Transactions are a way to provide consistent data to
> every reader/writer connected to a database.

Right, I've got that part down, it is the "what is the magic that
Web2py is doing behind my back" part that I was (and still a little
bit) unclear on.


> So, if there is only one daemon running, it's safe to commit every 500 rows
> or every single row with all the dbs: if you commit every row, then every
> time your app reads the queue it will see a "fresher" version of it; if you
> commit every 500, the app will see an "older" version of the queue.
> The locking caveat is valid for SQLite only: if you commit every 500, the
> app won't be allowed to read or write the db while rows 1 to 500 are being
> processed. If you commit every row, the app will "lock" less.

Ok, so I think we've established that this is a special limitation of SQLite.
For all the other DBs, I can have my Web2py app inserting new records
willy nilly and the background process updating them (mail sent, mail
failed, whatever) without the Web2py app having performance/response
time issues due to database contention.

The trick comes in when SQLite is being used.
Just for completeness, if I do a commit on every loop, what is the
window where Web2py is causing the SQLite lock to be taken?
Is it just between the update_record and the commit?

I ask because I'm not yet clear on how it is that the commit on every
loop will allow the Web2py app to be more responsive than the commit
after all 500 cases. It must be that in the commit-on-every-loop case there
is a substantial chance for the Web2py app to get to the database. But
if a commit causes a new transaction to start, how does Web2py get in?
In fact, the example in the book shows an outer loop that looks for
new work, and when done, sleeps for 60 seconds. If the commit call
starts a new transaction, then on the very last of the 500 records,
we'll fall out of the inner loop and then sleep for 60 seconds, which
should lock Web2py out for a full minute? How does the server process
drop the transaction in that case so that the Web2py process has a
chance to get in to the DB?

(I'm beginning to think that The Book should just say: This example is
for any DB except SQLite, don't use SQLite for this kind of processing
???)

Thanks,
--Doug

Anthony

Jun 18, 2012, 2:07:25 AM
to web...@googlegroups.com
> Take it like that. Transactions are a way to provide consistent data to
> every reader/writer connected to a database.

Right, I've got that part down, it is the "what is the magic that
Web2py is doing behind my back" part that I was (and still a little
bit) unclear on.

I don't think web2py is doing any magic -- I think it just relies on the standard database and database driver behavior. For example, here's the documentation on the Python SQLite driver: http://docs.python.org/library/sqlite3.html#sqlite3-controlling-transactions. Once a connection is made, all insert/update/delete statements become part of a transaction. The transaction will auto-commit before any non-DML, non-select statement, or you can manually commit at any time. If you manually commit, then any subsequent insert/update/delete statements will once again become part of a new transaction, until that transaction is committed.

During an HTTP request, web2py automatically does a commit at the end of the request (just before returning the response to the client). However, if you are using the DAL outside of an HTTP request (e.g., in a background process, as in this example), then you must manually commit for your changes to take effect.
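That driver behavior is easy to observe with two connections to the same SQLite file (a sketch; the table name is arbitrary). The writer's uncommitted insert is invisible to the reader until the commit:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.commit()

writer.execute("INSERT INTO t VALUES (1)")  # implicitly opens a transaction

reader = sqlite3.connect(path)
before = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

writer.commit()                             # now the change is visible
after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(before, after)  # → 0 1
```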

The trick comes in when SQLite is being used.
Just for completeness, if I do a commit on every loop, what is the
window where Web2py is causing the SQLite lock to be taken?
Is it just between the update_record and the commit?

I believe so.
 
I ask because I'm not yet clear on how it is that the commit on every
loop will allow the Web2py app to be more responsive than the commit
after all 500 cases. It must be that in the commit on every loop there
is a substantial chance for the Web2py app to get to the database. But 
if a commit causes a new transaction to start, how does Web2py get in?

A commit closes the open transaction, but it doesn't cause the db to lock again immediately. A new transaction doesn't really start until another insert/update/delete statement is issued.

For more on SQLite locking, see http://www.sqlite.org/lockingv3.html.
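With the stdlib sqlite3 module this "new transaction only starts on the next DML statement" behavior is visible on the connection's in_transaction flag (a sketch):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

before_dml = conn.in_transaction          # False: nothing open after the commit
conn.execute("INSERT INTO t VALUES (1)")
during = conn.in_transaction              # True: the INSERT opened a new transaction
conn.commit()
after_commit = conn.in_transaction        # False again: the commit closed it

print(before_dml, during, after_commit)  # → False True False
```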

Anthony

Niphlod

Jun 18, 2012, 4:02:16 AM
to web...@googlegroups.com


Ok, just to double check, that is for all DB back-ends or just for SQLite?

This is true for every backend. 


Ah, OK. So it would be easier to write code that uses any DB except
for SQLite... Hmmm

It's not a question of "easier" or "harder": as always, it's just about choosing the right tools for the job :D
 

Ok, so I think we've established that this is a special limitation of SQLite.
For all the other DBs, I can have my Web2py app inserting new records
willy nilly and the background process updating them (mail sent, mail
failed, whatever) without the Web2py app having performance/response
time issues due to database contention.

The trick comes in when SQLite is being used.
Just for completeness, if I do a commit on every loop, what is the
window where Web2py is causing the SQLite lock to be taken?
Is it just between the update_record and the commit?

Yep, it's from when you do the first thing that changes data in the database (an update or delete) until the commit.
 

I ask because I'm not yet clear on how it is that the commit on every
loop will allow the Web2py app to be more responsive than the commit
after all 500 cases. It must be that in the commit-on-every-loop case there
is a substantial chance for the Web2py app to get to the database. But
if a commit causes a new transaction to start, how does Web2py get in?
In fact, the example in the book shows an outer loop that looks for
new work, and when done, sleeps for 60 seconds. If the commit call
starts a new transaction, then on the very last of the 500 records,
we'll fall out of the inner loop and then sleep for 60 seconds, which
should lock Web2py out for a full minute? How does the server process
drop the transaction in that case so that the Web2py process has a
chance to get in to the DB?

Web2py is not blocked; and again, it's only with SQLite that there is a problem. From the first update on the queue until the commit, no-one can access a SQLite database, hence an app trying to read from it will "hang". Anyway, in the "sleeping" loop no modifications to the db are made, so even with SQLite there will be no problems.
 

(I'm beginning to think that The Book should just say: This example is
for any DB except SQLite, don't use SQLite for this kind of processing
???)

Yep, I'll be fine with it ;-D

Maybe I should write a slice on how to fix this behaviour for SQLite too..... it's just a bit complicated because it requires rebuilding the sqlite module with an updated library.
 

Doug Philips

Jun 19, 2012, 11:53:21 AM
to web...@googlegroups.com
On Mon, Jun 18, 2012 at 2:07 AM, Anthony <abas...@gmail.com> wrote:
> I don't think web2py is doing any magic -- I think it just relies on the
> standard database and database driver behavior.

"Therein lies the rub" - the problem is that The Book isn't really
very easy to use to find out just what the DAL does for each DB and
what "shows through" as the behaviour of the underlying DB.

That is, The Book isn't really a reference manual, which is fine, but
it also leaves me wondering if there is something the DAL does but
that I just haven't found/remembered the right part of the The Book to
find it.

....

> A commit closes the open transaction, but it doesn't cause the db to lock
> again immediately. A new transaction doesn't really start until another
> insert/update/delete statement is issued.
>
> For more on SQLite locking, see http://www.sqlite.org/lockingv3.html.

Thanks!
I guess the issue here is that I don't really want to tie my code (or
the example in The Book) to any one DB, but there are DB "quirks"
which manage to show through, and as a new Web2py user, it's hard to
build up a model of what the DAL does and what leaks through. :-)

For example, the content at the link you provided ends with "7.0
Transaction Control At The SQL Level" and talks about autocommit mode
vs. "out of autocommit mode". I searched The Book for "autocommit" and
found nothing. I also searched for SQLite and didn't find anything
mentioning that (that I was able to recognize).

Given the "commit at the end of a request" model that Web2py does, I
might guess that autocommit is being disabled?

It matters because with autocommit off, SELECTs will acquire SHARED lock.
If I am reading things correctly, that might cause this server script
to block any writes/updates from the Web2py app.

Specifically, if the server does not find any records to update (i.e.
no work to do), the query/SELECT it does to find work items would
provoke a SHARED lock, but with no records to update the server won't
do a commit (or anything else that I can see) that would drop that
lock?

Thanks,
-Doug

Doug Philips

Jun 19, 2012, 12:02:09 PM
to web...@googlegroups.com
On Mon, Jun 18, 2012 at 4:02 AM, Niphlod <nip...@gmail.com> wrote:
>> Ok, just to double check, that is for all DB back-ends or just for SQLite?
>>
> This is true for every backend.

Thanks!


> Web2py is not blocked, and again, it's only with SQLite that there is the
> problem. From the first update on the queue to the commit, no-one can access
> a SQLite database, hence an app trying to read from that will "hang".
> Anyway, in the "sleeping" loop, no modifications to the db are made so also
> with SQLite there will be no problems.

Ok. I was just about ready to believe that, but as per the link
Anthony sent (see my separate reply just sent), depending on whether
or not SQLite is in autocommit mode, doing a SELECT (query) could
cause it to take a SHARED lock which, while held, would block other
processes from acquiring the locks needed to update. That is, the
server script, just by looking for work and then sleeping, would lock
out any updates. But that depends on the mode of access (autocommit or
not)...


>> (I'm beginning to think that The Book should just say: This example is
>> for any DB except SQLite, don't use SQLite for this kind of processing
>> ???)

> Yep, I'll be fine with it ;-D

I was being half kidding, but only half. :-)


> Maybe I should make a slice on how to fix this behaviour also for
> SQLite..... it's just a bit complicated because it requires to rebuild the
> sqlite module with an updated library.

Oh, that's unfortunate. I have to say, having SQLite as part of
Python (i.e. not having to rebuild) is a huge win for getting new users into
the early success that makes them (or at least me) happy to keep using
and expanding their Web2py usage. :-)

Thanks,
--Doug

Anthony

Jun 19, 2012, 1:33:40 PM
to web...@googlegroups.com
> I don't think web2py is doing any magic -- I think it just relies on the
> standard database and database driver behavior.

"Therein lies the rub" - the problem is that The Book isn't really
very easy to use to find out just what the DAL does for each DB and
what "shows through" as the behaviour of the underlying DB.

In this regard, I don't think the DAL does anything in particular for each DB -- the standard DB behavior is what "shows through". I suspect most RDBMSes are similar with respect to these issues -- it is just SQLite that behaves differently because of its architecture.
 
That is, The Book isn't really a reference manual, which is fine, but
it also leaves me wondering if there is something the DAL does but
that I just haven't found/remembered the right part of the The Book to
find it.

Well, the book can't be a complete reference manual for each supported database backend. However, the book does make some effort to identify peculiarities of some of the backends. If you search for "SQLite" in the DAL chapter, there are multiple references to some of its oddities (including that the file is locked on each access). There are also multiple references to the GAE datastore, as well as a section of "gotchas". Anyway, I suppose it would be nice to include a little more detail about locking in SQLite in particular.

I guess the issue here is that I don't really want to tie my code (or
the example in The Book) to any one DB, but there are DB "quirks"
which manage to show through, and as a new Web2py user, it's hard to
build up a model of what the DAL does and what leaks through. :-)

I'm not sure it's feasible or desirable for the DAL to try to normalize every last behavior across all database backends. What exactly would you have the DAL do differently in this case?
 
For example, the content at the link you provided ends with "7.0
Transaction Control At The SQL Level" and talks about autocommit mode
vs. "out of autocommit mode". I searched The Book for "autocommit" and
found nothing. I also searched for SQLite and didn't find anything
mentioning that (that I was able to recognize).

web2py is just using the Python standard library sqlite3 module. According to the docs, it looks like it does not default to autocommit mode.
 
It matters because with autocommit off, SELECTs will acquire SHARED lock.
If I am reading things correctly, that might cause this server script
to block any writes/updates from the Web2py app.

As I read it, when an insert/update/delete request is made, SQLite prevents any new read locks (by initiating a PENDING lock), waits for the current reads to complete, and then acquires the necessary lock to do the insert/update/delete.

Specifically, if the server does not find any records to update (i.e.
no work to do), the query/SELECT it does to find work items would
provoke a SHARED lock, but with no records to update the server won't
do a commit (or anything else that i can see) that would drop that
lock?

The shared lock should be dropped as soon as the select (i.e., the read) has completed.

At some point, it might be worth doing some testing to see if you get the behavior you want. If you do db.commit() as in the example, I suspect it will be fine (assuming traffic levels aren't too high).

Anthony

Niphlod

Jun 19, 2012, 3:31:59 PM
to web...@googlegroups.com


I was being half kidding, but only half. :-)

given that that phrase could kick off another round of "web2py sucks", and that it's actually just a simple way to state that this behaviour occurs only with standard SQLite, I was not kidding at all :P
 
> Maybe I should make a slice on how to fix this behaviour also for
> SQLite..... it's just a bit complicated because it requires to rebuild the
> sqlite module with an updated library.

Oh, that's unfortunate. I have to say, having SQLite as part of
Python (i.e. not having to rebuild) is a huge win for getting new users into
the early success that makes them (or at least me) happy to keep using
and expanding their Web2py usage. :-)

Yep, I don't know the exact defaults, but I happen to see some major differences across various systems with the standard Python distribution. Having the sqlite3 module included by default is absolutely a win, but because it is a module statically linked to a .so or a .dll, it's quite a hassle to update.
Moreover, it's mostly for this kind of webapp functionality that updating the module would be beneficial, so I can somewhat understand the trade-off of a "standardized" API against a "different" actual linked SQLite version. To avoid this locking, one must activate WAL (http://www.sqlite.org/draft/wal.html) on the db. Unfortunately, WAL needs SQLite version > 3.7.0.
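For reference, once the linked SQLite is new enough, WAL is a per-database-file setting enabled with a PRAGMA (a sketch; on SQLite < 3.7.0 the PRAGMA reports a different journal mode instead):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal.db")
conn = sqlite3.connect(path)

# WAL is a property of the database file, not the connection;
# it requires SQLite >= 3.7.0 (check sqlite3.sqlite_version).
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # "wal" if the linked library supports it
```

In WAL mode, readers no longer block the (single) writer, which is exactly the behaviour the daemon scenario above needs.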

The SQLite library version is different from the sqlite3 Python module version.
On ubuntu 10.04, for example, default python is 2.6.5

>>> import sqlite3
>>> sqlite3.version
'2.4.1'
>>> sqlite3.sqlite_version
'3.6.22'

While on 12.04, default python is 2.7.3
>>> import sqlite3
>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.7.9'

I don't know about other systems, but for Windows builds it seems that replacing the DLL shipped with Python is sufficient. For Linux (and maybe Mac), one must build the sources separately, but with build extensions and pip that all boils down to (at the time of writing):

sudo pip install http://pysqlite.googlecode.com/files/pysqlite-2.6.3.tar.gz --global-option build_static

and use pysqlite2 instead of sqlite3 (web2py does this by default: it tries to import pysqlite2 and, if not found, falls back to the standard sqlite3, so no additional step is required).
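The fallback described above amounts to something like this (a sketch of the import pattern, not web2py's exact code):

```python
# Prefer a separately built pysqlite2 (which may link a newer SQLite),
# falling back to the stdlib module if it isn't installed.
try:
    from pysqlite2 import dbapi2 as sqlite_driver
except ImportError:
    import sqlite3 as sqlite_driver

# Version of the SQLite library actually linked in, e.g. "3.7.9".
print(sqlite_driver.sqlite_version)
```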

As you can see, it's not quite standard, and given that there are Python 2.4 installations still around and there's no "universal way" to get a simple "fix"..... it could be a pain in the ass.

Anyway, as Anthony suggested, writing something and applying "the theory" to that code can't replace sanely testing the live functionality: every app has its own kind of load, and the percentages of writes/reads are absolutely not standard.

Doug Philips

Jun 20, 2012, 5:42:14 PM
to web...@googlegroups.com
On Tue, Jun 19, 2012 at 1:33 PM, Anthony <abas...@gmail.com> wrote:
> I'm not sure it's feasible or desirable for the DAL to try to normalize
> every last behavior across all database backends. What exactly would you
> have the DAL do differently in this case?

Actually what I'm looking for is in the documentation, not the code,
so I don't know yet if there is anything I want the DAL to do
differently.


> web2py is just using the Python standard library sqlite3 module. According
> to the docs, it looks like it does not default to autocommit mode.

"It looks like" is my problem here! :-)

I'm looking for The Book to tell me what to expect.
If it doesn't tell me, then I don't know if the current behaviour
(once I discover it) will be carried forward to future versions of
Web2py as part of the backwards compatibility promise, or, if the
current behaviour is just a quirk of the current implementation that
might change and which could bite me if I rely on it.

Can I count on the DAL setting or not setting autocommit mode? How do
I have any way of knowing whether changing that "behind the DAL's back"
would break anything? Does the DAL even care?

Since the DAL is trying to abstract away details about particular DB
engines so that Web2py users can write code that "doesn't care", it
also needs to say (for each DB Engine) what assumptions it is
making... that is, what matters to it and what doesn't. It's perfectly
fine to say, for each DB Engine, what configurations/settings/etc.
that Web2py 'cares about' and which it doesn't.


> At some point, it might be worth doing some testing to see if you get the
> behavior you want. If you do db.commit() as in the example, I suspect it
> will be fine (assuming traffic levels aren't too high).

Indeed, I'll have to do that anyways.

Thanks,
--Doug

Anthony

Jun 20, 2012, 5:53:48 PM
to web...@googlegroups.com
I'm looking for The Book to tell me what to expect.
If it doesn't tell me, then I don't know if the current behaviour
(once I discover it) will be carried forward to future versions of
Web2py as part of the backwards compatibility promise, or, if the
current behaviour is just a quirk of the current implementation that
might change and which could bite me if I  rely on it.

Can I count on the DAL setting or not setting autocommit mode? How do
I have any way of knowing whether changing that "behind the DAL's back"
would break anything? Does the DAL even care?

Since the DAL is trying to abstract away details about particular DB
engines so that Web2py users can write code that "doesn't care", it
also needs to say (for each DB Engine) what assumptions it is
making... that is, what matters to it and what doesn't. It's perfectly
fine to say, for each DB Engine, what configurations/settings/etc.
that Web2py 'cares about' and which it doesn't.

Can you explain how whether SQLite autocommits or not matters in this case? The DAL documentation recommends doing a db.commit() in scripts and background tasks like this precisely because it doesn't want to assume any autocommitting is happening. Within the context of HTTP requests, web2py automatically does a commit at the end of the request. It's not clear the documentation needs to say anything else here -- if you follow what the documentation currently says, you should be fine.

Anthony

Doug Philips

Jun 20, 2012, 5:54:50 PM
to web...@googlegroups.com
On Tue, Jun 19, 2012 at 3:31 PM, Niphlod <nip...@gmail.com> wrote:
> Yep, I don't know exactly the defaults, but I happen to see some major
> differences against various systems with standard python distribution.

Ugh, at least, for the most part, it still all 'just works'. :-/


> Moreover, it's just for this kind of functionality for webapps that could be
> beneficial to update the module, so I can understand a little bit the
> "standardized" api and "different" actual sqlite linked version. For having
> the possibility to avoid this locking one must activate WAL
> (http://www.sqlite.org/draft/wal.html) on the db. Unfortunately WAL needs
> sqlite version > 3.7.0

Yeah, I'm not looking to go quite that far... yet. :-)

> Anyway, as Anthony suggested, writing something and apply "the theory" to
> that code can't replace a sane "test live functionality": every app has its
> kinds of loads, and percentages of writes/reads are absolutely not a
> standard.

Agreed. I just wanted more documentation about the
assumptions/requirements that the DAL makes on the underlying database
engine interface.

Thanks
--Doug

Doug Philips

Jun 20, 2012, 6:04:45 PM
to web...@googlegroups.com
On Wed, Jun 20, 2012 at 5:53 PM, Anthony <abas...@gmail.com> wrote:
> Can you explain how whether SQLite autocommits or not matters in this case?
> The DAL documentation recommends doing a db.commit() in scripts and
> background tasks like this precisely because it doesn't want to assume any
> autocommitting is happening. Within the context of HTTP requests, web2py
> automatically does a commit at the end of the request. It's not clear the
> documentation needs to say anything else here -- if you follow what the
> documentation currently says, you should be fine.

From http://www.sqlite.org/lockingv3.html, "7.0 Transaction Control At
The SQL Level":
"The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword is
optional) is used to take SQLite out of autocommit mode. Note that the
BEGIN command does not acquire any locks on the database. After a
BEGIN command, a SHARED lock will be acquired when the first SELECT
statement is executed."

So, if the DAL takes the DB out of autocommit mode, the "select"
(which is done in the main/outer loop of the server script) will cause
a SHARED lock to be taken, right?
Then, let's say, there are no new records to process, so the server
script goes to sleep and does another select/query on its next trip
around the loop.
So far, just fine, the server script doesn't block, right?
However, if the Web2py application wants to update the DB, it can't,
because the server script now has a potentially long-lived SHARED
lock.
Quoting from the same document, "3.0 Locking":
"SHARED The database may be read but not written. Any number of
processes can hold SHARED locks at the same time, hence there can be
many simultaneous readers. But no other thread or process is allowed
to write to the database file while one or more SHARED locks are
active."

Sooo.... Have I misunderstood the SQLite locking description? Or is
there somewhere in the DAL documentation where it says it'll do a
commit after a select, or is it relying on (in the case of SQLite) the
autocommit to handle that for it?

Thanks!
--Doug

Niphlod

Jun 20, 2012, 7:06:02 PM
to
You're forgetting another "layer".  The DAL leverages the DBAPI implementation of the SQL drivers.
Actually, there is no explicit "BEGIN" transaction at the Python level.....
Usually you instantiate a cursor, do operations, and try to commit; if an exception is raised, you roll back.

You can do all kinds of tests for your background operations just by defining a simple model and starting two separate web2py shells (web2py.py -M -S yourappname). One represents the app (1), the other represents the daemon (2).

Remember that the shell never commits unless told to (yeah, more control over it for your tests).
Try yourself:
 - select table on 1, select table on 2 --> no problems
 - update one row on 1 (without committing), select on 2. no locking, 2 shows table as it was before the update on 1, so no problems
 - commit on 1, select on 2. No locking, 2 shows the table with updated records

A little more complicated: set up an infinite loop inserting 10 rows on 1, without committing. While the loop is running on 1, try a loop of 100 updates to those records on 2, then commit. Occasionally you'll be presented with a "Database is locked" error.
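This two-shell race can be reproduced with nothing but the stdlib sqlite3 module (the same driver the DAL uses for SQLite); a minimal sketch, with two connections standing in for the two shells and a hypothetical queue table:

```python
import os
import sqlite3
import tempfile

# Two connections to one database file, standing in for the two shells.
# A short timeout makes the lock error surface immediately instead of
# after the driver's default 5-second busy wait.
path = os.path.join(tempfile.mkdtemp(), "test.db")
conn1 = sqlite3.connect(path, timeout=0.1)
conn2 = sqlite3.connect(path, timeout=0.1)

conn1.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, status TEXT)")
conn1.commit()

# Shell 1 writes without committing: the driver implicitly issues BEGIN
# and SQLite takes a write (RESERVED) lock on the file.
conn1.execute("INSERT INTO queue (status) VALUES ('pending')")

# Shell 2 can still read -- and sees the table as it was before the
# uncommitted insert.
print(conn2.execute("SELECT COUNT(*) FROM queue").fetchone()[0])  # 0

# ...but a second writer is locked out until shell 1 commits.
try:
    conn2.execute("INSERT INTO queue (status) VALUES ('pending')")
except sqlite3.OperationalError as err:
    print(err)            # database is locked
    conn2.rollback()      # abandon shell 2's transaction so its read lock is released

conn1.commit()  # releases shell 1's lock; now shell 2 can write too
conn2.execute("INSERT INTO queue (status) VALUES ('pending')")
conn2.commit()
```

The `timeout=0.1` is only there to make the demo fail fast; in a real daemon you would normally keep the default busy timeout.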

Anthony

Jun 20, 2012, 9:06:58 PM
to web...@googlegroups.com
Sooo.... Have I misunderstood the SQLite locking description?

I believe so. The shared lock does not last forever -- only as long as it takes to execute the select. Once the data have been read, the shared lock (for that particular select) is released.

Anthony 

Doug Philips

Jun 23, 2012, 2:24:52 AM
to web...@googlegroups.com
On Wed, Jun 20, 2012 at 7:04 PM, Niphlod <nip...@gmail.com> wrote:
> You're forgetting another "layer".  DAL leverages on DBAPI implementation of
> sql drivers.

Well, that's not exactly good news for transparency and ease of documentation.

> A little more complicated: get on an infinite loop inserting 10 rows,
> without committing. while the loop is running on 1, try a loop of 100
> updates to those records, then commit. Occasionally you'll be presented with
> "Database is locked" error.

OK.

But what I would prefer to do is actually reason about what should be
happening rather than just slapping code together and trying to hope
that my tests will be close enough to real world scenarios to be
valid.

Sure, there is documentation about the quirks of SQLite, but nothing
(unless I've missed it after repeated reading) about the
assumptions/defaults, etc. of how the DAL uses the intermediate glue
layers (pysqlite, etc.). At this point, I think I'll just avoid using
the DAL unless it is part of the web2py application; it's easier to
write the external server to talk directly to the relevant database.

Thanks for your help!
-=Doug

Doug Philips

Jun 23, 2012, 2:26:53 AM
to web...@googlegroups.com
Yes, if autocommit is on. The part you didn't quote was that
autocommit can be disabled in which case a SHARED lock is held.
Since the documentation about what the DAL uses/needs/assumes about
the underlying Python layer (pysqlite in this case) isn't there, it's
just easier to avoid using it outside of a Web2py application.

Thanks for all the help,
-Doug

Anthony

Jun 23, 2012, 1:24:16 PM
to web...@googlegroups.com
> I believe so. The shared lock does not last forever -- only as long as it
> takes to execute the select. Once the data have been read, the shared lock
> (for that particular select) is released.

Yes, if autocommit is on. The part you didn't quote was that
autocommit can be disabled in which case a SHARED lock is held.

No, when autocommit is disabled, that doesn't immediately result in a shared lock being acquired and held indefinitely. A shared lock is acquired when a select is made, but only held as long as it takes to actually do the reading for that select. Just do a commit after your inserts/updates/deletes, and you should be fine.
 
Since the documentation about what the DAL uses/needs/assumes about
the underlying Python layer (pysqlite in this case) isn't there, it's
just easier to avoid using it outside of a Web2py application.

Seems rather extreme. What information do you need that would impact your usage? Note, the good thing about open source is that you can just look at the source code.
Other than that, the relevant documentation regarding how the driver works is here: http://docs.python.org/library/sqlite3.html.

Anthony

Anthony

Jun 23, 2012, 1:54:30 PM
to web...@googlegroups.com
> You're forgetting another "layer".  DAL leverages on DBAPI implementation of
> sql drivers.

Well, that's not exactly good news for transparency and ease of documentation.

DB API is just a specification for Python database drivers to follow to ensure some degree of similarity and portability: http://www.python.org/dev/peps/pep-0249/. On net, this is probably good news for transparency and ease of documentation because you have some consistency across drivers. What alternative would you propose for greater transparency and ease of documentation? 
 
But what I would prefer to do is actually reason about what should be
happening

I believe we have done this. If you remain unconvinced, some testing might increase your confidence in the conclusions.
 
Sure, there is documentation about the quirks of SQLite, but nothing
(unless I've missed it after repeated reading) about the
assumptions/defaults, etc of how the DAL uses the intermediate glue
layers (pysqlite, etc.).

Again, what specific assumptions about the "intermediate glue layers" would affect the code you would write in this case? If the issue is auto-committing, I believe the recommended code would work in either case. I'm not saying I would mind seeing a little more documentation regarding the differences across some of the databases and their drivers -- I'm just wondering how much it matters in this particular case.

Anthony

Doug Philips

Jun 27, 2012, 8:23:00 AM
to web...@googlegroups.com
On Sat, Jun 23, 2012 at 1:54 PM, Anthony <abas...@gmail.com> wrote:
>> > You're forgetting another "layer".  DAL leverages on DBAPI
>> > implementation of sql drivers.
>>
>> Well, that's not exactly good news for transparency and ease of
>> documentation.
>
> DB API is just a specification for Python database drivers to follow to
> ensure some degree of similarity and
> portability: http://www.python.org/dev/peps/pep-0249/. On net, this is
> probably good news for transparency and ease of documentation because you
> have some consistency across drivers. What alternative would you propose for
> greater transparency and ease of documentation?

Well, earlier it was claimed that the DAL for SQLite left autocommit
on, but the DBAPI says:
"Note that if the database supports an auto-commit feature, this must
be initially off."
So now I have an extra place to go look and try to figure out, does
the DAL actually implement DBAPI with autocommit or does the DAL
implement DBAPI but with an exception re: autocommit? Having more
places to look is not 'ease of documentation' from a user's
perspective. :-/


> Again, what specific assumptions about the "intermediate glue layers" would
> affect the code you would write in this case? If the issue is
> auto-committing, I believe the recommended code would work in either case.
> I'm not saying I would mind seeing a little more documentation regarding the
> differences across some of the databases and their drivers -- I'm just
> wondering how much it matters in this particular case.

It matters if the query at the top of the loop holds a lock on the DB,
which means that when there are no rows found to process, the database
remains locked while the server is sleeping...

-=Doug

Doug Philips

Jun 27, 2012, 8:45:55 AM
to web...@googlegroups.com
On Sat, Jun 23, 2012 at 1:24 PM, Anthony <abas...@gmail.com> wrote:
> No, when autocommit is disabled, that doesn't immediately result in a shared
> lock being acquired and held indefinitely. A shared lock is acquired when a
> select is made, but only held as long as it takes to actually do the reading
> for that select. Just do a commit after your inserts/updates/deletes, and
> you should be fine.

I don't think you are understanding, I'm not concerned about the case
when the query/select finds work to do, I'm concerned about the case
when the server looks, and finds nothing. I agree, when autocommit is
on, the lock will be dropped after the query returns. However, if
autocommit is off, and again referring to
http://www.sqlite.org/lockingv3.html section 7:
"After a BEGIN command, a SHARED lock will be acquired when the
first SELECT statement is executed. ... The SQL command "COMMIT" does
not actually commit the changes to disk. It just turns autocommit back
on."

Again, with autocommit off:
The server process does a query/select to see if there is any work to
do. This provokes SQLite to acquire a SHARED lock. Then the server
process decides "oops, no work to do, let's go to sleep for a minute".
Notice that it does nothing to release that lock. So the Web2py
process is now locked out of doing any updates because the SHARED lock
will block the acquisition of locks needed to actually write/update
the database. I didn't find anything in the SQLite documentation that
would indicate otherwise, but if you do, please point me to the
specific document/section. (I just searched through The Book again and
found no mention of autocommit. Searching for just 'commit' finds only
a handful of places that make no mention of it either.)



>> Since the documentation about what the DAL uses/needs/assumes about
>> the underlying Python layer (pysqlite in this case) isn't there, its
>> just easier to avoid using it outside of a Web2py application.
>
>
> Seems rather extreme. What information do you need that would impact your
> usage? Note, the good thing about open source is that you can just look at
> the source code.

Technically True. And that has allowed me to do my own work arounds
(and submit patches), which has been extremely helpful.
So for fixing bugs, it's great!
However...
One of the things I really like about Web2py is the promise of
backwards compatibility.
Is that promise meant to encompass all the current decisions embodied
in the implementation?
I wouldn't assume so.
The Book does not say: "For anything not covered here, read the source
and base your expectation of backwards compatibility on that."
The Book does say: "web2py has always been built from the user
perspective and is constantly optimized internally to become faster
and leaner, whilst always maintaining backward compatibility." and
"The source code of the Database Abstraction Layer was completely
rewritten in 2010. While it stays backward compatible, the rewrite
made it more modular and easier to extend."
So I should fully expect the implementation to change; anything I find
by "reading the source" is clearly something that is, and has been,
subject to change. Sorry, but the only sane thing to do is to rely
only on what is documented and covered under the blanket promise of
backwards compatibility.

-=D

Anthony

Jun 27, 2012, 9:42:19 AM
to web...@googlegroups.com
Well, earlier it was claimed that the DAL for SQLite left autocommit
on,

A quote from my earlier response:

web2py is just using the Python standard library sqlite3 module. According to the docs, it looks like it does not default to autocommit mode.

So now I have an extra place to go look and try to figure out, does
the DAL actually implement DBAPI with autocommit or does the DAL
implement DBAPI but with an exception re: autocommit? Having more
places to look is not 'ease of documentation' from a user's
perspective. :-/

The DAL does not implement DBAPI at all. The DAL simply uses Python drivers for databases, and those drivers are supposed to implement the DBAPI specifications (and it looks like the SQLite driver does correctly implement the specification regarding autocommit). More importantly, it appears that autocommit is irrelevant to the issue at hand.
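If you want to verify this rather than trust it, the stdlib sqlite3 driver exposes its transaction state directly (Connection.in_transaction, available since Python 3.2); a quick check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # same driver the DAL uses for SQLite
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()

# A bare select does not open a transaction, so no lock outlives the read.
conn.execute("SELECT * FROM t").fetchall()
print(conn.in_transaction)  # False

# Data-modifying statements are what make the driver issue an implicit BEGIN.
conn.execute("INSERT INTO t (x) VALUES (1)")
print(conn.in_transaction)  # True: a transaction is now open until commit/rollback

conn.commit()
print(conn.in_transaction)  # False again
```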
 
It matters if the query at the top of the loop holds a lock on the DB,
which means that when there are no rows found to process, the database
remains locked while the server is sleeping...

No, that's my point, it doesn't matter. The shared lock only lasts as long as it takes to read the records -- by the time it hits the for loop (even if the loop doesn't get entered due to no records), the shared lock is released. Whether or not autocommit is on shouldn't matter, so there's nothing in particular to document here that makes a difference for how to code a background task like this. As the docs say, when making changes to the db outside an HTTP request (e.g., in a background task script), do db.commit(). Other than that, nothing to worry about.
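The pattern the docs recommend for such a script might be sketched like this, with the stdlib sqlite3 module standing in for the DAL (the table and column names are invented for illustration; with the DAL the commit would be db.commit()):

```python
import sqlite3

def process_queue(conn):
    # The SHARED lock taken for this select is released as soon as the
    # rows have been read -- even if the list comes back empty.
    rows = conn.execute(
        "SELECT id, payload FROM queue WHERE status = 'pending'"
    ).fetchall()
    for row_id, payload in rows:
        # ... send the email / do the real work for `payload` here ...
        conn.execute("UPDATE queue SET status = 'done' WHERE id = ?", (row_id,))
        conn.commit()  # commit per row, so the write lock is held only briefly
    return len(rows)

# Demo setup; a real daemon would loop: process_queue(conn); time.sleep(60)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT, status TEXT)")
conn.execute("INSERT INTO queue (payload, status) VALUES ('mail one', 'pending')")
conn.execute("INSERT INTO queue (payload, status) VALUES ('mail two', 'pending')")
conn.commit()
print(process_queue(conn))  # 2
```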

Anthony

Anthony

Jun 27, 2012, 10:13:00 AM
to web...@googlegroups.com
Again, with autocommit off:
The server process does a query/select to see if there is any work to
do. This provokes SQLite to acquire a SHARED lock. Then the server
process decides "oops, no work to do, let's go to sleep for a minute".
Notice that it does nothing to release that lock.

Yes, it does release the lock, as soon as the select is complete. You do not have to do db.commit() after a select.

Anthony

Niphlod

Jun 27, 2012, 10:20:51 AM
to web...@googlegroups.com
Some nice assumptions here... are you actually suggesting that:
- there could ever be a stable release of web2py out there that holds the lock for any pending select?
- the behaviour the book suggests is absolutely wrong?

Web2py developers and testers in general would spot that in a breeze while testing trunk, and it would be fixed in the following stable release.

There are no evil people among web2py's developers trying to make queries go slower, block your code wherever they want, and generally force you to code crappy workarounds.

Anyway, if you are so concerned about those problems and want to maintain your own forked DAL version where you are in control of every aspect, be my guest..... but it kind of defeats the "I'll adopt a framework" paradigm.
When you adopt something like a framework, you have to "give some credit" to the fact that its n contributors, often very well versed in at least the piece of code they submitted/tested/patched - 90% of the time surely more informed than yourself - did a good job.


Doug Philips

Jun 27, 2012, 10:26:38 AM
to web...@googlegroups.com
Please show me where that is documented.
Section 7 of http://www.sqlite.org/lockingv3.html says that ONLY
happens when autocommit is on, and you have stated that it is off
(contrary to my mistaken memory). Further, The Book says that only
inside of a Web2py app does commit/rollback happen automatically.
Thanks,
--Doug

Anthony

Jun 27, 2012, 11:53:04 AM
to web...@googlegroups.com
> Yes, it does release the lock, as soon as the select is complete. You do not 
> have to do db.commit() after a select. 
Please show me where that is documented. 
Section 7 of http://www.sqlite.org/lockingv3.html says that ONLY 
happens when autocommit is on, and you have stated that it is off 
(contrary to my mistaken memory).

End of Section 4:

Once all reading has completed, the SHARED lock is dropped.
 
Further, The Book says that only 
inside of a Web2py app does commit/rollback happen automatically.

Commit is not needed for a select. Committing is for committing changes to the database -- selects only read from the database.

Anthony

Doug Philips

Jun 27, 2012, 11:24:29 PM
to web...@googlegroups.com
On Wed, Jun 27, 2012 at 11:53 AM, Anthony <abas...@gmail.com> wrote:
>> > Yes, it does release the lock, as soon as the select is complete. You do
>> > not
>> > have to do db.commit() after a select.
>> Please show me where that is documented.
>> Section 7 of http://www.sqlite.org/lockingv3.html says that ONLY
>> happens when autocommit is on, and you have stated that it is off
>> (contrary to my mistaken memory).
>
>
> End of Section 4:
>
> Once all reading has completed, the SHARED lock is dropped.

Thanks, I had missed that part.


>> Further, The Book says that only
>> inside of a Web2py app does commit/rollback back happen automatically.
>
>
> Commit is not needed for a select. Committing is for committing changes to
> the database -- selects only read from the database.

Ok, so if (and I'm looking for a more general than just SQLite answer
here), I want to get exclusive access to the DB, I should first do a
'dummy' change.

More explicitly, if the server process were first to 'update' a record
(solely for the purpose of locking the DB), then read through pending
entries, it could mark them as 'working on by server process x',
commit, and be sure that no other process (such as another
worker/server or even a user coming in through web2py proper) would be
able to also pick up that work? Even if that just works for SQLite,
I'm cool, but I'd like a solution that worked for all DBs if that is
possible without a lot of machinations.

In slightly more detail, there could be a server's table, and as each
server 'wakes up' looking for work to do, it could update its own row
in that table (hence provoking a lock), look for work, flag the work
as being handled by itself, then commit. It could then take its own
sweet time to do the work, knowing that no other server process would
have also claimed that work?

-=Doug

Anthony

Jun 28, 2012, 12:08:15 AM
to web...@googlegroups.com
SQLite is a file-based database, so it has to have coarser locking than a typical client/server database. Other databases have "select for update" functionality, which allows you to select a set of records and lock them. web2py actually supports this, via .select(..., for_update=True).

Anyway, you might be better off looking into using the scheduler, or a third-party task queue like Celery (a web2py Celery plugin was started at some point, though not sure it is complete).

Anthony

Jonathan Lundell

Jun 28, 2012, 1:02:51 AM
to web...@googlegroups.com
On Jun 27, 2012, at 9:08 PM, Anthony wrote:
SQLite is a file-based database, so it has to have coarser locking than a typical client/server database. Other databases have "select for update" functionality, which allows you to select a set of records and lock them. web2py actually supports this, via .select(..., for_update=True).


The SQLite adapter simulates select for update by wrapping the select+update in a transaction. Not too efficient, since it has to lock the entire database, but it'll do it.

Michael Toomim

Jun 28, 2012, 1:06:50 AM
to web...@googlegroups.com
This is all a great unearthing of the Mystery of Transactions. Thanks for the investigation, Doug.

This was difficult for me to learn when I got into web2py as well. Perhaps we could write up all this knowledge somewhere, now that you're figuring it out?

Can we have a section on Transactions in the book, or somewhere?

Doug Philips

Jun 28, 2012, 7:10:18 AM
to web...@googlegroups.com
On Thu, Jun 28, 2012 at 12:08 AM, Anthony <abas...@gmail.com> wrote:
> SQLite is a file-based database, so has to have more course locking than a
> typical client/server database. Other databases have "select for update"
> functionality, which allows you to select a set of records and lock them.
> web2py actually supports this, via .select(..., for_update=True).

Thanks, "for_update" is not documented in The Book, but is exactly
what I am looking for.

Checking the gluon code for Version 1.99.7 (2012-03-04 22:12:08) stable
shows that SQLite tries to support 'for_update', unfortunately it
doesn't work (causes an exception). So far as I could tell by
searching, that wasn't already reported so I created a new issue:
http://code.google.com/p/web2py/issues/detail?id=864 for it.

> Anyway, you might be better off looking into using the scheduler, or a
> third-party task queue like Celery (a web2py Celery plugin was started at
> some point, though not sure it is complete).

I don't understand, how will that let me do a read-modify-update
transaction? It seems as if for_update is precisely what I need to do
that.

Thanks,
-Doug

Doug Philips

Jun 28, 2012, 7:12:00 AM
to web...@googlegroups.com
On Thu, Jun 28, 2012 at 1:02 AM, Jonathan Lundell <jlun...@pobox.com> wrote:
> The SQLite adapter simulates select for update by wrapping the select+update
> in a transaction. Not too efficient, since it has to lock the entire
> database, but it'll do it.

Thanks, I wish it worked in the released version, and I'm hopeful that
it can be made to work in the next version!
-=D

Anthony

Jun 28, 2012, 8:04:50 AM
to web...@googlegroups.com
> Anyway, you might be better off looking into using the scheduler, or a
> third-party task queue like Celery (a web2py Celery plugin was started at
> some point, though not sure it is complete).

I don't understand, how will that let me do a read-modify-update
transaction? It seems as if for_update is precisely what I need to do
that.

I thought your original goal was to set up a task queue -- those suggestions are specialized for that purpose.

Anthony

Doug Philips

Jun 29, 2012, 3:45:25 PM
to web...@googlegroups.com
Yes, it is. The email-queue example in The Book has a caveat that only
one server can be used. That's because with more than one server
process, the same work item might be processed several times. If I had
a read-modify-update transaction, then a server could mark a
particular work item as taken/in-progress and it wouldn't matter how
many servers I had. It would work with just one, or if I needed to
spin up a few more, I could do that without introducing errors,
race conditions, etc. Right?

-=Doug