Pysqlite User Who Needs Access to the Backup API

226 views
Skip to first unread message

faldridge

unread,
Oct 15, 2010, 11:12:33 PM10/15/10
to python-sqlite
Hi, I am a frequent user (and fan!) of pysqlite, and in an application
I need to be able to access SQLite databases through the pysqlite
wrapper rather than apsw, but I also need to use SQLite's online
backup API.

I noticed the following in the commit log in rev cccff20934 of
pysqlite:

"Removed docs about combining pysqlite and apsw. This was
experimental and I
don't want to support it."

My question to the developer of pysqlite consequently is: if you don't
want to support combining pysqlite and apsw, would you be interested
in a patch for pysqlite that implements the online backup API in a
similar manner to apsw?

Thank you for your time and your excellent software.

Roger Binns

unread,
Oct 15, 2010, 11:43:23 PM10/15/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/15/2010 08:12 PM, faldridge wrote:
> My question to the developer of pysqlite consequently is: if you don't
> want to support combining pysqlite and apsw,

Note that you can still load APSW and pysqlite in the same process - the
only thing you won't be able to do with that change is get them to share the
same database handle. (ie you'd have to open the database file once with
each library which will work fine for everything except :memory: databases.)

There is a request for APSW to support sqlite database handles from outside:

http://code.google.com/p/apsw/issues/detail?id=79

However this is very difficult to do if you want to be 100% reliable as
SQLite does not provide a good object model (eg reference counting rarely
used, no notification on closes) so it would be possible to cause process
crashes.

> would you be interested in a patch for pysqlite that implements the
> online backup API in a similar manner to apsw?

The APSW license is very liberal so the code can just be copied over and
lightly edited, and I am also happy to assign copyright etc.

The issue for pysqlite is that it would require SQLite 3.6.11 or newer
(released Feb 2009) whereas the current pysqlite will work with far older
versions. I don't know how Gerhard handles that sort of thing.

Roger

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAky5H1cACgkQmOOfHg372QTgMwCfYW1rD1JEnx6VLtqe1rmb9NqK
gSQAnRpi+Rdp/+2LzV+0cqJYNw1ze2OA
=jgf8
-----END PGP SIGNATURE-----

faldridge

unread,
Oct 16, 2010, 4:49:30 PM10/16/10
to python-sqlite
Roger,

Thanks for your quick and informative reply! I have a few follow-up
questions for you.

On Oct 15, 10:43 pm, Roger Binns <rog...@rogerbinns.com> wrote:

> Note that you can still load APSW and pysqlite in the same process - the
> only thing you won't be able to do with that change is get them to share the
> same database handle.  (ie you'd have to open the database file once with
> each library which will work fine for everything except :memory: databases.)

This won't work for me as my database files for this particular
application are sufficiently large and write-heavy that attempting to
use the backup API without sharing the database handle will (as far as
I can tell from the testing I have done) cause the backup process to
endlessly restart and never finish as per the SQLite Online Backup API
documentation:

"If another thread or process writes to the source database while
this function is sleeping, then SQLite detects this and usually
restarts the
backup process when sqlite3_backup_step() is next called. There
is one exception to this rule: If the source database is not an in-
memory
database, and the write is performed from within the same process
as the backup operation and uses the same database handle (pDb), then
the
destination database (the one opened using connection pFile) is
automatically updated along with the source. The backup process may
then be
continued after the xSleep() call returns as if nothing had
happened."

So, it comes down to two possibilities for me: build apsw and pysqlite
against the same shared SQLite library or pull your backup API
implementation into pysqlite. Because of the issue you mentioned with
SQLite versions prior to 3.6.11 and because it just seems redundant, I
am now leaning towards combining the two SQLite wrappers for my
application.

Having found the dynamic library that pysqlite is linked against using
ldd, is there anyway to force apsw to build using that particular
library? Or you do have any other suggestions for getting the two to
work together? I have already tried to get apsw and pysqlite to work
together a week or so ago with no success.

Thanks for your time and effort,
Forrest

Roger Binns

unread,
Oct 16, 2010, 6:08:41 PM10/16/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/16/2010 01:49 PM, faldridge wrote:
> This won't work for me as my database files for this particular
> application are sufficiently large and write-heavy that attempting to
> use the backup API without sharing the database handle will (as far as
> I can tell from the testing I have done) cause the backup process to
> endlessly restart and never finish as per the SQLite Online Backup API
> documentation:

Does WAL mode help at all? In theory it means the database is not being
written to at all, and instead writes are diverted to the WAL until a
checkpoint is done. You should be able to backup the database then.

> Having found the dynamic library that pysqlite is linked against using
> ldd, is there anyway to force apsw to build using that particular
> library?

With both you can create a setup.cfg with include_dirs and library_dirs
pointing to the same place which will then ensure they use the same header
and libraries.

> Or you do have any other suggestions for getting the two to
> work together?

You could switch to APSW completely :-)

Also for the record I did not change nor did I ever plan to change anything
in APSW to make providing its handles to pysqlite stop working.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAky6ImIACgkQmOOfHg372QQLDgCfWtTC0m6FE/MJ3Gs3u5CcwuNz
VMcAmQHcZ7OApHUiA/XtjMEqnutk12I1
=A9L/
-----END PGP SIGNATURE-----

Edzard Pasma

unread,
Oct 17, 2010, 11:34:48 AM10/17/10
to python...@googlegroups.com

On 17-okt-2010, Roger Binns wrote:

>
> You could switch to APSW completely :-)
>

Hi, I just cut this remark out of a large context. Swtiching between
the two wrappers is not such a big deal if you add an extra layer
around APSW to make it appear like pysqlite. I did this for my own
project. The resulting module, apsw_dbapi2, offers most of the
Pysqlite interface but should also allow to address APSW features.
I'd be happy happy to mail it. A limitation, in its current version,
is that isolation_level must be set to None, i.e. it does not do
automatic transactions. But I may dig up an older version where that
was included as well. Best regards, Edzard Pasma

Roger Binns

unread,
Oct 17, 2010, 1:41:03 PM10/17/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/17/2010 08:34 AM, Edzard Pasma wrote:
> The resulting module, apsw_dbapi2, offers most of the Pysqlite interface but should also
> allow to address APSW features.

Do you have the source somewhere (eg Google code)? There may be things I
can do to APSW to make your code easier and smaller.

> does not do automatic transactions.

I must admit I have never understood why anyone would want them.

> But I may dig up an older version where

> that [automatic transactions] was included as well.

How did you implement that? pysqlite actually parses the SQL text which
seems a little yucky to me. In theory it should be possible to do with
authorisers although you'll hit cases like "insert ... (select ...)".

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAky7NP4ACgkQmOOfHg372QQH+QCg1Qcje2680YCP3czRl9W8L5+G
T7EAoIw91FQWHuohpApYCZG2BZqY9GcT
=/w9C
-----END PGP SIGNATURE-----

Edzard Pasma

unread,
Oct 17, 2010, 4:08:51 PM10/17/10
to python...@googlegroups.com
On 17-okt-2010, Roger Binns wrote:

On 10/17/2010 08:34 AM, Edzard Pasma wrote:
The resulting module, apsw_dbapi2, offers most of the Pysqlite interface but should also
allow to address APSW features.

Do you have the source somewhere (eg Google code)?  There may be things I
can do to APSW to make your code easier and smaller.

does not do automatic transactions. 

I must admit I have never understood why anyone would want them.

But I may dig up an older version where
that [automatic transactions] was included as well.

How did you implement that?  pysqlite actually parses the SQL text which
seems a little yucky to me.  In theory it should be possible to do with
authorisers although you'll hit cases like "insert ... (select ...)".

Roger

Hi,

Apsw_dbapi2 is in a cheese shop project and the code can be viewed here:
http://packages.python.org/sqmediumlite/src/apsw_dbapi2.py.html

Above version does not include automatic transactions. That is not a matter of desinterest, but because they are handled in an other part of the containing package. It once was in apsw_dbapi2, in the version here:


The statement type is determined by checking the first three letters of a statement. This feels somewhat ugly I admit. But it is basically the same what Pysqlite does. I believe it is not in the line of APSW and may compromise its leanness, hence a separate layer.

If there is interest I or we might reassemble a new version dedicated to use outside its current package. I could also provide the Pysqlite.Row row factory.

Best regards,

Edzard Pasma.

Gerhard Häring

unread,
Oct 26, 2010, 11:12:52 AM10/26/10
to python...@googlegroups.com
I was kinda busy lately. But now I've created a added backup functionality to pysqlite (fetch the trunk Mercurial version from Google Code and see attached example). It may be a bit rough still, and I haven't even looked into the APSW implementation for now.

But I hope it helps.

-- Gerhard
backuptest.py

Roger Binns

unread,
Oct 26, 2010, 10:39:35 PM10/26/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/26/2010 08:12 AM, Gerhard H�ring wrote:
> I was kinda busy lately. But now I've created a added backup functionality
> to pysqlite (fetch the trunk Mercurial version from Google Code and see
> attached example). It may be a bit rough still, and I haven't even looked
> into the APSW implementation for now.

You should have :-) At least you can make the API the same:

http://apidoc.apsw.googlecode.com/hg/backup.html

You seem to have picked the same API except finish is missing, as is done.

finish really should be exposed. At the moment that is called by the
destructor (which can't raise an exception) and you have no control over
when or which thread the destructor is called in anyway.

The parameters to create the backup object are also in a different order
where you have optimized for convenience and I have optimized for explicit
is better than implicit :-)

We also put the backup method on different objects. I put it on the
destination database and you put it on the source. I don't remember why I
picked the order I did - best guess is that matches the SQLite API. Also
you have to make sure that no regular API calls are made to the destination
database handle. Quoting from the SQLite docs:

However, the application must guarantee that the destination
database connection is not passed to any other API (by any thread)
after sqlite3_backup_init() is called and before the corresponding
call to sqlite3_backup_finish(). SQLite does not currently check
to see if the application incorrectly accesses the destination
database connection and so no error code is reported, but the
operations may malfunction nevertheless. Use of the destination
database connection while a backup is in progress might also also
cause a mutex deadlock.

Since APSW is thread safe (and thread correct) I keep track of usage and can
make that guarantee.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkzHkOIACgkQmOOfHg372QRZuQCfZozqU+YNvph+sCXZHO5aXGLx
pJUAnjS9yZurupsF8AU1IET1jq/6LUF+
=zxV5
-----END PGP SIGNATURE-----

faldridge

unread,
Oct 27, 2010, 11:25:34 AM10/27/10
to python-sqlite
Gerhard,

Thank you so much for working on this. I've played around with your
version, and it is a great start, but there are a few points I'd like
to make:

Firstly, I agree with Roger that sqlite3_backup_finish needs to be
exposed. This is especially important because of the restrictions
placed on the destination database connection that Roger mentioned (ie
that the connection cannot be passed to any other API by any other
thread in between calls to backup_init and backup_finish).

Also, as I'm sure you expected for such a rough version, I have found
at least one critical bug in the current implementation.

Callers of Backup.step *must* be able to check for a return code of
SQLITE_BUSY or SQLITE_LOCKED. As the API docs mention, these return
codes are not fatal and sqlite3_backup_step may be retried. However,
the following snippets from two REPL sessions show the expected
behavior via apsw and incorrect behavior from the current pysqlite
implementation:

Correct behavior with apsw:
>>> import apsw
>>> source = apsw.Connection('source.sqlite')
>>> dest = apsw.Connection('dest.sqlite')
>>> backup = dest.backup("main", source, "main")
>>> cursor = source.cursor()
>>> cursor.execute("BEGIN EXCLUSIVE")
<apsw.Cursor object at 0x100482850>
>>> backup.step(100)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
apsw.BusyError: BusyError: error
>>> cursor.execute("ROLLBACK")
<apsw.Cursor object at 0x100482850>
>>> backup.step(100)
True

Incorrect behavior with pysqlite:
>>> from pysqlite2 import dbapi2
>>> source = dbapi2.connect('source.sqlite')
>>> dest = dbapi2.connect('dest.sqlite')
>>> backup = source.backup(dest, "main", "main")
>>> cursor = source.cursor()
>>> cursor.execute("BEGIN EXCLUSIVE")
<pysqlite2.dbapi2.Cursor object at 0x10047a810>
>>> backup.step(100)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pysqlite2.dbapi2.OperationalError: not an error
>>> cursor.execute("ROLLBACK")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
pysqlite2.dbapi2.OperationalError: cannot rollback - no transaction is
active
>>> backup.step(100)
True

Note: the different transaction behavior is an interesting side
phenomenon, and I am not entirely sure what the correct behavior is in
this case, though as far as I understand it, issuing a "BEGIN
EXCLUSIVE;" should be enough to create an "active" transaction.

I desperately need the correct error reporting behavior in my own
application, and so I am going to work on this as well. I have
abandoned my previous effort to translate apsw's approach directly to
pysqlite in favor of attempting to help flesh out this implementation.

To prevent effort from being duplicated needlessly, I created a server-
side clone of pysqlite, on which I hope to address the return code
issue as well as expose sqlite3_backup_finish in pysqlite's API. I
haven't written any significant C code in over 5 years though, so any
further help from either Gerhard or Roger would, of course, be greatly
appreciated.

Thanks, as always, for your time and effort,
Forrest
> Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/

faldridge

unread,
Oct 29, 2010, 10:15:24 AM10/29/10
to python-sqlite
OK. My clone is up at:

http://code.google.com/r/forrestaldridge-backup-api/

So far, I have:
1. Fixed a trivial bug where Backup.pagecount was mistakenly calling
sqlite3_backup_remaining() instead of sqlite3_backup_pagecount().
2. Added OperationalError subclasses to cover all the cases in
_pysqlite_seterror.
3. Modified Backup.step() to return the appropriate OperationalError
subclass to its callers.

Next on my list is exposing sqlite3_backup_finish() in pysqlite's
implementation of the online backup API. Thus far, I have also tried
to maintain the style and general approach of pysqlite in my changes,
to provide a cohesive API for developers.

If anyone has the time, please take a glance at my changes and let me
know what you think.

Thanks,
Forrest
Reply all
Reply to author
Forward
0 new messages