Slow deletion

32 views
Skip to first unread message

durumdara

unread,
Jun 11, 2010, 2:14:57 AM6/11/10
to python-sqlite
Hi!

I have a project that handle images. It got input images and it is
convert these images to anothers. The users can select which will be
the final version. When all images finished for the actual project, I
make an album from them, and delete the temp. data.

First I used PostgreSQL for this. But Bytea deletion was extremely
slow. Sometimes 1-2 hours. This was unacceptable.

Then I changed to AnyDBM. This have problem that on 2 GB slowed down,
and one item insert was a half hour...

Then I changed to SQLite. Everything is good, but the deletion of the
blob table is slow.
Interesting, that "delete from blobs" is have same speed as "drop
table blobs"... :-(

If I delete finished blobs from a 286 MB sized database, it is 1-5
minutes...

And not speak about when I tried to VACUUM a 2 GB database, I killed
the SQLite on 1 hours, because it not finished with it... :-(

I tried to set the page size from 1024 to 32768, but the speed is only
half of it.

The blob table is:

filename varchar(255) not null,
ext varchar(3) not null,
size integer not null,
data blob,
primary key (filename, ext, size)

May this caused by Primary Key?
Because constraints are undroppable from the table, I cannot test what
is the speed of deletion without primary key...

Please help me, how to improve the speed of the deletion, how to make
it faster?

Thanks:
dd

Christian Boos

unread,
Jun 11, 2010, 9:37:57 AM6/11/10
to python...@googlegroups.com

Exploit the fact that a SQLite database is contained in a single file.
It will probably be faster to create a new database, copy there what you
want to keep, then rename tmpfile.db to yournormal.db, reopen the
connection and you're done.

This goes along the lines of:

ATTACH DATABASE 'tmpfile.db' AS pruned;
CREATE TABLE pruned.blob ... ;
INSERT INTO pruned.blob SELECT * FROM blob WHERE <what you need to keep>;
DETACH DATABASE pruned

-- Christian

Joshua J. Kugler

unread,
Jun 11, 2010, 1:11:43 PM6/11/10
to python...@googlegroups.com
On Friday 11 June 2010, Christian Boos elucidated thus:

> > Please help me, how to improve the speed of the deletion, how to
> > make it faster?
>
> Exploit the fact that a SQLite database is contained in a single
> file. It will probably be faster to create a new database, copy there
> what you want to keep, then rename tmpfile.db to yournormal.db,
> reopen the connection and you're done.
>
> This goes along the lines of:
>
> ATTACH DATABASE 'tmpfile.db' AS pruned;
> CREATE TABLE pruned.blob ... ;
> INSERT INTO pruned.blob SELECT * FROM blob WHERE <what you need to
> keep>; DETACH DATABASE pruned

With the risk of sounding sarcastic, I would suggest: don't use the
database for image storage. Write to files, and keep track of file
paths in the database. It *will* be faster. Since you're using
sqlite, I assume you're not doing replication and such, so that
shouldn't be an issue.

j

--
Joshua Kugler
Part-Time System Admin/Programmer
http://www.eeinternet.com
PGP Key: http://pgp.mit.edu/ �ID 0x73B13B6A

Roger Binns

unread,
Jun 12, 2010, 1:33:56 AM6/12/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/10/2010 11:14 PM, durumdara wrote:
> Interesting, that "delete from blobs" is have same speed as "drop
> table blobs"... :-(

I don't quite understand what you mean by that, but the former is
implemented almost exactly as the latter:

http://www.sqlite.org/lang_delete.html

> The blob table is:
>
> filename varchar(255) not null,
> ext varchar(3) not null,
> size integer not null,
> data blob,
> primary key (filename, ext, size)

Is that exactly the order you declared the columns? You should always put
blobs last otherwise SQLite has to walk through the blob to get to columns
after it.

In any event this is not a good use case for SQLite. The images you make
are "produced" from the originals so you only need to store the parameters
of that production in the database rather than the actual output and use the
filesystem for what it is really good at - storing files.

Roger

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwTHD8ACgkQmOOfHg372QQCyQCgj28l//CvxJteXiiURgz3rFU/
0TUAoNdVUBwlO1TgCE7slkhqBUhqL9iq
=kIlp
-----END PGP SIGNATURE-----

durumdara

unread,
Jun 14, 2010, 3:05:36 AM6/14/10
to python-sqlite
Hi!

I understand it - files are better placed in filesystem - but database
is protect the operation with transactions in I can keep all data in
one secure place...

As I see some tests are faster than others, may this caused by
differents pages.

Thanks for your help:
dd
> Comment: Using GnuPG with Mozilla -http://enigmail.mozdev.org/

Roger Binns

unread,
Jun 14, 2010, 8:25:15 AM6/14/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/14/2010 12:05 AM, durumdara wrote:
> I understand it - files are better placed in filesystem - but database
> is protect the operation with transactions in I can keep all data in
> one secure place...

SQLite isn't performing magic. The "secure" part is that it calls fsync()
on the file and parent directory. You can do exactly the same and will have
the same amount of "security".

In any event this is the only way you will get an appreciable improvement in
performance.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwWH6cACgkQmOOfHg372QS53wCdHD+BI7PTyApBZfJKLlPTpdGJ
jc8An14rI8DJ7LPwocTCi8O/ltit7ShF
=s+gW
-----END PGP SIGNATURE-----

Reply all
Reply to author
Forward
0 new messages