[sqlite] question about zipvfs & performance


Tal Tabakman

Mar 25, 2012, 1:48:51 PM
to General Discussion of SQLite Database
Hi,
I am writing an application that performs a lot of DB writes. I am already
using many of the recommended optimizations (such as wrapping writes in
transactions).
I want to improve my recording time by reducing the amount of I/O. One way
to do so is by compressing the data before writing it to disk.
I am evaluating an SQLite extension called zipvfs. This VFS extension
compresses pages before writing them to disk.
I am using zlib compress/uncompress as my compression callback functions
for this VFS. I assumed that database writing would be faster with this VFS,
since compression means less I/O, but in reality I see no difference
(although the data is indeed compressed)...
Any idea why I don't see any improvement in recording time? Is there an
overhead with zipvfs? Are there any other recommended compression callback
functions?
cheers
Tal
_______________________________________________
sqlite-users mailing list
sqlite...@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
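
For reference, zlib-based page compression helpers of the kind described above
generally look something like the sketch below. The wrapper names and the
(dest, in/out length, src, length) calling convention are illustrative
assumptions, not the actual ZIPVFS callback interface; only compressBound(),
compress2() and uncompress() are real zlib calls.

  /* Illustrative zlib-based page compression helpers (names and buffer
  ** convention are assumptions for this sketch, not the ZIPVFS interface). */
  #include <zlib.h>

  /* Worst-case output size for nSrc bytes of input. */
  static int pageCompressBound(int nSrc){
    return (int)compressBound((uLong)nSrc);
  }

  /* Compress aSrc[0..nSrc-1] into aDest. On entry *pnDest is the output
  ** buffer size; on exit it is the compressed size. Returns 0 on success. */
  static int pageCompress(char *aDest, int *pnDest, const char *aSrc, int nSrc){
    uLongf nDest = (uLongf)*pnDest;
    int rc = compress2((Bytef*)aDest, &nDest, (const Bytef*)aSrc, (uLong)nSrc,
                       Z_DEFAULT_COMPRESSION);
    *pnDest = (int)nDest;
    return rc==Z_OK ? 0 : 1;
  }

  /* Inverse of pageCompress(). */
  static int pageUncompress(char *aDest, int *pnDest, const char *aSrc, int nSrc){
    uLongf nDest = (uLongf)*pnDest;
    int rc = uncompress((Bytef*)aDest, &nDest, (const Bytef*)aSrc, (uLong)nSrc);
    *pnDest = (int)nDest;
    return rc==Z_OK ? 0 : 1;
  }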

Larry Brasfield

Mar 25, 2012, 2:41:19 PM
to sqlite...@sqlite.org
On March 25, Tal Tabakman wrote:
> Hi,
> I am writing an application that performs a lot of DB writes. I am already
> using many of the recommended optimizations (such as wrapping writes in
> transactions).
> I want to improve my recording time by reducing the amount of I/O. One way
> to do so is by compressing the data before writing it to disk.
> I am evaluating an SQLite extension called zipvfs. This VFS extension
> compresses pages before writing them to disk.
> I am using zlib compress/uncompress as my compression callback functions
> for this VFS. I assumed that database writing would be faster with this VFS,
> since compression means less I/O, but in reality I see no difference
> (although the data is indeed compressed)...
> Any idea why I don't see any improvement in recording time? Is there an
> overhead with zipvfs? Are there any other recommended compression callback
> functions?

I suspect you are seeing the effect of SQLite's choice of block size on
disk. If, for some reason, your application would work better with much
larger block sizes than SQLite uses by default, you might see an
improvement with on-the-fly compression/decompression (provided CPU
time does not take back the I/O time saving). The compression schemes
generally have some overhead, creating and writing out a dictionary in
zlib's case. You are likely not reducing the number of I/O operations,
but just reducing their size a little. With disk caching and read-ahead
working pretty well for contiguous disk sectors, the effect is bound to
be small when the block size already resembles the cluster size.
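
For what it's worth, the page size actually in use is easy to check so it can
be compared against the filesystem cluster size. A minimal sketch (the
database path comes from the command line):

  /* Sketch: print the page size of an existing database so it can be
  ** compared with the filesystem cluster size. */
  #include <stdio.h>
  #include <sqlite3.h>

  static int printRow(void *pUnused, int nCol, char **azVal, char **azCol){
    (void)pUnused; (void)nCol;
    printf("%s = %s\n", azCol[0], azVal[0]);
    return 0;
  }

  int main(int argc, char **argv){
    sqlite3 *db;
    if( argc<2 || sqlite3_open(argv[1], &db)!=SQLITE_OK ) return 1;
    sqlite3_exec(db, "PRAGMA page_size;", printRow, 0, 0);
    sqlite3_close(db);
    return 0;
  }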

You are tackling a problem that has already been subject to a lot of
optimization effort. Further improvement is likely to be difficult
unless there is something unusual about the access pattern of your
application.

> cheers
> Tal

Good luck.
--
Larry Brasfield

Valentin Davydov

Mar 25, 2012, 2:43:07 PM
to General Discussion of SQLite Database
On Sun, Mar 25, 2012 at 07:48:51PM +0200, Tal Tabakman wrote:
> Hi,
> I am writing an application that performs a lot of DB writes. I am already
> using many of the recommended optimizations (such as wrapping writes in
> transactions).
> I want to improve my recording time by reducing the amount of I/O. One way
> to do so is by compressing the data before writing it to disk.
> I am evaluating an SQLite extension called zipvfs. This VFS extension
> compresses pages before writing them to disk.
> I am using zlib compress/uncompress as my compression callback functions
> for this VFS. I assumed that database writing would be faster with this VFS,
> since compression means less I/O, but in reality I see no difference
> (although the data is indeed compressed)...
> Any idea why I don't see any improvement in recording time?

Yes. If you are using a drive with rotating magnetic platters, then the most
critical factor is seek (latency) time, rather than linear throughput. In
other words, you are limited by the _number_ of random I/O operations,
and not by the _amount_ of information written.

So, if you want to improve your write performance, you have to use
low-latency storage, such as an SSD drive for small databases or a
RAID array with plenty of write cache memory for huge ones.

> Is there an overhead with zipvfs?

You can easily measure this overhead yourself on an in-memory database.
RAM is cheap now ;-)
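
A timing skeleton along these lines would do. The /dev/shm path, the table and
the row count are only placeholders, and passing the registered name of the
compressing VFS as the last argument of sqlite3_open_v2() (instead of 0)
selects it for the second run:

  /* Sketch: time a batch of inserts against a RAM-backed database so that
  ** any remaining cost is CPU (compression) rather than disk I/O. */
  #include <stdio.h>
  #include <time.h>
  #include <sqlite3.h>

  int main(void){
    sqlite3 *db;
    sqlite3_stmt *pStmt;
    clock_t t0;
    int i;
    if( sqlite3_open_v2("/dev/shm/test.db", &db,
                        SQLITE_OPEN_READWRITE|SQLITE_OPEN_CREATE,
                        0 /* or the name of the compressing VFS */)!=SQLITE_OK ){
      return 1;
    }
    sqlite3_exec(db, "CREATE TABLE t(a INTEGER, b TEXT);", 0, 0, 0);
    sqlite3_prepare_v2(db, "INSERT INTO t VALUES(?1, ?2)", -1, &pStmt, 0);
    t0 = clock();
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    for(i=0; i<100000; i++){
      sqlite3_bind_int(pStmt, 1, i);
      sqlite3_bind_text(pStmt, 2, "some payload text", -1, SQLITE_STATIC);
      sqlite3_step(pStmt);
      sqlite3_reset(pStmt);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    printf("100000 inserts took %.3f s of CPU\n",
           (double)(clock()-t0)/CLOCKS_PER_SEC);
    sqlite3_finalize(pStmt);
    sqlite3_close(db);
    return 0;
  }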

Valentin Davydov.

Roger Binns

Mar 25, 2012, 3:09:09 PM
to sqlite...@sqlite.org

On 25/03/12 10:48, Tal Tabakman wrote:
> I am evaluating an SQLite extension called zipvfs.

http://www.hwaci.com/sw/sqlite/zipvfs.html for anyone else interested.

> I assumed that database writing would be faster ... in reality I see no
> difference

At the end of a transaction SQLite needs to do fsyncs, which ensure that
the data is firmly on the disk. If you are using spinning media, this will
require waiting one or more times for the disk to rotate to a certain
position. For example, this is why you'll find you can't do more than
60 transactions per second on a 7200 rpm drive.
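
One quick way to confirm that the commit-time fsyncs are what limits you is to
relax them temporarily and re-measure. This is a measurement trick only, not
something to run on data you care about; a sketch, assuming db is an
already-open handle:

  #include <sqlite3.h>

  /* Experiment only: relax durability so commits no longer wait on fsync.
  ** If timings barely change, the workload was not fsync-bound to begin with. */
  static void relaxDurabilityForTiming(sqlite3 *db){
    sqlite3_exec(db, "PRAGMA synchronous = OFF;", 0, 0, 0);      /* skip fsync at commit */
    sqlite3_exec(db, "PRAGMA journal_mode = MEMORY;", 0, 0, 0);  /* keep the journal in RAM */
  }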

Since the source for zipvfs isn't public I can't tell the exact inner
workings. However, it is reasonable to believe that it works on a
page-by-page basis. If you use larger pages (the default is 1 KB, the
maximum is 64 KB), there is more data to compress and a reduction in
size is more likely. SQLite's data encoding is quite compact, so fairly
random data won't compress that much.
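
If you want to try larger pages, the setting has to go in before any content
is written (or the database has to be rebuilt with VACUUM afterwards); a
sketch, assuming db is a handle on a new, empty database file:

  #include <sqlite3.h>

  /* Assumes db is open on a new, empty database. Once the first table is
  ** created, the page size is fixed. */
  static void useLargePages(sqlite3 *db){
    sqlite3_exec(db, "PRAGMA page_size = 65536;", 0, 0, 0);  /* 64 KB, the maximum */
    sqlite3_exec(db, "CREATE TABLE t(x);", 0, 0, 0);         /* example schema */
  }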

Current hard drives have internal buffers of 16 or 32 MB. You'd have to
write a heck of a lot of data to fill that, let alone the operating
system's buffers.

Roger

Richard Hipp

Mar 25, 2012, 3:55:23 PM
to General Discussion of SQLite Database
On Sun, Mar 25, 2012 at 1:48 PM, Tal Tabakman <tal.ta...@gmail.com> wrote:

> Hi,
> I am writing an application that performs a lot of DB writes. I am already
> using many of the recommended optimizations (such as wrapping writes in
> transactions).
> I want to improve my recording time by reducing the amount of I/O. One way
> to do so is by compressing the data before writing it to disk.
> I am evaluating an SQLite extension called zipvfs. This VFS extension
> compresses pages before writing them to disk.
>

This seems like a misuse of ZIPVFS. ZIPVFS is designed to be read-mostly.
ZIPVFS trades write performance in exchange for better compression and read
performance.

ZIPVFS has many potential uses, but its design use-case is a multi-gigabyte
map database on a portable GPS navigation device. The database needs to be
compressed in order to fit in available storage. Yet, the database also
needs to be modifiable since maps do sometimes change, though not often nor
by much. In other words, ZIPVFS is designed to be written about as much as
you need to change a map.


> I am using zlib compress/uncompress as my compression callback functions
> for this VFS. I assumed that database writing would be faster with this VFS,
> since compression means less I/O, but in reality I see no difference
> (although the data is indeed compressed)...
> Any idea why I don't see any improvement in recording time? Is there an
> overhead with zipvfs? Are there any other recommended compression callback
> functions?
> cheers
> Tal

--
D. Richard Hipp
d...@sqlite.org
