[sqlite] VACUUMing large DBs

Udi Karni

Mar 20, 2012, 4:59:59 PM
to General Discussion of SQLite Database
Hello,

I am creating large DBs - each with a single table (magnitude of a few
hundred million rows / 100GB). It takes a few transformations to get to the
final product. When done - I VACUUM the final result.

I am noticing that VACUUM first makes a copy of the data (presumably what
it wants to keep?) into a Temp file. Then it creates a Journal file which I
suspect ends up becoming the final table. Then the Temp and Original go
away - but for a period of time you need 3X the space and you go through 2
complete copies of the data (more or less).

Is there a way to go directly from "original" to "journal/final" -
skipping the creation of the Temp version?

And a more general question. My PC has 8GB of RAM. I am considering getting
a much larger machine that can take upwards of 100-200GB of RAM. It's a
little pricey - so - will SQLite use it advantageously? I wouldn't want to
get it only to find out that anything beyond X GB is a waste of money.
Should a larger machine with multiple CPUs and lots of RAM be able to run 2
concurrent SQLites successfully? Any good recommendations on this?

Thanks !

Roger Binns

Mar 20, 2012, 5:27:39 PM
to sqlite...@sqlite.org

On 20/03/12 13:59, Udi Karni wrote:
> And a more general question. My PC has 8GB of RAM. I am considering
> getting a much larger machine that can take upwards of 100-200GB of
> RAM.

I'd recommend getting one or more SSDs instead (also a lot cheaper). The
reason is that during your vacuum most data is only read once so RAM won't
really help - instead you want improved latency and throughput of reads.

For RAM you should make it a little larger than your working set of data.
That will depend on your workload and app, and applies to any kind of
database. Again SSDs will help since the penalty of a RAM/cache miss is
less than with spinning media.

Roger

Udi Karni

Mar 20, 2012, 11:08:19 PM
to General Discussion of SQLite Database, Roger Binns
Thanks! I got one and tried it - it seems to improve overall performance
by about 2X. Very cool.

The 240GB SSD drives are pretty reasonably priced and would suffice for
most tables. I'm just wondering how long before Flash Write Fatigue sets in
and you need a replacement. ????

Jay A. Kreibich

Mar 20, 2012, 11:25:32 PM
to General Discussion of SQLite Database
On Tue, Mar 20, 2012 at 01:59:59PM -0700, Udi Karni scratched on the wall:

> Hello,
>
> I am creating large DBs - each with a single table (magnitude of a few
> hundred million rows / 100GB). It takes a few transformations to get to the
> final product. When done - I VACUUM the final result.
>
> I am noticing that VACUUM first makes a copy of the data (presumably what
> it wants to keep?) into a Temp file. Then it creates a Journal file which I
> suspect ends up becoming the final table. Then the Temp and Original go
> away - but for a period of time you need 3X the space and you go through 2
> complete copies of the data (more or less).
>
> Is there a way to go directly from "original" to "journal/final" -
> skipping the creation of the Temp version?

No, it requires all three copies.

VACUUM has two requirements. First, it must be ACID compliant and
transaction safe. In short, even if the power fails in the middle of
a VACUUM, you must be able to recover the original file, in place.
Second, the VACUUM must result in the original file being updated.
Not just "a file with the same name", but the actual, original file.
Remember that other processes may have the database open during the
VACUUM process, and they must continue to work, using the same active
file handle, when the VACUUM finishes.

This means two copies are required. First, the original database must
be copied to a temp file. During this copy process, the database is
cleaned and defragmented. That temp copy must then be copied back on
top of the original file. That second copy needs to be transaction
safe, however, so we need to journal every change to the original
file, just in case something goes wrong. All that adds up to three
copies-- the original, the optimized temp copy, and the journal file
to fix things up should the "copy back" process fail.
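
To make the 3x concrete, this is roughly what sits on disk while a 100 GB
database is being VACUUMed (the exact temp file name and location depend on
the platform and temp-directory settings, so treat the names below as an
illustration, not a guarantee):

  big.db           -- the original file, updated in place at the end
  etilqs_XXXXXXXX  -- the optimized temp copy, grows toward the full data size
  big.db-journal   -- rollback journal protecting the copy-back step

Pointing the temp directory at a different volume (for example with the
SQLITE_TMPDIR environment variable on Unix) doesn't reduce the total, but it
does let the temp copy land on a different disk.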

Almost exactly two years ago I proposed a "VACUUM TO <filename>"
version of the command that did the first copy and then quit. Rather
than building an optimized temp copy, VACUUM TO would copy the
current database to an optimized named file (rather than a temp file),
and then skip the copy-back stage. This would allow a system admin
to shut down all database users, VACUUM the database, swap files,
and finally restart everything. The process would require more
manual work, but would only require 2x the drive space, rather than
3x. Nobody spoke up about the idea, however.

If you look at the VACUUM code, it isn't black magic. Building an
external tool that did a "VACUUM TO" process wouldn't be that
difficult.
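
As a rough sketch, such a tool could do something like the following at the
SQL level for a single-table database like the one described above (all file
and table names here are invented, and note that CREATE TABLE ... AS SELECT
does not carry over indexes, triggers, or constraints - a faithful tool would
recreate those from sqlite_master):

  ATTACH DATABASE 'optimized.db' AS opt;           -- a brand-new, empty file
  CREATE TABLE opt.big AS SELECT * FROM main.big;  -- one read pass, one write pass
  DETACH DATABASE opt;
  -- then shut down all users and swap optimized.db in place of the original

The disk-space math is the same as the proposed VACUUM TO: 2x instead of 3x,
at the cost of the manual shutdown and file swap.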


> And a more general question. My PC has 8GB of RAM. I am considering getting
> a much larger machine that can take upwards of 100-200GB of RAM. It's a
> little pricey - so - will SQLite use it advantageously? I wouldn't want to
> get it only to find out that anything beyond X GB is a waste of money.
> Should a larger machine with multiple CPUs and lots of RAM be able to run 2
> concurrent SQLites successfully? Any good recommendations on this?

You'll need a 64-bit version of SQLite.

In the end, however, I'm not sure you'll see much. You can crank up
the page cache size, but that's not going to buy you as much as you
might think-- SQLite frequently flushes the cache if more than one
process has the database open at the same time. The RAM might work
nicely as a
file cache at the OS level, but only for read operations. Writes are
still going to need to go to disk. Unless reads are more frequent
than writes-- by several orders of magnitude-- you are likely to get
better performance by looking at fast storage, rather than lots of RAM.
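
If you do want to experiment with a large page cache anyway, it is a
per-connection setting, e.g. (the numbers are only an illustration - tune
them to your page size and available RAM):

  PRAGMA page_size = 4096;      -- only effective before the database is created (or via a later VACUUM)
  PRAGMA cache_size = 1000000;  -- up to 1,000,000 pages cached, about 4 GB at 4 KB per page

but, as noted above, a second process writing to the same database will keep
invalidating that cache.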

If you're really looking for that kind of heavy hitting performance,
I might also consider a more traditional client/server RDBMS like
PostgreSQL, or a non-relational DB like Redis.

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson

Roger Binns

Mar 21, 2012, 12:09:17 AM
to General Discussion of SQLite Database

On 20/03/12 20:08, Udi Karni wrote:
> Thanks! I got one and tried it - it seems to improve overall
> performance by about 2X. Very cool.

Depending on your backups and tolerance for data loss, you can also do
things like RAID 0 striping across multiple drives. The best-case
improvement will then be linear in the number of drives, although the
actual improvement will depend on access patterns.

> I'm just wondering how long before Flash Write Fatigue sets in and you
> need a replacement. ????

This tool will tell you. Sadly it is Windows-only. However, the data is
gathered from various SMART attributes, so in theory it is also possible to
calculate it on other platforms.

http://ssd-life.com
http://ssd-life.com/eng/how.html

My workstation's drive is apparently good till 2029.

Roger

Simon Slavin

Mar 20, 2012, 11:13:23 PM
to General Discussion of SQLite Database

On 21 Mar 2012, at 3:08am, Udi Karni <uka...@gmail.com> wrote:

> The 240GB SSD drives are pretty reasonably priced and would suffice for
> most tables. I'm just wondering how long before Flash Write Fatigue sets in
> and you need a replacement. ????

Nobody knows yet. We've done a ton of tests but so far the drives which have given up are all being judged as defective, just like you get rotating drives which are defective. There's no statistical curve yet for drives of that design failing from use. It certainly does get rid of DRH's point that you can only do one write per sector for each rotation of the drive.

I'm writing this on a laptop which has a 250GB SSD in it, so I have a personal interest. So far the added speed all by itself has made the extra money worthwhile. Not to mention that I can drop the laptop without worrying about damaging the hard drive surface and read/write head.

Simon.

Scott Hess

Mar 22, 2012, 6:06:09 PM
to j...@kreibi.ch, General Discussion of SQLite Database
On Tue, Mar 20, 2012 at 8:25 PM, Jay A. Kreibich <j...@kreibi.ch> wrote:
> On Tue, Mar 20, 2012 at 01:59:59PM -0700, Udi Karni scratched on the wall:
>> Is there a way to go directly from "original" to "journal/final" -
>> skipping the creation of the Temp version?
>
>  No, it requires all three copies.
<...>

>  Almost exactly two years ago I proposed a "VACUUM TO <filename>"
>  version of the command that did the first copy and then quit.  Rather
>  than building an optimized temp copy, VACUUM TO would copy the
>  current database to an optimized named file (rather than a temp file),
>  and then skip the copy-back stage.  This would allow a system admin
>  to shut down all database users, VACUUM the database, swap files,
>  and finally restart everything.  The process would require more
>  manual work, but would only require 2x the drive space, rather than
>  3x.  Nobody spoke up about the idea, however.

I think you could manage 2x-the-drive-space without shutdown by
writing a read-only VFS which treated the pages in the journal as its
backing store, faulting misses through to the main file. Then you
could VACUUM from the database-in-the-journal to the
database-in-the-database. In case of failure, the journal rolls
things back like you'd expect.

I _think_ this would work.

-scott

Udi Karni

Mar 22, 2012, 6:19:46 PM
to General Discussion of SQLite Database
For the time being - I have been avoiding the VACUUM of very large DBs by
creating a new iteration of the table/DB for each transformation instead of
using UPDATE/DELETE (given that I only have 1 table per DB) -

(1) create new DB_V2 / Table_V2
(2) attach DB_V1 / Table_V1
(3) insert into Table_V2 select (column list with transformations) from
Table_V1
(4) drop DB_V1

If there are too many transformations - I just do it in a few iterations.

By using 2 separate disks for the 2 DBs/tables - one only reads - the other
only writes - rows don't grow/shrink - and especially if both disks are
SSDs - this works quite fast and there is no need to VACUUM the final version.
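
In SQL the four steps look roughly like this (file names, the column list,
and the transformations are all just placeholders for illustration):

  -- connect to a brand-new, empty db_v2.db
  ATTACH DATABASE 'db_v1.db' AS v1;
  CREATE TABLE table_v2 (id INTEGER, name TEXT, amount REAL);
  INSERT INTO table_v2
    SELECT id, upper(name), amount * 100 FROM v1.table_v1;
  DETACH DATABASE v1;
  -- then delete db_v1.db at the file-system level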

What would make it even better would be the possibility of using CREATE
TABLE X NOLOGGING AS SELECT ... FROM Y;

Is there a way to run NOLOGGING in SQLite syntax - which means that if
something in the destination table/DB fails - you are prepared to just drop
it and start over?

Thanks !

Petite Abeille

Mar 22, 2012, 6:25:32 PM
to General Discussion of SQLite Database

On Mar 22, 2012, at 11:19 PM, Udi Karni wrote:

> Is there a way to run NOLOGGING in SQLite syntax - which means that if
> something in the destination table/DB fails - you are prepared to just drop
> it and start over?

PRAGMA journal_mode=off

http://sqlite.org/pragma.html#pragma_journal_mode
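
For a rebuild you are prepared to throw away and redo on any failure, that
would look something like this (table names are placeholders; synchronous =
OFF is an optional extra step in the same spirit, and with both off a crash
can leave the file corrupt rather than merely rolled back):

  PRAGMA journal_mode = OFF;   -- no rollback journal is written
  PRAGMA synchronous = OFF;    -- optional: also skip fsync calls
  CREATE TABLE table_v2 AS SELECT * FROM table_v1;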

Udi Karni

Mar 22, 2012, 6:29:13 PM
to General Discussion of SQLite Database
Very nice! Thanks !

But then - can you turn journaling off and then run a VACUUM and have it
run as a 2-step process instead of a 3-step one?
