regarding PRAGMA journal_mode;
Should =WAL affect only the current session, or all following sessions until
=DELETE is performed? As far as I can see it keeps the state; I concluded
this from the following facts:
- prior versions of sqlite refuse to accept the db (as the doc says),
- the shm and wal files exist,
- the bytes at offsets 18 and 19 are 2.
But at the same time PRAGMA journal_mode; reported "wal" when WAL was
enabled (that is ok) and "delete" after reopening the db (which for me is
unexpected behavior). I used sqlite-snapshot-201007060929.zip.
- About VFS: is it ok to implement version 2 of the interfaces and supply it
to older versions? That is, if the structures contain iVersion=2, will all
prior versions of sqlite3 accept them?
Thanks,
Max,
maxerist.net
_______________________________________________
sqlite-users mailing list
sqlite...@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
<http://sqlite.org/draft/wal.html>"An SQLite database _connection_ defaults
to journal_mode=DELETE. To convert to WAL mode, use the following
pragma: PRAGMA journal_mode=WAL;"
--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
I read this sentence, but it didn't help,
so I suppose there's a bug in the PRAGMA journal_mode logic.
Steps to reproduce:
1. Create an empty database with some table.
Look at the bytes at offsets 18 and 19: they are both 1, which is ok; the
database is compatible with the full range of sqlite3 versions.
2. Do PRAGMA journal_mode=WAL; and it returns wal, so WAL is set.
Look at offsets 18 and 19: they are both 2, so the database is no longer
compatible with older versions. Checking... yes, older versions say
"encrypted" or something.
Query PRAGMA journal_mode; alone (just to check, not to set); it still says
"wal", ok.
3. Close the database. Offsets 18 and 19 are still 2, so the WAL setting is
kept between sessions.
4. Open the db again; offsets 18 and 19 are still 2. Query PRAGMA
journal_mode; and it says "delete", but it definitely should return "wal".
Max
$ sqlite3 grow.db 'pragma journal_mode'
delete
$ hexdump -s 17 -n 2 grow.db | head -n1
0000011 0200
$ sqlite3 grow.db 'pragma journal_mode=delete'
delete
$ hexdump -s 17 -n 2 grow.db | head -n1
0000011 0100
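For reference, the same header check can be scripted. Per the database file format docs, the write- and read-version bytes live at (0-based) offsets 18 and 19: 1 means legacy rollback journal, 2 means WAL. A minimal sketch using Python's sqlite3 and a throwaway file:

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "grow.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
con.execute("PRAGMA journal_mode=WAL")
con.execute("INSERT INTO t VALUES (1)")
con.commit()
con.close()

# Offsets 18 and 19 (0-based) are the write/read format versions:
# 1 = legacy rollback journal, 2 = WAL.
header = open(path, "rb").read(20)
print(header[18], header[19])  # both 2 while the database is in WAL mode

# Switching back to a rollback mode reverts both bytes to 1.
con = sqlite3.connect(path)
con.execute("PRAGMA journal_mode=DELETE")
con.close()
header = open(path, "rb").read(20)
print(header[18], header[19])  # both 1 again
```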
> Alexey,
>
> I read this sentence, but it didn't help.
> So I suppose there's a bug in PRAGMA journal_mode logic
> Steps to reproduce.
>
> 1. Create an empty base with some table.
> Look at the 18,19 offsets, they both = 1, it's ok, the base is compatible
> with full range of sqlite3 versions.
>
> 2. Do PRAGMA journal_mode=WAL; , it returns wal, so WAL is set.
> Look at the 18,19 offsets, they both = 2, the base no longer compatible
> with
> older versions, checking... yes, they say "encrypted" or something.
> Query PRAGMA journal_mode; alone (just to check not set) , it still says
> "wal", ok
>
> 3. Close the database. offsets 18,19 still = 2, so the WAL setting is kept
> between sessions.
>
> 4. Open the db again, offsets 18,19 still = 2,
> query PRAGMA journal_mode; it says "delete", but definitely should return
> "wal".
>
You want "PRAGMA main.journal_mode"
A "PRAGMA journal_mode;" (without the "main.") shows you the default journal
mode used by newly created databases, which is always "DELETE" unless you
have changed it with a prior "PRAGMA journal_mode=MODE" command.
Though, I will admit, this is confusing, and I was thinking last night about
ways we could possibly change it....
>
> Max
>
>
> On Thu, Jul 8, 2010 at 12:09 AM, Alexey Pechnikov <pech...@mobigroup.ru
> >wrote:
>
> > See http://sqlite.org/draft/wal.html :
> >
> > <http://sqlite.org/draft/wal.html>"An SQLite database _connection_
> > defaults
> > to journal_mode=DELETE. To convert to WAL mode, use the following
> > pragma: PRAGMA journal_mode=WAL;"
> >
> > --
> > Best regards, Alexey Pechnikov.
> > http://pechnikov.tel/
--
---------------------
D. Richard Hipp
d...@sqlite.org
Thanks for pointing that out. I have two observations:
- According to the docs this has been present for ages, so shame on me for
not knowing it in the first place. So probably it should stay as it is now.
- But this is the first time this important pragma really affects the
sqlite file format, and, more importantly, not only for the current session.
You probably had reasons for implementing WAL the way it is now, but
imagine: once changed, this setting never reminds the developer of itself.
All his current code base keeps working once it starts using this version of
sqlite, but gradually a small (or maybe large) part of his databases becomes
WAL-enabled (the number 2 in the file format). That may happen because of
his own WAL-on without a WAL-off, or a WAL-on followed by an unexpected
program interruption. Everything is ok until those databases have to be
distributed or used in other environments that probably run sqlite3 versions
lower than 3.7, at which point it stops working, complaining about an
unknown file format. I think it's a potential source of a new wave of mass
bug reports, or simply confusion.
I may be exaggerating; I suggest other participants of the list share their
thoughts.
Thanks,
Max
> So all his current code base works once it started using this version of
> sqlite, but consequently small (or maybe large part) of his bases becomes
> WAL-enabled (number 2 in the file format). The latter may appear because of
> his own WAL-On without WAL-Off or WAL-On with unexpected program
> interruption. Everything is ok, until these bases have to be
> distributed/used in other environments, that probably use lower than 3.7
> versions of sqlite3, so it stops working, complaining about an unknown file
> format.
>
Correct. Older versions of SQLite cannot read or write a database file that
has a non-empty WAL, since older versions of SQLite do not know what to do
with the WAL. There is nothing really we can do about this, other than not
provide WAL as an option.
--
---------------------
D. Richard Hipp
d...@sqlite.org
> Though, I will admit, this is confusing, and I was thinking last night
> about ways we could possibly change it....
>
Current behavior:
(1) PRAGMA name.journal_mode=MODE; -- set the mode to MODE for database
"name".
(2) PRAGMA name.journal_mode; -- return current journal mode for database
"name".
(3) PRAGMA journal_mode=MODE; -- set the mode to MODE for all attached
databases and also apply MODE to all databases ATTACH-ed in the future.
(4) PRAGMA journal_mode; -- report the most recent setting by (3). Report
"delete" if there have been no prior occurrences of (3).
First Proposed Change:
(1) and (2) are the same.
(3) PRAGMA journal_mode=MODE; -- set the mode to MODE for all attached
databases. Databases created by subsequent ATTACH use MODE. Existing
databases added by ATTACH use MODE if and only if doing so does not require
them to change in or out of WAL mode.
(4) PRAGMA journal_mode; -- report the most recent setting by (3). Or if
(3) has never been used, work the same as "PRAGMA main.journal_mode;"
Second Proposed Change:
Remove the WAL mode from "PRAGMA journal_mode". The journal_mode pragma
only specifies the various rollback journal modes. Enable the WAL using a
separate pragma such as "PRAGMA wal=ON; PRAGMA wal=OFF;"
Thoughts? Comments? Other suggestions?
>
> Remove the WAL mode from "PRAGMA journal_mode". The journal_mode pragma
> only specifies the various rollback journal modes. Enable the WAL using a
> separate pragma such as "PRAGMA wal=ON; PRAGMA wal=OFF;"
>
> Thoughts? Comments? Other suggestions?
>
>
Maybe it's not the right time, but could it be implemented as a separate
pragma, as in your second proposed change, but with more options such as
OFF, PERSISTENT, TEMPORAL? Still thinking about the fact that WAL, being a
new software feature, actually amounts to a file format change, the latter
option could solve this (if it's technically possible and not hard to add,
of course). The logic could be like this: if WAL=TEMPORAL, the first
reader/writer that accesses the database changes bytes 18 and 19 (to 2),
allowing it and the following readers/writers to work in WAL mode, and the
last reader/writer that closes the database reverts them (to 1), so the
file format stays the same. Sure, there are cases where the format would
still end up changed (for example an unexpected program end), but the next
successful open/close would fix this.
Max
It's cleaner, I think. With wal=on and journal_mode=delete SQLite could
delete the WAL journal before the last connection is closed and so provide
backward compatibility (of course, only new SQLite versions can restore
crashed databases). With wal=on and journal_mode=persist SQLite could use a
persistent WAL journal where backward compatibility is not a concern.
--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
> >
> > Remove the WAL mode from "PRAGMA journal_mode". The journal_mode pragma
> > only specifies the various rollback journal modes. Enable the WAL using
> a
> > separate pragma such as "PRAGMA wal=ON; PRAGMA wal=OFF;"
> >
>
> It's cleaner, I think. With wal=on and journal_mode=delete SQLite could
> delete the WAL journal before the last connection is closed and so provide
> backward compatibility (of course, only new SQLite versions can restore
> crashed databases). With wal=on and journal_mode=persist SQLite could use a
> persistent WAL journal where backward compatibility is not a concern.
>
In the current implementation, if you call "PRAGMA wal_checkpoint" just
prior to closing the database, the WAL file will be deleted automatically.
But it keeps the database in WAL mode, so the WAL is recreated the next time
you open and write to the database.
--
---------------------
D. Richard Hipp
d...@sqlite.org
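DRH's point can be observed from, e.g., Python's sqlite3 (a sketch with a throwaway file): after a checkpoint and a clean close the -wal file is gone, yet header byte 18 still says 2 (WAL), so the WAL reappears on the next write.

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
con.execute("PRAGMA journal_mode=WAL")
con.execute("INSERT INTO t VALUES (1)")
con.commit()
con.execute("PRAGMA wal_checkpoint")   # flush the WAL into the database
con.close()                            # a clean close removes the -wal file

print(os.path.exists(path + "-wal"))   # False: the WAL file is gone
print(open(path, "rb").read(20)[18])   # 2: the database is still in WAL mode

# The WAL is recreated as soon as we write again.
con = sqlite3.connect(path)
con.execute("INSERT INTO t VALUES (2)")
con.commit()
print(os.path.exists(path + "-wal"))   # True
con.close()
```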
> In the current implementation, if you call "PRAGMA wal_checkpoint" just
> prior to closing the database, the WAL file will be deleted automatically.
> But it keeps the database in WAL mode, so the WAL is recreated the next
> time
> you open and write to the database.
It's not helpful for backward compatibility. What about a version downgrade
of Android or some other mobile OS, with the result that it becomes
impossible to open any SQLite database?..
--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
> 2010/7/8 Richard Hipp <d...@sqlite.org>
>
> > In the current implementation, if you call "PRAGMA wal_checkpoint" just
> > prior to closing the database, the WAL file will be deleted
> automatically.
> > But it keeps the database in WAL mode, so the WAL is recreated the next
> > time
> > you open and write to the database.
>
>
> It's not helpful for backward compatibility. What about a version downgrade
> of Android or some other mobile OS, with the result that it becomes
> impossible to open any SQLite database?..
>
Presumably the version downgrade comes with a script. (Otherwise, other
changes such as schema changes would go uncorrected.) That script merely
has to run
PRAGMA journal_mode=DELETE;
on every database. Problem solved.
>
> --
> Best regards, Alexey Pechnikov.
> http://pechnikov.tel/
--
---------------------
D. Richard Hipp
d...@sqlite.org
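The downgrade step Richard describes could be a few lines; a hypothetical sketch (the glob pattern and file layout are assumptions, demonstrated here against a throwaway database):

```python
import glob, os, sqlite3, tempfile

def downgrade(pattern):
    """Switch every matching database back to a rollback journal so that
    pre-3.7 SQLite can read it again (header bytes 18/19 revert to 1)."""
    for path in glob.glob(pattern):
        con = sqlite3.connect(path)
        con.execute("PRAGMA journal_mode=DELETE")
        con.close()

# Demo against a throwaway WAL database.
d = tempfile.mkdtemp()
p = os.path.join(d, "app.db")
con = sqlite3.connect(p)
con.execute("CREATE TABLE t(x)")
con.execute("PRAGMA journal_mode=WAL")
con.close()

downgrade(os.path.join(d, "*.db"))
print(open(p, "rb").read(20)[18])  # 1: back to rollback-journal format
```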
That's not backwards compatibility (newer versions working with items
from older environments), that's forwards compatibility (older versions
working with new items from a newer environment).
It is no different than a new(er) application that uses newer APIs
not working on an older version of the OS. If you want the ability
to do this, don't use new features (or turn it off every time you
close the database).
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
> > It's not helpful for backward compability. How about version downgrade of
> > the Android or some other mobile OS and as result impossibility to open
> any
> > SQLite database?..
>
> That's not backwards compatibility (newer versions working with items
> from older environments), that's forwards compatibility (older versions
> working with new items from a newer environment).
>
I was speaking about the "Backwards Compatibility" chapter of
http://www.sqlite.org/draft/wal.html
> It is no different than a new(er) application that uses newer APIs
> not working on an older version of the OS.
And how many such changes have there been in the POSIX file API?.. I think
the SQLite API, as an application file format, is comparable to the POSIX
file API.
--
Best regards, Alexey Pechnikov.
http://pechnikov.tel/
> (4) PRAGMA journal_mode; -- report the most recent setting by (3). Or if
> (3) has never been used, work the same as "PRAGMA main.journal_mode;"
Contexts are generally bad. This is even more true if there might
ever be a way to change the compile time default journal mode to
anything other than "delete." Then there is no way to get the default.
There are a few pragmas that use this general approach... where
getting/setting an unqualified setting changes the default, not
"main". It might be better if all unqualified statements refer
to main, and pragmas like this have a default.journal_mode (get/set)
or all.journal_mode (set) syntax.
> Second Proposed Change:
>
> Remove the WAL mode from "PRAGMA journal_mode". The journal_mode pragma
> only specifies the various rollback journal modes. Enable the WAL using a
> separate pragma such as "PRAGMA wal=ON; PRAGMA wal=OFF;"
>
> Thoughts? Comments?
I like this. I suspect in the future we will have a "PRAGMA
wal_mode" as well, so this makes a bit more sense.
But as an extension, does it make even more sense to have "PRAGMA
wal=[on|off]" or "PRAGMA transaction=[wal|journal]" ?
> Other suggestions?
This has nothing to do with WAL, but it might be nice to expose
the logic that does SQL-type => SQLite-affinity mappings
(i.e. sqlite3AffinityType()):
int sqlite3_get_affinity( const char *type );
Takes a string that contains an SQL type. Returns one of:
SQLITE_AFF_TEXT, SQLITE_AFF_NONE, SQLITE_AFF_NUMERIC,
SQLITE_AFF_INTEGER, SQLITE_AFF_REAL.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
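For reference, the mapping sqlite3AffinityType() applies follows the documented type-affinity rules, which can be sketched as follows (the function name mirrors Jay's proposed API but is hypothetical; the order of the checks matters, which is why 'FLOATING POINT' lands on INTEGER):

```python
def sqlite_affinity(decltype):
    """Return the affinity SQLite would assign to a declared column type,
    per the documented rules; the checks must run in this order."""
    t = (decltype or "").upper()
    if "INT" in t:
        return "INTEGER"
    if "CHAR" in t or "CLOB" in t or "TEXT" in t:
        return "TEXT"
    if "BLOB" in t or t == "":
        return "BLOB"      # a.k.a. NONE affinity
    if "REAL" in t or "FLOA" in t or "DOUB" in t:
        return "REAL"
    return "NUMERIC"

print(sqlite_affinity("VARCHAR(80)"))     # TEXT
print(sqlite_affinity("FLOATING POINT"))  # INTEGER ("POINT" contains "INT")
print(sqlite_affinity("DECIMAL(10,5)"))   # NUMERIC
```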
Which pretty much says "if you turn this on, the file will not work
with older versions until you turn it off." What's the issue?
> > It is no different than a new(er) application that uses newer APIs
> > not working on an older version of the OS.
>
> And how many such changes in POSIX file API?.. I think SQLite API as
> applications file format is similar to POSIX file API.
Which is why you're using bleeding-edge features in an
unreleased development version?
If it is just a file API, use the system libs in their default
configuration and be done with it.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
> > First Proposed Change:
> >
> > (1) and (2) are the same.
> >
> > (3) PRAGMA journal_mode=MODE; -- set the mode to MODE for all attached
> > databases. Databases created by subsequent ATTACH use MODE. Existing
> > databases added by ATTACH use MODE if and only if doing so does not
> require
> > them to change in or out of WAL mode.
>
> > (4) PRAGMA journal_mode; -- report the most recent setting by (3). Or
> if
> > (3) has never been used, work the same as "PRAGMA main.journal_mode;"
>
> >
> Contexts are generally bad. This is even more true if there might
> ever be a way to change the compile time default journal mode to
> anything other than "delete." Then there is no way to get the default.
>
Agreed.
Since sending the prior email, internal discussion has moved us in the
direction of eliminating the "default journal mode" altogether. This
makes both the code and documentation smaller, which implies that it will
also make things easier to understand.
So probably we'll end up with something like this:
PRAGMA name.journal_mode=MODE; -- set the mode of database "name" to MODE.
PRAGMA name.journal_mode; -- report the mode of database "name".
PRAGMA journal_mode; -- an alias for "PRAGMA main.journal_mode"
PRAGMA journal_mode=MODE; -- set the mode of all currently ATTACHed
databases.
When new databases are ATTACH-ed, they come up in either DELETE or WAL mode
(depending on how they were last closed) and must be changed to whatever the
application desires using a new PRAGMA journal_mode call. No more trying to
guess what the application wants and apply that as a default.
This is ever-so-slightly incompatible with prior SQLite versions, but on the
other hand, we never have promised to maintain PRAGMA compatibility, and if
we are going to change something, it seems like 3.6->3.7 is a good time to
change it.
--
---------------------
D. Richard Hipp
d...@sqlite.org
>
>
> PRAGMA name.journal_mode=MODE; -- set the mode of database "name" to MODE.
> PRAGMA name.journal_mode; -- report the mode of database "name".
> PRAGMA journal_mode; -- an alias for "PRAGMA main.journal_mode"
> PRAGMA journal_mode=MODE; -- set the mode of all currently ATTACHed
> databases.
>
The latest 3.7.0 pre-release snapshot implements journal_mode as described
above. Visit http://www.sqlite.org/draft/download.html to get a copy for
testing. The documentation at
http://www.sqlite.org/draft/pragma.html#pragma_journal_mode has been
updated as well. Please report any problems to this list. Thanks!
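With the released behavior, an unqualified PRAGMA journal_mode is an alias for PRAGMA main.journal_mode, so the surprise that opened this thread is gone; e.g. from Python's sqlite3 (a sketch):

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "grow.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t(x)")
print(con.execute("PRAGMA journal_mode=WAL").fetchone()[0])  # wal
con.close()

# A fresh connection now reports the persistent mode of "main",
# not a connection-wide default.
con = sqlite3.connect(path)
print(con.execute("PRAGMA journal_mode").fetchone()[0])      # wal
con.close()
```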
I have a query which is very slow, and was wondering if there was any
advice you guys had on it.
Here are two table definitions:
CREATE TABLE IF NOT EXISTS globalRankingTable (id INTEGER PRIMARY KEY
AUTOINCREMENT, name TEXT NOT NULL UNIQUE, ranking TEXT, score REAL,
record TEXT);
CREATE TABLE IF NOT EXISTS friendTable (id INTEGER PRIMARY KEY
AUTOINCREMENT, simpleId TEXT NOT NULL UNIQUE, player TEXT, friend TEXT);
And here is my query (written in a script language):
db.query("SELECT * FROM (SELECT DISTINCT * FROM globalRankingTable WHERE
upper(name) = upper('?') OR id < ? union all SELECT a.* FROM
globalRankingTable a, friendTable b WHERE upper(b.player) = upper('?')
AND upper(b.friend) = upper(a.name)) ORDER BY score DESC", 0,
%client.username, %globId, %client.username);
The intention of this query is to, basically:
- get the top 10 scores in rankingTable
- get "my" score in rankingTable
- get all of my friends' scores in rankingTable
The query works, but is very slow. This may just be because it's a
complex query, but all feedback would be much appreciated.
For your extra information:
- globalRankingTable has several thousand rows
- friendTable has several thousand rows also, but each user will only
have 0-10 friends (usually on the low part of that scale)
- yes I know I'm an idiot with the upper stuff, a refactor is needed on
that - I know all of my "name" fields should really be integers.
Thanks,
Ian
Create an index on either the player column or the friend column in your
second table.
CREATE INDEX i_friendTable ON friendTable (player ASC);
--
Benoit Mortgat
20, avenue Marcel Paul
69200 Vénissieux, France
+33 6 17 15 41 58
+33 4 27 11 61 23
Table PEOPLERANKINGS(personid INTEGER PRIMARY KEY, personname TEXT, ranking)
Table FRIENDS(personid1, personid2)
with an index on PEOPLERANKINGS.ranking, and with
(FRIENDS.personid1, FRIENDS.personid2) as a composite unique primary key.
You could get the top 10 ranked people
select * from PEOPLERANKINGS order by ranking desc limit 10
and get your own ranking and the ranking of your friends:
select peoplerankings.* from PEOPLERANKINGS
where personid IN
(select personid2 from FRIENDS where personid1 = ?yourId? )
NOTE: befriend yourself by default in the FRIENDS table.
Regards
Tim Romano
Swarthmore PA
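Tim's sketch can be tried end to end; a minimal version in, e.g., Python's sqlite3 (the sample names and rankings are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PEOPLERANKINGS(personid INTEGER PRIMARY KEY, personname TEXT, ranking);
CREATE TABLE FRIENDS(personid1, personid2, PRIMARY KEY (personid1, personid2));
CREATE INDEX pr_rank ON PEOPLERANKINGS(ranking);
INSERT INTO PEOPLERANKINGS VALUES (1,'me',50), (2,'ann',90), (3,'bob',70);
-- befriend yourself by default, as Tim suggests
INSERT INTO FRIENDS VALUES (1,1), (1,2);
""")

# Top 10 ranked people.
top10 = con.execute(
    "SELECT * FROM PEOPLERANKINGS ORDER BY ranking DESC LIMIT 10").fetchall()

# My own ranking plus my friends' rankings (self-friendship covers "me").
mine_and_friends = con.execute("""
    SELECT p.* FROM PEOPLERANKINGS p
     WHERE p.personid IN (SELECT personid2 FROM FRIENDS WHERE personid1 = ?)
""", (1,)).fetchall()

print(top10)             # ann (90), bob (70), me (50)
print(mine_and_friends)  # me and ann
```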
I have an sqlite db named "wdb", and a "wdb-journal" file was created by a
power failure or something similar. When I do any db operation sqlite
always reports "disk I/O error", but when I delete the "wdb-journal" no
errors are reported. I think maybe the wdb-journal file was corrupted; does
anyone have any idea about this?
I use sqlite on a linux system.
Best Regards,
Ke Tao
>
> HI All ,
>
> I have sqlite db name "wdb" and "wdb-journal" file was created by power
> failure something , when I do any db operation sqlite always prompt "disk
> I/O error" , but when I delete the "wdb-journal" ,there is no errors
> prompted. I think maybe the wdb-journal file was corrupted , does anyone
> have any idea on this ?
> I used sqlite on linux system.
>
> Best Regards,
> Ke Tao
>
Permission problems can cause this. The "disk I/O error" actually means "I
can't do a rollback", maybe because root created the journal and owns the
database, then another user is trying to do things with it. Just a guess.
This "disk I/O error" message is very confusing. Been there!
Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
> On Fri, Jul 9, 2010 at 10:38 AM, Ke Tao <kot...@gmail.com> wrote:
>
>>
>> HI All ,
>>
>> I have sqlite db name "wdb" and "wdb-journal" file was created by power
>> failure something , when I do any db operation sqlite always prompt "disk
>> I/O error" , but when I delete the "wdb-journal" ,there is no errors
>> prompted. I think maybe the wdb-journal file was corrupted , does anyone
>> have any idea on this ?
>> I used sqlite on linux system.
>>
>> Best Regards,
>> Ke Tao
>>
>
> Permission problems can cause this. The "disk I/O error" actually means "I
> can't do a rollback", maybe because root created the journal and owns the
> database, then another user is trying to do things with it. Just a guess.
> This "disk I/O error" message is very confusing. Been there!
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://sites.google.com/site/hashbackup
>
Also, by deleting the journal, you are likely to corrupt your database. You
can't delete the journal!
Jim
On 07/09/2010 01:54 AM, Ian Hardingham wrote:
> And here is my query (written in a script language):
Are you actually getting correct answers from this?
> upper('?')
Unless your script language includes a SQL parser, that does not do what you
think it does. It is seeing if the value matches an upper-cased question mark.
It looks like you wanted to supply bound parameters so the question mark
should not be in quotes.
I'd also suggest using a better wrapper or scripting language since you are
providing three bindings but the query is only using one of them. It should
complain.
> - yes I know I'm an idiot with the upper stuff, a refactor is needed on
> that - I know all of my "name" fields should really be integers.
Or look into indices and collations.
In any event you should always post your exact actual query so we don't
have to figure out issues like whether the question marks are intentional
or transcription errors.
Roger
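For illustration, correct binding looks like this in, say, Python's sqlite3: the placeholders are left unquoted and one value is supplied per ?. Table and column names are taken from Ian's post; the sample data is made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE globalRankingTable (id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE, ranking TEXT, score REAL, record TEXT);
CREATE TABLE friendTable (id INTEGER PRIMARY KEY AUTOINCREMENT,
  simpleId TEXT NOT NULL UNIQUE, player TEXT, friend TEXT);
INSERT INTO globalRankingTable(name, score) VALUES ('Ian', 50), ('Bob', 90), ('Eve', 70);
INSERT INTO friendTable(simpleId, player, friend) VALUES ('1', 'Ian', 'Bob');
""")

# The placeholders are NOT quoted: upper(?) binds the value, whereas
# upper('?') would compare against a literal question mark.
rows = con.execute("""
    SELECT * FROM (
      SELECT DISTINCT * FROM globalRankingTable
       WHERE upper(name) = upper(?) OR id < ?
      UNION ALL
      SELECT a.* FROM globalRankingTable a, friendTable b
       WHERE upper(b.player) = upper(?) AND upper(b.friend) = upper(a.name)
    ) ORDER BY score DESC
""", ("ian", 0, "ian")).fetchall()

for r in rows:
    print(r)   # Bob's row (friend, score 90), then Ian's own row (score 50)
```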
The SQLite command-line app has a command that may help here (I don't
remember if it's this one:
".explain ON|OFF -- Turn output mode suitable for EXPLAIN on or off.").
The ANALYZE command also gathers statistics about tables and indexes into
an internal table, which the query planner can use to speed things up.
Also, try to avoid implicit joins like this:
'SELECT a.* FROM globalRankingTable a, friendTable b WHERE upper(b.player) =
upper('?') AND upper(b.friend) = upper(a.name)'
It's better to always use an explicit LEFT JOIN / INNER JOIN.
I've been playing around with sqlite.
Very impressed so far.
Using 3.5.6 in windows developing with Clarion.
My question(s)
If I want to use the "insert or replace" to populate my database of
around 1 million records.
And I want to do it as fast as possible.
What are all the tricks I can use?
I've read http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html
And looked at some of the various pragmas.
But would PRAGMA locking_mode = EXCLUSIVE for instance give me an extra
speed increase?
Do we have a page in www.sqlite.org dedicated to speed optimization?
I currently use the following pragmas (for speed):
temp_store = 2
page_size=4096
cache_size=8192
synchronous=off
Any others I could try?
I also wrap my statements (about 500 inserts at a time) with a begin/end
transaction.
After these 500 i take a few seconds to read more data so sqlite should
have time to do any housekeeping it might need.
I had a count(*) to check how many inserts were actually done (for a
progress bar) - and this slowed me down very much.
Took it out, and want to use "select total_changes()" to keep track of
inserts.
Any problem with that?
This is my 1st post here and I have a feeling I'm asking too many
questions at once? :-)
Cheers,
Werner
Wrap more into a transaction. 500 is too small of a percentage of a million.
John
SQLite is not a Database Server. It has no background threads. So it
can't do any "housekeeping" until you call some sqlite3_* function.
Pavel
-----Original Message-----
From: sqlite-use...@sqlite.org [mailto:sqlite-use...@sqlite.org] On Behalf Of John Drescher
Sent: Tuesday, July 13, 2010 12:37 PM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] Sqlite Insert Speed Optimization
> I also wrap my statements (about 500 inserts at a time) with a
> begin/end transaction.
> After these 500 i take a few seconds to read more data so sqlite
> should have time to do any housekeeping it might need.
>
Wrap more into a transaction. 500 is too small of a percentage of a million.
John
====================
John,
I was wondering if that's really so. Wouldn't the marginal speed improvement be quite small? Is the percentage of the final rowcount really a criterion?
Each transaction costs at least one disk seek. With thousands of commits,
and hence thousands of seeks, the result would be very slow.
John
I guess synchronous=off eliminates this flushing / seeking though.
> I currently use the following pragma's (for speed)
> temp_store = 2
> page_size=4096
> cache_size=8192
> synchronous=off
>
> Any others I could try?
Don't get too involved in the PRAGMAs until you have a good reason to. The default values are pretty good. How much slower is it than you want it to be ? If you need, say, 5% improvement we might suggest some things, but if you need 50% improvement we might suggest more radical (and harder to program) solutions. If you're optimizing just for the sake of it, find something better to do.
If you're doing a huge amount of database loading first it's faster to do it before you create any INDEXes, then to create your INDEXes once your TABLEs are populated. Once your database is initialised do you expect to do more reads or more writes ? Which one you do more of suggests how many INDEXes you should define.
> I also wrap my statements (about 500 inserts at a time) with a begin/end
> transaction.
As JD wrote, at 500 writes in a transaction you're wasting a lot of time in overheads. Try 50,000.
> After these 500 i take a few seconds to read more data so sqlite should
> have time to do any housekeeping it might need.
Unlike, for example MySQL, SQLite does nothing in the background. The only functions it runs are the ones you call directly: it has no server process and no daemon. However, your hardware may be caching writes or something, so your hardware may be taking advantage of the pauses.
> I had a count(*) to check how many inserts were actually done (for a
> progress bar) - and this slowed me down very much.
> Took it out, and want to use "select total_changes() " to keep track of
> inserts.
Much better. As an alternative (and I'm not saying it's better than what you already have) take a look at
http://www.sqlite.org/c3ref/last_insert_rowid.html
Simon.
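Simon's and John's advice can be sketched like this (Python's sqlite3 with an autocommit connection so BEGIN/COMMIT are explicit; the row count is illustrative):

```python
import sqlite3

# Autocommit connection: we issue BEGIN/COMMIT ourselves instead of letting
# the wrapper manage transactions behind our back.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t(id INTEGER PRIMARY KEY, v TEXT)")

rows = ((i, "value-%d" % i) for i in range(50000))

# One big transaction: one commit (and hence one round of journal syncing
# on a real disk) instead of a hundred small ones.
con.execute("BEGIN")
con.executemany("INSERT OR REPLACE INTO t(id, v) VALUES (?, ?)", rows)
con.execute("COMMIT")

# As Simon notes: create indexes after the bulk load, not before.
con.execute("CREATE INDEX t_v ON t(v)")

print(con.execute("SELECT count(*) FROM t").fetchone()[0])  # 50000
```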
> Is the percentage of the final rowcount really a criterion?
The answer to that, according to my brief exploration, is somewhere
between "yes" and "very much yes", depending on various factors.
--
Eric A. Smith
The number of UNIX installations has grown to 10, with more expected.
-- The Unix Programmer's Manual, 2nd Edition, June 1972
> My question(s)
> If I want to use the "insert or replace" to populate my database of
> around 1 million records.
> And I want to do it as fast as possible.
> What are all the tricks I can use?
Obey the first rule of optimization: don't do it unless you're
sure you need to.
If you're sure you need to, here are a few hints from my own
experience. They all come with trade-offs.
Buy faster hardware with more memory. I am serious. Sometimes it is
much cheaper to wait for 6 months for CPU speeds and memory sizes to
increase than to spend time optimizing your code.
Don't define triggers, indices, unique constraints, check constraints,
or primary keys. Make sure foreign key checking is off (which it is by
default).
Give sqlite a large page cache. If you are on 32 bits then sqlite can't use
more than 4 GB of memory, so keep that in mind. If you give sqlite more
memory than exists on your machine, you might go to swap hell, so don't
do that.
If you are CPU bound and if you can split your problem into orthogonal
chunks and if you have multiple CPUs, consider farming out the work to
worker processes and incorporating their results into the main database.
Depending on your use-case you can consider telling your operating
system to favor the disk cache over processes' memory when you are
running low on RAM. In linux this is accomplished by setting 'swappiness'
high, not sure about other OSs.
> I had a count(*) to check how many inserts were actually done (for a
> progress bar) - and this slowed me down very much.
That's because count(*) doesn't run in constant time. I'm not sure, but
it may be linear. Which would imply that your algo as a whole is
quadratic instead of its original (likely linear) asymptotic behavior.
> Took it out, and want to use "select total_changes() " to keep track of
> inserts. Any problem with that?
You sound like you are writing a multi-threaded program. Are you sure
that total_changes() is only counting changes due to your insertions?
Consider keeping a loop execution counter and using that for your status
bar.
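A minimal sketch of the loop-counter approach, assuming a toy table and a hypothetical progress-bar hook:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")

# A plain counter is O(1) per row and counts only *your* inserts,
# unlike count(*) (a table scan) or total_changes() (every change
# made on the connection).
inserted = 0
for i in range(10000):
    conn.execute("INSERT OR REPLACE INTO t VALUES (?, ?)", (i, "row"))
    inserted += 1
    if inserted % 1000 == 0:
        pass  # update the progress bar from `inserted` here
conn.commit()
print(inserted)  # 10000
```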
Good luck,
Eric
--
Eric A. Smith
Electricity is actually made up of extremely tiny particles
called electrons, that you cannot see with the naked eye unless
you have been drinking.
-- Dave Barry
Griggs, Donald wrote:
> Is the percentage of the final rowcount really a criterion?
The answer to that, according to my brief exploration, is somewhere between "yes" and "very much yes", depending on various factors.
Thanks, Eric. I guess I was wondering if the fastest records-per-transaction value would depend on the page cache and be more or less independent of the total records to be imported. (Indices omitted.)
So, the records-per-transaction for import to a 20 million row table should be twenty times the size for a 1 million row table?
I confess I've got a lot to learn.
> I guess I was wondering if the fastest records-per-transaction value
> would depend on the page cache and be more or less independent of the
> total records to be imported.
I think the page cache is one of a great many variables.
> So, the records-per-transaction for import to a 20 million row table
> should be twenty times the size for a 1 million row table?
I'm no sqlite or sql guru myself, so with a grain of salt:
If you have no reason to commit in the middle of a batch, then don't
do it. I think inserting all the rows in a single go will give you the
best insert performance in most use cases.
The idea is that there is some fixed overhead (call it O) that SQLite
has to go through every time it commits a transaction. The overhead is
'fixed' because it is independent of the number of rows you inserted.
If you insert 1m rows and commit every 500, the total commit overhead is
2000*O. If you commit just once, the total commit overhead is just O.
This argument is likely a small or big lie for a number of reasons, but
is at least a push in the right direction.
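The fixed-overhead argument can be sketched like this; the table, row counts, and batch sizes are made up for illustration:

```python
import sqlite3
import time

def load(rows, batch):
    """Insert `rows` rows, committing every `batch` rows;
    return elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    start = time.perf_counter()
    for i in range(rows):
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, "x"))
        if (i + 1) % batch == 0:
            conn.commit()  # each commit pays the fixed overhead O
    conn.commit()
    return time.perf_counter() - start

# 1m rows with batch=500 means 2000 commits (2000*O of overhead);
# batch=rows means a single commit (just O). Smaller numbers here.
print(load(50000, 500), load(50000, 50000))
```

On an in-memory database the commit overhead is small; on a disk (or network) file each commit forces a sync, so the gap between the two timings grows accordingly.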
Eric
--
Eric A. Smith
The problem with engineers is that they tend to cheat in order to get results.
The problem with mathematicians is that they tend to work on toy problems
in order to get results.
The problem with program verifiers is that they tend to cheat at toy problems
in order to get results.
> I also wrap my statements (about 500 inserts at a time) with a
> begin/end transaction.
> After these 500 i take a few seconds to read more data so sqlite
> should have time to do any housekeeping it might need.
>
Wrap more into a transaction. 500 is too small of a percentage of a
million.
John
====================
John,
I was wondering if that's really so. Wouldn't the marginal speed
improvement be quite small? Is the percentage of the final rowcount
really a criterion?
====================
I did find that the speed improvement was quite small, and to the user
it appears as if the machine is unresponsive.
DISCLAIMER:
Everything in this email and its attachments relating to the official business of Face Technologies is proprietary to Face Technologies. It is confidential, legally privileged and protected by law. The person addressed in the email is the sole authorised recipient. Any unauthorized dissemination or copying of this e-mail (or any attachment to this e-mail) or the wrongful disclosure of the information herein contained is prohibited.
It helped a lot.
1. I'm going to try and see what happens if I leave the "end transaction"
until 50000 inserts are done.
2. I'm going to increase cache_size from 8192 to 16384
Extra info,
1. This program saved data to a clarion file before and in sqlite it's
about 20-30% slower.
2. I'm reading data from a remote oracle server
3. I'm saving to a network drive.
4. If my program seems to hang for longer than about 3-5 minutes the
user tends to kill it.
ps. When I started with sqlite it took 500 minutes to save the 1 million
records.
I've got it down to just less than 200 minutes with current settings.
Clarion does it in between 100 and 200 minutes.
This is what I meant also when I said 500 was too small.
John
> 3. I'm saving to a network drive.
Is this a one-off data import into a new clean sqlite db? If so have
you considered writing to a db file on a local drive and then copying
the whole file to the network drive afterwards?
Paul.
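One way to sketch Paul's suggestion (the paths and the `publish` helper are hypothetical): copy the finished file to the share under a temporary name, then rename it into place, so a failed copy never leaves an incomplete database under the final name.

```python
import os
import shutil
import tempfile

def publish(local_db, network_dir, final_name):
    """Copy a locally built database into `network_dir` under a temporary
    name, then rename it into place. Readers on the share never see a
    half-copied file, and a failed copy leaves the old file untouched."""
    fd, tmp_path = tempfile.mkstemp(dir=network_dir, suffix=".tmp")
    os.close(fd)
    shutil.copyfile(local_db, tmp_path)
    os.replace(tmp_path, os.path.join(network_dir, final_name))
```

The rename is atomic within one filesystem, which is why the temporary file is created on the network share itself rather than locally.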
On 14 July 2010 11:56, Werner Smit wrote:
>> 3. I'm saving to a network drive.
> Is this a one-off data import into a new clean sqlite db? If so have
> you considered writing to a db file on a local drive and then copying
> the whole file to the network drive afterwards?
> Paul.
I have considered that!
I do allow the users to do a clean build or an update.
Clean build is usually once a week - I could build that on local drive
BUT the problem is if the copy fails and leaves them with an incomplete
file I'm in trouble.
To manage this and make sure all is well is not an easy task.
I did check, and the sqlite speed on a local drive seems to be much faster.
Would it be possible to tell sqlite to save journal file to local drive
and merge with server file
on transaction completion?
Another option - with much more work would be to create a db on local
drive,
fill it with X records and start a separate thread to merge this with
network drive while my program is busy fetching the next X record batch.
But if at all possible I'd like to not make too many changes to the
current program.
I like the KISS scenario: "keep it short and simple".
________________________________
> ps. When I started with sqlite it took 500 minutes to save the 1 million
> records.
> I've got it down to just less than 200 minutes with current settings.
> Clarion does it in between 100 and 200 minutes.
Do you have any indexes defined ? It can be considerably faster to DROP the indexes before importing all that data, then remake them afterwards. Of course you may already be doing this.
Simon.
>> ps. When I started with sqlite it took 500 minutes to save the 1
million
>> records.
>> I've got it down to just less than 200 minutes with current settings.
>> Clarion does it in between 100 and 200 minutes.
>Do you have any indexes defined ? It can be considerably faster to
DROP the indexes before importing all that >data, then remake them
afterwards. Of course you may already be doing this.
>
>Simon.
Nope, no index only a primary key..
CREATE TABLE Due
(Pin Integer
,IDNumber Char(13)
,Name VarChar(35)
,PayPeriod Integer
,PaypointCode VarChar(6)
,RegionCode VarChar(6)
,ProxyPin Integer
,PRIMARY KEY (Pin)
);
The table I'm using for test purposes looks like the above.
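For illustration, a batched import into that exact schema might look like the following sketch; the sample rows are fabricated, and a real run would stream them from Oracle:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Due
    (Pin Integer
    ,IDNumber Char(13)
    ,Name VarChar(35)
    ,PayPeriod Integer
    ,PaypointCode VarChar(6)
    ,RegionCode VarChar(6)
    ,ProxyPin Integer
    ,PRIMARY KEY (Pin)
    )""")

# Fabricated sample rows, one batch of 6000 as discussed in the thread.
rows = ((i, "0000000000000", "Name %d" % i, 201007, "PP0001", "RG0001", 0)
        for i in range(6000))

# executemany runs the whole batch inside one transaction, committed once.
conn.executemany("INSERT OR REPLACE INTO Due VALUES (?,?,?,?,?,?,?)", rows)
conn.commit()
print(conn.execute("SELECT count(*) FROM Due").fetchone()[0])  # 6000
```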
________________________________
From: sqlite-use...@sqlite.org on behalf of Werner Smit
Sent: Wed 7/14/2010 10:06 AM
To: General Discussion of SQLite Database
It DOES sound terrible since 90%? of the time is spent in retrieving
data from a remote oracle server over a slow line. The problem is that
the time spent saving to sqlite is still more than I used to spend on
saving to my previous file system.
________________________________
From: sqlite-use...@sqlite.org on behalf of Werner Smit
Sent: Wed 7/14/2010 10:59 AM
To: General Discussion of SQLite Database
> It DOES sound terrible since 90%? of the time is spent in retrieving
> data from a remote oracle server over a slow line.
I think you're trying to optimise the wrong thing :)
Assuming you can't upgrade that slow line, how about running a
compressed ssh tunnel between the oracle server and the client,
port-forwarding 1521/tcp over it and changing your tnsnames to
point at the client machine?
Paul.
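As a rough sketch (host names are hypothetical; check your own environment), the compressed tunnel Paul describes might look like:

```shell
# Hypothetical hosts. -C compresses the stream, -N skips a remote shell,
# and -L forwards the Oracle listener port to the client so tnsnames.ora
# can point at localhost:1521.
ssh -C -N -L 1521:oracle-server:1521 user@gateway-host
```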
>>Both of these values are terrible.
>>#1 What kind of network connection do you have? 100BaseT?
>>#2 What kind of server are you writing to?
>>#3 How fast does this run if you write to your local machine?
>>Michael D. Black
>It DOES sound terrible since 90%? of the time is spent in retrieving
>data from a remote oracle server over a slow line. The problem is that
>the time spent saving to sqlite is still more than I used to spend on
>saving to my previous file system.
>Can you answer #3 though?
It's nearly 50% faster.
>Does anybody know how to make the journal file go to a different
location than the database? Apparently it's not treated as a
"temporary" file. Perhaps it should be??
________________________________
From: sqlite-use...@sqlite.org on behalf of Werner Smit
Sent: Wed 7/14/2010 11:11 AM
To: General Discussion of SQLite Database
> > It DOES sound terrible since 90%? of the time is spent in retrieving
> > data from a remote oracle server over a slow line.
> I think you're trying to optimise the wrong thing :)
> Assuming you can't upgrade that slow line, how about running a
> compressed ssh tunnel between the oracle server and the client,
> port-forwarding 1521/tcp over it and changing your tnsnames to
> point at the client machine?
> Paul.
Wow! That sounds fascinating!
But since my client is in the banking environment they have strict
procedures about network control.
I will research this option, but the target date for implementation would
possibly be mid-2020.
I'd however like to test this with a local server - is there a faq
somewhere about compressed ssh tunneling on oracle ports?
Is it open source?
Have you done it yourself?
What was speed improvement?
Werner
> Does anybody know how to make the journal file go to a different
> location than the database? Apparently it's not treated as a "temporary"
> file. Perhaps it should be??
Seems like you'd have to communicate the journal location to other
processes, meaning you'd have to write the name of the journal file
into the main db, in the header or something. I think sqlite doesn't
do that at the moment, which means you'd have to change the file
format, which sqlite devs are loath to do.
--
Eric A. Smith
Worthless.
-- Sir George Bidell Airy, KCB, MA, LLD, DCL, FRS, FRAS
(Astronomer Royal of Great Britain), estimating for the
Chancellor of the Exchequer the potential value of the
"analytical engine" invented by Charles Babbage, September
15, 1842.
________________________________
From: sqlite-use...@sqlite.org on behalf of Werner Smit
Sent: Wed 7/14/2010 11:23 AM
To: General Discussion of SQLite Database
Subject: Re: [sqlite] EXTERNAL:Re: Sqlite Insert Speed Optimization
________________________________
From: sqlite-use...@sqlite.org on behalf of Eric Smith
Sent: Wed 7/14/2010 11:24 AM
To: General Discussion of SQLite Database
>>According to my math your final database size should be on the order
of 100Meg?
>>
>>That means at 200 minutes and 1,000,000 records:
>>83 inserts per second
>>8333 bytes per second
>>Both of these values are terrible.
>>#1 What kind of network connection do you have? 100BaseT?
>>#2 What kind of server are you writing to?
>>#3 How fast does this run if you write to your local machine?
>>Michael D. Black
>It DOES sound terrible since 90%? of the time is spent in retrieving
>data from a remote oracle server over a slow line. The problem is that
>the time spent saving to sqlite is still more than I used to spend on
>saving to my previous file system.
>Can you answer #3 though?
It's nearly 50% faster.
>Does anybody know how to make the journal file go to a different
location than the database? Apparently it's not treated as a
"temporary" file. Perhaps it should be??
Mike said:
Now I'm confused...how can you be 50% faster if 90% of the time is in
retrieving from Oracle?
Werner replied with a list of processing speeds to clarify matters:
Date Strt End Minutes recs Version Filename
2010-05-03 15:42 16:49 66.7 F 421710 4.4.0.6 Due
2010-05-24 01:12 01:30 17.9 F 419604 CLAR Due
2010-05-27 11:07 13:01 113.5 F 419604 4.4.0.7 Due
2010-06-03 18:54 19:42 47.8 F 419604 4.4.0.9 Due
2010-06-04 17:24 18:09 45.5 B 419604 4.4.0.9 Due
2010-06-07 11:08 13:59 170.5 U 419604 4.4.0.8 Due
2010-06-17 09:13 09:29 16.2 B 419604 CLAR Due
2010-06-28 13:51 14:07 15.9 F 419604 CLAR Due
2010-06-28 15:12 15:37 25.4 B 419604 CLAR Due
2010-06-29 15:57 16:21 24.1 B 419604 CLAR Due
2010-06-30 11:39 12:17 37.8 F 418620 4.4.1.2 Due
2010-07-08 14:38 15:28 49.4 F 418620 4.4.1.2 Due
2010-07-13 13:09 13:52 42.2 U 418620 4.4.1.2 Due
2010-07-13 18:02 18:31 28.6 F 418620 4.4.1.2 Due (local drive)
2010-07-14 13:27 14:15 47.8 B 418620 4.4.1.2 Due
2010-07-14 15:28 16:03 34.6 F 418620 4.4.1.2 Due
I hope above lines stay in correct columns.
As you can see I started out building these records in 66 minutes,
whereas the previous version "CLAR" takes between 15 and 24 minutes.
After taking out count(*), adding a few pragmas, and saving 6000
records rather than 500 at a time, I've got it down to 34 minutes.
If I build it on a local drive it takes 28 minutes (with chunks of 500).
I went through the code with a fine comb and could not find any other
showstoppers like count(*).
Maybe I should dump to a txt file and see what the speed is like?
ps. The last entry (34 minutes) was done with blocks of 6000 rather than
blocks of 500.
Pinging allpay [196.11.183.3] with 32 bytes of data:
Reply from 196.11.183.3: bytes=32 time=115ms TTL=250
Reply from 196.11.183.3: bytes=32 time=175ms TTL=250
..
Reply from 196.11.183.3: bytes=32 time=228ms TTL=250
Reply from 196.11.183.3: bytes=32 time=264ms TTL=250
Ping statistics for 196.11.183.3:
Packets: Sent = 8, Received = 8, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 74ms, Maximum = 264ms, Average = 178ms
> After taking out count(*) and adding a few pragma's and saving 6000
> records rather than 500 at a time I've got it down to 34 minutes.
> If I build in on local drive it takes 28 minutes.(with chunks of 500)
Why not do an apples-to-apples test and commit the same number of
records per batch in each test? The idea was to vary only one thing
(the mount point) and keep all other variables constant.
250ms ping times, wow. SQLite write speeds will be better if the NFS
server is on the same planet as the client.
--
Eric A. Smith
Software is like entropy. It is difficult to grasp, weighs nothing,
and obeys the Second Law of Thermodynamics, i.e., it always increases.
________________________________
From: sqlite-use...@sqlite.org on behalf of Werner Smit
Sent: Wed 7/14/2010 11:43 AM
To: General Discussion of SQLite Database
Subject: EXTERNAL:Re: [sqlite] Sqlite Insert Speed Optimization
> If you could set the journal location BEFORE you open the database that
> wouldn't be such a bad thing. Giving us the ability to do this would allow
> for the flexibility when needed with appropriate warnings about how to
> recover.
>
You can write your own VFS that places the journal file wherever you want.
--
---------------------
D. Richard Hipp
d...@sqlite.org
________________________________
From: sqlite-use...@sqlite.org on behalf of Richard Hipp
Sent: Wed 7/14/2010 12:12 PM
To: General Discussion of SQLite Database
Subject: EXTERNAL:Re: [sqlite] Sqlite Insert Speed Optimization
> Was that a facetious remark???
>
> Rather than "here's a function/pragma that allows you to put the journal
> file where you want -- but BE CAREFUL BECAUSE..."
>
> Writing you own VFS is not for the casual user...
>
> I was trying just to find where the journal filename was created but
> there's no "db-journal" string in either the .h or .c file.
>
> It would be trivial to add a function to set it via the C interface.
>
Putting the rollback journal in any directory other than the same directory
as the database file is a dangerous thing to do, because it risks being
unable to locate the rollback journal after a crash, resulting in database
corruption. Hence, we have no intention of supporting such a feature. If
you really need it badly enough, you can write your own VFS to make it
happen. Yes, writing your own VFS is hard to do. But this serves to
discourage people from doing it, which is what we want.
> Does anybody know how to make the journal file go to a different location than the database? Apparently it's not treated as a "temporary" file. Perhaps it should be??
It's essential not to treat the journal file as a temporary file (e.g. put it in /tmp for a Unix environment). Temporary directories are wiped out at boot time, whereas a SQLite journal file is used after a crash and reboot to recover your database to a sane and usable state.
And I agree with other people in this thread: it is pointless for you to mess about trying to optimize the operation of SQLite at this stage, because your bottleneck to performance is the speed of your network link. You might somehow achieve a 5% improvement in speed by endless messing with SQLite, whereas getting a faster path to your remote storage might get you a 50% improvement. Take a look at your network topology, the priority settings on your switches, etc.
Simon.
________________________________
From: sqlite-use...@sqlite.org on behalf of Richard Hipp
Sent: Wed 7/14/2010 12:51 PM
To: General Discussion of SQLite Database
Subject: EXTERNAL:Re: [sqlite] Sqlite Insert Speed Optimization
>
>
> On Wed, Jul 14, 2010 at 1:34 PM, Black, Michael (IS) <
> Michael...@ngc.com> wrote:
>
>> Was that a facetious remark???
>>
>> Rather than "here's a function/pragma that allows you to put the journal
>> file where you want -- but BE CAREFUL BECAUSE..."
>>
>> Writing you own VFS is not for the casual user...
>>
>> I was trying just to find where the journal filename was created but
>> there's no "db-journal" string in either the .h or .c file.
>>
>> It would be trivial to add a function to set it via the C interface.
>>
>
> Putting the rollback journal in any directory other than the same directory
> as the database file is a dangerous thing to do, because it risks being
> unable to locate the rollback journal after a crash, resulting in database
> corruption. Hence, we have no intention of supporting such a feature. If
> you really need it badly enough, you can write your own VFS to make it
> happen. Yes, writing your own VFS is hard to do. But this serves to
> discourage people from doing it, which is what we want.
>
>
Well, I'm wrong. Turns out we are going to help you hang yourself after
all: I forgot about these features:
PRAGMA journal_mode=MEMORY;
PRAGMA journal_mode=OFF;
Set one of those and you get no disk I/O from reading or writing the journal
file. And you will corrupt your database on a crash. On your own head be
it.
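For illustration only (the filename is made up), setting one of these from Python might look like:

```python
import sqlite3

# Trading safety for speed, as described above: with the journal in
# memory (or off) there is no journal I/O, but a crash mid-transaction
# can corrupt the database file.
conn = sqlite3.connect("risky.db")
mode = conn.execute("PRAGMA journal_mode=MEMORY").fetchone()[0]
print(mode)  # "memory"
```

The pragma returns the mode actually in effect, so it is worth checking the result rather than assuming the change took.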
________________________________
From: sqlite-use...@sqlite.org on behalf of Richard Hipp
Sent: Wed 7/14/2010 1:12 PM
To: General Discussion of SQLite Database
Subject: EXTERNAL:Re: [sqlite] Sqlite Insert Speed Optimization
This is a polite way of saying, "This a *really* Bad Idea. There is
a big gun over there. Have fun shooting yourself in the foot. Can
we take pictures?"
If you know enough about the whole system to understand the full
implications of moving the journal file, chances are you know enough
about the whole system to make the VFS modifications without serious
thought. If you don't, you should likely think twice about hacking
up a fundamental transaction behavior and dismissing years and years
of practical knowledge and experience.
Adding a function would be trivial... which means there are really
good reasons why it isn't there.
There is a really big gun right here (3.6.23.1):
$ grep -n \"-journal\" sqlite3.c
35406: memcpy(&pPager->zJournal[nPathname], "-journal", 8);
You're not finding "db-journal" because the "db" comes from your
application.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson
> I'd however like to test this with a local server - is there a faq
> somewhere about compressed ssh tunneling on oracle ports? Is it open
> source? Have you done it yourself? What was speed improvement?
As mentioned by other posters, compressing on the LAN will probably not
help because the compression time is > network time.
SSH compression and tunnelling are going to be dependent on your
environment, but if you have a Windows client you could start by
looking at:
http://the.earth.li/~sgtatham/putty/0.60/htmldoc/Chapter3.html#using-port-forwarding
and also section 4.18.3 in that manual.
We do use ssh port-forwarding but not for Oracle. However as my
firewall says that Ora only needs 1521/tcp then I don't see why
it wouldn't work. But this is starting to go off-topic for
this list...
Paul.