I am using transactions to write data in a sqlite3 database. As noted
in the sqlite3 FAQ (http://www.sqlite.org/faq.html#q19), disk access
limitations do not enable to perform more than a few dozen
transactions per second.
To increase the performance of my writing operations, I started
grouping several put() statements inside a single transaction. As I
thought that put() was merely some INSERT statements, which are very
quick because no disk access, I expected the following code to write
my two storeable objects in one single transaction about as long as if
I had put only one object.
dba::Transaction* transaction = new dba::Transaction(_dbaArchive-
The above code takes about 100 ms to run. If I remove one of the
put(), only 50 ms. I have therefore no performance gain from grouping
the put() in a single transaction.
Did I misunderstand something about transactions? or does put() write
on the disk before committing?
> I am using transactions to write data in a sqlite3 database. As noted > in the sqlite3 FAQ (http://www.sqlite.org/faq.html#q19), disk access > limitations do not enable to perform more than a few dozen > transactions per second. > To increase the performance of my writing operations, I started > grouping several put() statements inside a single transaction. As I > thought that put() was merely some INSERT statements, which are very > quick because no disk access, I expected the following code to write > my two storeable objects in one single transaction about as long as if > I had put only one object.
> The above code takes about 100 ms to run. If I remove one of the > put(), only 50 ms. I have therefore no performance gain from grouping > the put() in a single transaction. > Did I misunderstand something about transactions? or does put() write > on the disk before committing?
It sends couple of queries to get next id for object and then sends inserts. All of this is enclosed in BEGIN...COMMIT statements. I cannot say for sure if or when sqlite library performs disk operations.
You can recompile debea with DBA_QUERY_DEBUG=1 - it will log all queries to stderr.
I think that you can archive big performance gain by replacing IdFetcher with custom one. For example you can assign id from memory and store last used id in database when application exits. This will remove 1 SELECT and 1 UPDATE per put().
Thanks for your answer Łukasz. I used your compilation flag to list the queries performed for one put() call if I use a custom id fetcher :
0xb53dbd0:BEGIN 0xb53dbd0:INSERT INTO level0_table (id) VALUES (275893) 0xb53dbd0:SELECT id FROM string_table WHERE strings_in_level0=275893 0xb53dbd0:INSERT INTO string_table (id,string_key,string_value,strings_in_level0) VALUES (275894,'Event','foo',275893)
Then I ran these queries one by one and found out that the most time-consuming one was by far the SELECT. In my application each "level0" object has a list of strings that are stored using a BIND_COL macro. I don't really understand why the SELECT is needed? We just added the new id in the level0_table and we still have to look for this id in the string_table, why?
Regards, Yves-Olivier
On 27 January 2011 13:25, Lukasz Michalski <l...@zork.pl> wrote:
> On 01/27/11 12:26, Yves-Olivier Renault wrote: > > Hi,
> > I am using transactions to write data in a sqlite3 database. As noted > > in the sqlite3 FAQ (http://www.sqlite.org/faq.html#q19), disk access > > limitations do not enable to perform more than a few dozen > > transactions per second. > > To increase the performance of my writing operations, I started > > grouping several put() statements inside a single transaction. As I > > thought that put() was merely some INSERT statements, which are very > > quick because no disk access, I expected the following code to write > > my two storeable objects in one single transaction about as long as if > > I had put only one object.
> > The above code takes about 100 ms to run. If I remove one of the > > put(), only 50 ms. I have therefore no performance gain from grouping > > the put() in a single transaction. > > Did I misunderstand something about transactions? or does put() write > > on the disk before committing?
> It sends couple of queries to get next id for object and then sends > inserts. All of this is enclosed in BEGIN...COMMIT statements. I cannot > say for sure if or when sqlite library performs disk operations.
> You can recompile debea with DBA_QUERY_DEBUG=1 - it will log all queries > to stderr.
> I think that you can archive big performance gain by replacing IdFetcher > with custom one. For example you can assign id from memory and store > last used id in database when application exits. This will remove 1 > SELECT and 1 UPDATE per put().
> Thanks for your answer Łukasz. I used your compilation flag to list the > queries performed for one put() call if I use a custom id fetcher :
> 0xb53dbd0:BEGIN > 0xb53dbd0:INSERT INTO level0_table (id) VALUES (275893) > 0xb53dbd0:SELECT id FROM string_table WHERE strings_in_level0=275893 > 0xb53dbd0:INSERT INTO string_table > (id,string_key,string_value,strings_in_level0) VALUES > (275894,'Event','foo',275893)
> Then I ran these queries one by one and found out that the most > time-consuming one was by far the SELECT. In my application each > "level0" object has a list of strings that are stored using a BIND_COL > macro. > I don't really understand why the SELECT is needed? We just added the > new id in the level0_table and we still have to look for this id in the > string_table, why?
You are about to store object with std::list mapped using BIND_COL. Lets say that this is new_list.
Debea need to know which objects are deleted from new_list since last put().
It is done by reading ids of all objects from database that were in collection last time using SELECT to old_list.
All objects that exists in new_list are removed from old_list.
If there is anything left on old_list, then debea issues 'DELETE from table WHERE id in ( old_list[0],...,old_list[n] )' to delete objects that you removed from collection since last put().
This SELECT is indeed not needed if object that is parent to list is about to be inserted.
Please add a ticket for this, I will try to look at serialization code and see if I can optimize out this list comparison for new objects.
> On 01/27/11 17:00, Yves-Olivier Renault wrote: > > Thanks for your answer Łukasz. I used your compilation flag to list the > > queries performed for one put() call if I use a custom id fetcher :
> > 0xb53dbd0:BEGIN > > 0xb53dbd0:INSERT INTO level0_table (id) VALUES (275893) > > 0xb53dbd0:SELECT id FROM string_table WHERE strings_in_level0=275893 > > 0xb53dbd0:INSERT INTO string_table > > (id,string_key,string_value,strings_in_level0) VALUES > > (275894,'Event','foo',275893)
> > Then I ran these queries one by one and found out that the most > > time-consuming one was by far the SELECT. In my application each > > "level0" object has a list of strings that are stored using a BIND_COL > > macro. > > I don't really understand why the SELECT is needed? We just added the > > new id in the level0_table and we still have to look for this id in the > > string_table, why?
> You are about to store object with std::list mapped using BIND_COL. Lets > say that this is new_list.
> Debea need to know which objects are deleted from new_list since last > put().
> It is done by reading ids of all objects from database that were in > collection last time using SELECT to old_list.
> All objects that exists in new_list are removed from old_list.
> If there is anything left on old_list, then debea issues 'DELETE from > table WHERE id in ( old_list[0],...,old_list[n] )' to delete objects > that you removed from collection since last put().
> This SELECT is indeed not needed if object that is parent to list is > about to be inserted.
> Please add a ticket for this, I will try to look at serialization code > and see if I can optimize out this list comparison for new objects.