[sqlite] Optimize Table Access Efficiency

8 views
Skip to first unread message

Tim Morton

unread,
Mar 20, 2012, 10:21:23 AM3/20/12
to sqlite...@sqlite.org
Greetings,

My app reads dozens of SQLite databases ranging in size from 1MB to
100MB with a simple table structure like,

"create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"

On startup the app reads all the databases to extract the "topics"
column data. With the DB configured as above extracting the topic data
is a lengthy process. It seems the whole DB file is being read to just
access the relatively small "topics" column.

If I add another table with just the topics data,

"create table dictionary(id INTEGER PRIMARY KEY, topics)"

access is quicker. However, if I make another DB file with just the
topics column, access is much quicker. I don't really want to use two
files to access a single DB.

Also, the DB is only read from; not written to by the app.

My questions,

Does table creation order affect the speed of access?

Is there a way to read only the part of the file that has the queried
table/column thus saving time and hard dive grinding?

Is it possible to read a small table/column from a huge file nearly as
quick as the small table alone from a small file? And if so, how?

Thanks,

_______________________________________________
sqlite-users mailing list
sqlite...@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Black, Michael (IS)

unread,
Mar 20, 2012, 10:26:12 AM3/20/12
to General Discussion of SQLite Database
Try creating 2 tables, one for topics, one for definitions.

Then insert all the topics at once followed by all the definitions.

That should give you the same disk layout as two databases.

And you don't say what "lengthy" means.

Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems

________________________________
From: sqlite-use...@sqlite.org [sqlite-use...@sqlite.org] on behalf of Tim Morton [t...@Preservedwords.com]
Sent: Tuesday, March 20, 2012 9:21 AM
To: sqlite...@sqlite.org
Subject: EXT :[sqlite] Optimize Table Access Efficiency

Tim Morton

unread,
Mar 20, 2012, 10:44:17 AM3/20/12
to General Discussion of SQLite Database
Thanks for the reply,

"Lengthy" varies according to the system, but from a fresh, initial
start on this one it can be 15 to 20 seconds. The system cache speeds up
subsequent starts to around 3-4 seconds.

As mentioned in the post, I have two tables, one for topics and one for
definitions. I read the topics table and get the index and then use it
to get the definition. The one thing I didn't do is populate the two
tables on creation separately. They both are populated in the same "for
loop." I'll try that, but with my previous texts I could always get
better speed with the topics in a separate DB on a fresh start.

Tim

Black, Michael (IS)

unread,
Mar 20, 2012, 10:53:50 AM3/20/12
to General Discussion of SQLite Database
You may also want to try pragma cache_size and bump it up a LOT just to see what happens to your timings.

Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems

________________________________
From: sqlite-use...@sqlite.org [sqlite-use...@sqlite.org] on behalf of Tim Morton [t...@Preservedwords.com]

Sent: Tuesday, March 20, 2012 9:44 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] Optimize Table Access Efficiency

Simon Slavin

unread,
Mar 20, 2012, 11:13:45 AM3/20/12
to General Discussion of SQLite Database

On 20 Mar 2012, at 2:21pm, Tim Morton <t...@Preservedwords.com> wrote:

> My app reads dozens of SQLite databases ranging in size from 1MB to 100MB with a simple table structure like,
>
> "create table dictionary(id INTEGER PRIMARY KEY, topics, definition)"
>
> On startup the app reads all the databases to extract the "topics" column data. With the DB configured as above extracting the topic data is a lengthy process. It seems the whole DB file is being read to just access the relatively small "topics" column.
>
> If I add another table with just the topics data,
>
> "create table dictionary(id INTEGER PRIMARY KEY, topics)"
>
> access is quicker. However, if I make another DB file with just the topics column, access is much quicker. I don't really want to use two files to access a single DB.
>
> Also, the DB is only read from; not written to by the app.

Thank you for your detailed description which helps a lot in considering your situation.

> My questions,
>
> Does table creation order affect the speed of access?

These things depend mostly on your hardware. Windows deals much better with defragmented files than fragmented files. A lot of its speed comes from assumptions like if you're looking at sector s, you're shortly going to want sector s+1. To speed up your situation, after your database file has been written, execute a VACUUM command in SQL then (if you're running Windows) defragment your hard disk.

If the above does not give you fast enough access, then it may be possible to speed things up still further but writing one TABLE then the other. You can do this when you originally make the file, or you can use the sqlite shell tool to .dump the file to SQL commands and .read the SQL commands back in to form another database file. After doing this, once again, if you're running Windows do another defragment. I don't expect that to make much difference but it might be worth trying. You should also look at the PRAGMA command Michael recommended.

> Is there a way to read only the part of the file that has the queried table/column thus saving time and hard dive grinding?
>
> Is it possible to read a small table/column from a huge file nearly as quick as the small table alone from a small file? And if so, how?

Your TABLE definition is as efficient as it can be. Your most efficient reading code in each situation is

SELECT id,topics FROM dictionary

Any other optimization can't be done inside the SQL command.

Simon.

Tim Morton

unread,
Mar 20, 2012, 12:16:28 PM3/20/12
to General Discussion of SQLite Database
Thanks, Simon, for your detailed answer.

I will try the suggestions you and Michael supplied and see if there is
any significant inprovement.

Tim

Max Vlasov

unread,
Mar 20, 2012, 1:52:00 PM3/20/12
to General Discussion of SQLite Database
Hi, Tim

On Tue, Mar 20, 2012 at 6:21 PM, Tim Morton <t...@preservedwords.com> wrote:

>
> Is there a way to read only the part of the file that has the queried
> table/column thus saving time and hard dive grinding?
>
>

There is also a trick. You can create an index on 'topics' column and
perform your first reading forcing this column order and not touching
definition at the same time. Since any index is just another b-tree but
most of the time smaller you will end up reading a smaller part of file.
Although there's a price to pay, your db will become larger.

Max

Tim Morton

unread,
Mar 20, 2012, 2:11:46 PM3/20/12
to General Discussion of SQLite Database
Thanks,

I may try this, but the topics are usually just one word each so I doubt
an index could reduce it much.

Tim

Simon Slavin

unread,
Mar 20, 2012, 2:36:01 PM3/20/12
to General Discussion of SQLite Database

On 20 Mar 2012, at 6:11pm, Tim Morton <t...@Preservedwords.com> wrote:

> I may try this, but the topics are usually just one word each so I doubt an index could reduce it much.

Max's trick is something that takes advantage of how SQLite works entirely. If you do a SELECT like

SELECT id,topics FROM dictionary ORDER BY topics

and there's an index like

CREATE INDEX dti ON dictionary (topics,id)

then SQLite will use that index to find the right order to retrieve the records in. But having figured out what rows it needs it realises that it has all the information -- both the id and the topic -- right there in the index. So it doesn't bother then looking up the TABLE data to fetch the other columns, it just returns all the information you asked for right from the index it's already looking at. It's fast and, because the index contains only the data you want, it's equivalent to making up a separate table which has just the id and topics columns.

I forgot about that trick when I replied but Max didn't.

Simon.

Tim Morton

unread,
Mar 20, 2012, 3:02:09 PM3/20/12
to General Discussion of SQLite Database
Ah, very good. Thanks to you both.

I will definitly try this. Sounds like it may help.

Tim

Tim Morton

unread,
Mar 20, 2012, 5:33:23 PM3/20/12
to General Discussion of SQLite Database
I did some quick tests:

I took a 100,000 entry dictionary and made a single table

"create table dict (id INTEGER PRIMARY KEY, topics, def)"

I ran a "for loop" on the cursor object to extract all the topic column
data and put in in a Python list .

cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
tmp.append(item[0])
print time.time() - start, 'get WL'

The time was around .035 sec.

With an index created,

'CREATE INDEX dctidx ON dict (id, topics)'

There was no difference in time. The file was around 2MB larger.

I made a second table with just the topic data,

"create table topics (topic)"

with this select,

'SELECT topic FROM topics'

The time was roughly half at around .017 sec.

I made a separate file with just the topic data table as above and the
access time was only slightly less .014-.015 sec.

Also, I populated the topics table completely and then populated the def
table, with no difference in speed if they were populated in the same loop.

So it seems the index is no help; a second topics table is a significant
help; and a separate file with topics table a negligible help; and
creation order is no help.

They were all good ideas to try, though.

Tim


On 3/20/2012 2:36 PM, Simon Slavin wrote:

Simon Slavin

unread,
Mar 20, 2012, 6:44:27 PM3/20/12
to General Discussion of SQLite Database

On 20 Mar 2012, at 9:33pm, Tim Morton <t...@Preservedwords.com> wrote:

> So it seems the index is no help; a second topics table is a significant help; and a separate file with topics table a negligible help; and creation order is no help.

Interesting and a little unexpected. Good to see a user doing experimentation.

Max Vlasov

unread,
Mar 21, 2012, 2:53:56 AM3/21/12
to General Discussion of SQLite Database
On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton <t...@preservedwords.com> wrote:

> So it seems the index is no help;
>


Are you sure you did as Simon explained?
Becausem my tests shows the trick works... (sqlite 3.7.10)

Create table [TestTable] ([Id] integer primary key, [topics] Text,
[Definition] Text);
CREATE TABLE [SourceTable] ([Id] integer primary key, [Title] Text);

Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('bar')
Insert into SourceTable (Title) VALUES ('foo bar')
Insert into SourceTable (Title) VALUES ('bar foo')

insert into TestTable (topics, definition) values ((Select Title from
SourceTable order by random() limit 1), 'abcdefghij ...') (about 500
symbols)
100,000 times so we have about 100,000 records

select topics, id from TestTable
:Sqlite read 103 Mb (I have a special "hack" into vfs to monitor how much
sqlite read during the query)
:Time: 4,9 sec

...After adding the index

CREATE INDEX [idx_Trick] ON [TestTable] ([Topics], [Id])

select topics, id from TestTable order by topics,id
:Sqlite read 2 Mb
:Time: 1,3 sec

The timing was unaffected by sqlite and system cache since the db was
reopened with a special trick on Windows forcing it to clear cache for this
file

Max

Eduardo Morras

unread,
Mar 21, 2012, 6:38:13 AM3/21/12
to General Discussion of SQLite Database
At 07:53 21/03/2012, Max Vlasov wrote:
>On Wed, Mar 21, 2012 at 1:33 AM, Tim Morton <t...@preservedwords.com> wrote:
>
> > So it seems the index is no help;
> >
>
>
>Are you sure you did as Simon explained?
>Becausem my tests shows the trick works... (sqlite 3.7.10)

The trick works, but i think he didn't timed what he should timed.

He did:

cur.execute('SELECT topics FROM dict')
start = time.time()
for item in cur:
tmp.append(item[0])
print time.time() - start, 'get WL'

So he timed getting the data from the python cursor and moving to a
python array. I think he should time:

start = time.time()


cur.execute('SELECT topics FROM dict')

for item in cur:
tmp.append(item[0])
print time.time() - start, 'get WL'

Including the SELECT because that's where the trick work.

Tim Morton

unread,
Mar 21, 2012, 6:39:47 AM3/21/12
to General Discussion of SQLite Database
When I did the index test I did not have the second table in the DB.
Just the complete large table (id, topics, definition). I also didn't have

order by random() limit 1

In the select statment. The data is alread in alphabetical order, but I'll try that anyway.

I am going to be away from the office over the next week or so but I will look into this some more as I get time.

Thank you both again for your time and patience.

Tim

Reply all
Reply to author
Forward
0 new messages