Speeding up SELECT queries?

22 views
Skip to first unread message

Durand

unread,
Jun 1, 2010, 3:59:33 PM6/1/10
to python-sqlite
Hi,

I recently converted an app to use an sqlite3 db as opposed to an xml
file.
It seems that executing a select query is 100 times slower in sqlite
than using something like etree to read an xml file. Am I doing
something wrong here?

XML code:

t1 = time.time()


for reading in readings:
if reading.tag == "reading":
if reading.get("r_type") == "ja_on":
on_readings.append(reading.text)
elif reading.get("r_type") == "ja_kun":
kun_readings.append(reading.text)
elif reading.tag == "meaning" and reading.get("m_lang") ==
None:
meanings.append(reading.text)
t2 = time.time()
print '%s took %0.3f ms' % ("exec", (t2-t1)*1000.0)

sqlite code:

t1 = time.time()


on_readings = [ item[0] for item in CURSOR.execute("SELECT
reading from readings where literal = ? and r_type = 'ja_on'",
kanji).fetchall() ]
kun_readings = [ item[0] for item in CURSOR.execute("SELECT
reading from readings where literal = ? and r_type = 'ja_kun'",
kanji).fetchall() ]
meanings = [ item[0] for item in CURSOR.execute("SELECT
meaning from meanings where literal = ? and m_lang = 'en'",
kanji).fetchall() ]
t2 = time.time()
print '%s took %0.3f ms' % ("exec", (t2-t1)*1000.0)

Each execute statement takes about 20ms whereas getting the same data
from the xml file takes about 0.3ms. I thought sqlite was much faster
than this? Or maybe it's the structure of the database. I wouldn't
mind changing it if it makes things faster.

Thanks.

Roger Binns

unread,
Jun 1, 2010, 10:05:02 PM6/1/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/01/2010 12:59 PM, Durand wrote:
> Each execute statement takes about 20ms whereas getting the same data
> from the xml file takes about 0.3ms. I thought sqlite was much faster
> than this?

I don't see you measuring the same thing from the code you supplied. The
XML is doing a single iteration over every 'reading'. The SQLite is doing
unindexed searches over the whole dataset three times.

In the SQLite case you are looking for a particular value (literal = ?)
which means SQLite has to check every row for a match. If you create an
index then it can merely consult the index.

To see what SQLite is doing, put "EXPLAIN QUERY PLAN" in front of your query
- - for example "EXPLAIN QUERY PLAN SELECT reading FROM readings where ..."
and you'll see what indices will be used (if any).

That will give you the biggest performance improvement. Following on from
that SQLite defaults to a maximum memory cache of 2MB. Your XML code
doesn't show whether you are using the incremental parser or if the XML was
already all parsed and present in memory.

Roger

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwFvEoACgkQmOOfHg372QTr0QCg5DC1S0ZyAHdGWGs9VqMaF6fZ
Xc8AoJipxGjMdCZqWZTKPE8LSQKo4Cnr
=JyG5
-----END PGP SIGNATURE-----

Durand

unread,
Jun 3, 2010, 3:29:19 PM6/3/10
to python...@googlegroups.com
Hi,

Thanks for the reply. I see what you mean about the comparison. I just
realised that I had actually improved speed on the xml one by using an
index. I didn't realise that the same thing was possible with sql. I'm
reading up on indexes now. Thanks again!

Durand

On Wed, 02 Jun 2010 03:05:02 +0100, Roger Binns <rog...@rogerbinns.com>
wrote:

Durand

unread,
Jun 3, 2010, 5:11:48 PM6/3/10
to python...@googlegroups.com
Well, I added indexes for the columns and tables I'm searching in and
whatdyaknow, it's about 9 times faster! It's now actually much quicker
than the xml version, which is brilliant! Thank you! I do have another
question though. Is there a performance loss in having a lot of indexes? I
currently have four indexes, each with 3 columns indexed on average. Is it
better to have one column per index or one index per table or what?

Thanks,
Durand


On Wed, 02 Jun 2010 03:05:02 +0100, Roger Binns <rog...@rogerbinns.com>
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

sk...@pobox.com

unread,
Jun 3, 2010, 9:06:38 PM6/3/10
to python...@googlegroups.com

Durand> I do have another question though. Is there a performance loss
Durand> in having a lot of indexes?

I don't know about Sqlite in particular, but in other SQL implementations
I'm familiar with indexes are a classic time/space tradeoff. Disk space
being cheap and time not, it's generally better to err on the side of too
many indexes than not enough. If you do a lot of inserts though, time to
maintain your indexes might be significant, especially for large tables.

Durand> I currently have four indexes, each with 3 columns indexed on
Durand> average. Is it better to have one column per index or one index
Durand> per table or what?

It depends entirely on your query patterns.

--
Skip Montanaro - sk...@pobox.com - http://www.smontanaro.net/

Roger Binns

unread,
Jun 3, 2010, 9:19:02 PM6/3/10
to python...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/03/2010 02:11 PM, Durand wrote:
> Is there a performance loss in having a lot of indexes?

There is no loss when doing queries since indices not relevant the query are
ignored.

They do affect performance when doing insert, update or delete since
affected indices have to be altered for those rows, in addition to consuming
extra disk space as Skip mentioned. Note that if you are loading lots of
data then it is better to create the indices afterwards rather than before.

> I currently have four indexes, each with 3 columns indexed on average.
> Is it better to have one column per index or one index per table or what?

SQLite will only use one index per query. Consequently your indices need to
match your queries. For example if you create an index on column A and do a
query "select B from table where A=?" then the index will be used to find
rows with matching A, but the database will still have to be consulted to
get the value of B. If the index was on (A,B) then the query could be
satisfied entirely out of the index, without having additional seeks into
the database.

If you are looking for general tips on improving SQLite performance then
look at the SQLite site. I strongly recommend you create a benchmark that
is representative of your usage patterns and then use that to see what
effect various things have. Examples of things you should find improving
performance are increasing the page size and the cache size.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkwIVIEACgkQmOOfHg372QS2LQCdGbjFs7SzUosvrYxL6NFKnPe3
K/YAniDA60pV7IbwQr1bRoKxNdrCUhbP
=IRdd
-----END PGP SIGNATURE-----

Durand

unread,
Jun 5, 2010, 9:42:06 PM6/5/10
to python...@googlegroups.com
sk...@pobox.com: Thanks, that was pretty useful. I guess it doesn't seem to
matter how many indexes I have, which is good.

Roger Binns:


> There is no loss when doing queries since indices not relevant the query
> are ignored.

> Note that if you are loading lots of data then it is better to create
> the indices afterwards rather than before.

That's perfect. Once the db has been created, there wouldn't be much
writing being done so that's a non issue. Thanks a lot, you've been a
*huge* help! I'll definitely look at the optimisation stuff.

Reply all
Reply to author
Forward
0 new messages