Why are SQLITE queries so slow?

548 views
Skip to first unread message

Renaud

unread,
Mar 25, 2015, 10:03:31 AM3/25/15
to racket...@googlegroups.com
Hi,

I'm new to Racket, and i would like to know why sqlite queries are so slow in my test program.

This program imports some data from a text file into a simple sqlite DB. It takes 35s with the INSERT queries and 5-6s without them.

I've done the same thing with other languages, all do far better. For example with perl the same code runs in 8s with the queries, and 2-3s with them.

I know that startup and regex are (reasonably ?) slower in racket, but queries seems to be very slow.

Am i doing somethin wrong?

Thanks for your help.


test done by swapping the commented line with the next. Code :
-----------------------------------------------------
#!/usr/bin/env racket

#lang racket/base
(require racket/list racket/string db/base db/sqlite3)

(define DBFILE "database.rkt.sql")
(define rx #rx"[~^]+")
(define sdb #f)

(define (openDatas)
(unless sdb (set! sdb (sqlite3-connect #:database DBFILE))))

(define (importDatas)
(when (file-exists? DBFILE) (delete-file DBFILE))
(set! sdb (sqlite3-connect #:database DBFILE #:mode 'create))
(start-transaction sdb)
(query-exec sdb "CREATE TABLE groups (fam INTEGER, name STRING , PRIMARY KEY (fam))")
(query-exec sdb "CREATE TABLE foods (alim INTEGER, fam INTEGER, name STRING , PRIMARY KEY (alim))")
(query-exec sdb "CREATE TABLE nutrients (nutr INTEGER, unit STRING, short STRING, name STRING , PRIMARY KEY (nutr))")
(query-exec sdb "CREATE TABLE contents (alim INTEGER, nutr INTEGER, val FLOAT , PRIMARY KEY (alim, nutr))")
(process-file "FD_GROUP.txt" "groups" 2)
(process-file "FOOD_DES.txt" "foods" 3)
(process-file "NUTR_DEF.txt" "nutrients" 4)
(process-file "NUT_DATA.txt" "contents" 3)
(commit-transaction sdb))

(define (process-file fname table nb)
(define Q (prepare sdb (string-append "INSERT INTO " table " VALUES (" (string-join (for/list ([x nb]) "?") ",") ")")))
(call-with-input-file fname (lambda (file)
(for ([line (in-port read-line file)])
;(bind-prepared-statement Q (take (regexp-split rx line 1) nb))))))
(query-exec sdb (bind-prepared-statement Q (take (regexp-split rx line 1) nb)))))))


(importDatas)
-----------------------------------------------------

George Neuner

unread,
Mar 26, 2015, 9:51:15 AM3/26/15
to Renaud, users
On 3/25/2015 10:03 AM, Renaud wrote:
> I'm new to Racket, and i would like to know why sqlite queries are so slow in my test program.
>
> This program imports some data from a text file into a simple sqlite DB. It takes 35s with the INSERT queries and 5-6s without them.
>
> I've done the same thing with other languages, all do far better. For example with perl the same code runs in 8s with the queries, and 2-3s with them.
>
> I know that startup and regex are (reasonably ?) slower in racket, but queries seems to be very slow.
>
> Am i doing somethin wrong?
>
> Thanks for your help.

Hi Renaud,

I see that you have wrapped the inserts in an outer transaction - that
already addresses a well known performance issue with batch
inserts/updates. If your version of sqlite has the write ahead log, you
might try enabling that (if it isn't already).


Much of the problem is that sqlite is a "foreign" C library. Racket's
internal data formats [mostly] are not C compatible and so at every
function call, arguments must be translated to C and return data
translated to Racket. This is the "Foreign Function Interface" (FFI).
Perl's internal data formats are C already, and the DBD::SQLite lib has
the sqlite engine embedded into it. Because it is C top to bottom, Perl
has a significant speed advantage when calling sqlite functions.

File access in Racket is slightly slower than in Perl - you saw that
already - but that again is due to FFI (the C system calls). In my
experience Racket's regex is as fast as Perl's.

I think FFI overhead will explain 95+% of the time difference - and
there's not much you can do about it. You could try #lang typed/racket,
but I don't know how easily the db module can be used with it - I have
never tried and some things require a lot of effort to make work in the
typed language ... it's an ongoing project to make everything work smoothly.


You can slightly reduce startup time by compiling the Racket to a
bytecode *.zo file (using raco make). *.zo files will live in a
subdirectory called "compiled/" under the source file directory. Then
start your program with "racket -r <source_file>". Note that you can't
try to run the *.zo file directly - Racket checks the timestamps on the
source and bytecode file and recompiles (in memory, it won't save the
result) if necessary. You can reduce more by linking the program's *.zo
file(s) into an executable (using raco exe) that can be run directly.
This can help significantly with a large program, but it probably won't
make any difference with your tiny one.

George

Renaud

unread,
Mar 26, 2015, 10:16:55 AM3/26/15
to racket...@googlegroups.com
Thanks a lot George, for this insightful reply.

Your idea about FFI explaining 95% of the overhead looks good... but i wonder: in this case, shouldn't we also see a high cost in bind-prepared-statement alone?

(sorry for my sketchy english)

Thanks again,

Renaud

George Neuner

unread,
Mar 26, 2015, 2:05:46 PM3/26/15
to Renaud, users
On 3/26/2015 10:16 AM, Renaud wrote:
Thanks a lot George, for this insightful reply.

Your idea about FFI explaining 95% of the overhead looks good... but i wonder: in this case, shouldn't we also see a high cost in bind-prepared-statement alone?

Hi Renaud,

I'm not sure exactly what you are asking, so forgive me if I'm telling you what you already know. 

There is some overhead for every function call, and for foreign functions there is some additional FFI translation cost which is relative to the amount and types of data involved.  In Racket [and generally in Lisp family languages] most data has embedded type tags that must be removed to pass arguments to a foreign call, and conversely tags must be added to data returned from the foreign call.

SQL normally is compiled to bytecode for execution.  Sqlite simply interprets the bytecode, but many larger DBMS will further JIT compile to native code when a statement is prepared.

If you only use a prepared statement only once, you save nothing and, in fact, there will be a slight loss vs immediate execution due to the addition of the separate argument binding step.   It also costs memory to cache the compiled code in the DBMS (which for sqlite is in your own process space).  However, the more times you reuse a prepared statement, the less will be the amortized cost of compiling it, and because you send the SQL statement only once, you also progressively move less data.

So if, e.g., you convert 10 immediate statements into 1 prepared statement with 10 executions:
  • you avoid 9 sends and compiles of the SQL statement
  • you add 10 bind function calls, but the net amount of argument and response data remains the same.
The number of bind calls that balances the send and compile overhead is statement dependent.  However, every send/compile you avoid buys you that many more bind calls, so once you pass that initial balancing point, you have a net performance gain from using the prepared statement.


I know this is little help in analyzing your situation, but I hope it at least answers your question :-)
George

Jens Axel Søgaard

unread,
Mar 26, 2015, 2:21:11 PM3/26/15
to Renaud, racket...@googlegroups.com
How did you time the queries?

/Jens Axel
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
--
Jens Axel Søgaard

Renaud

unread,
Mar 26, 2015, 2:22:22 PM3/26/15
to racket...@googlegroups.com, rgom...@essentiel.net, us...@racket-lang.org
You're all forgiven since, 1/ there's nothing to forgive, and 2/ you're of great help with such precise and thorough answers.

I still don't know how to speed up my code, but i've learnt a lot.

Big thanks to you George, again.

Renaud

unread,
Mar 26, 2015, 2:29:58 PM3/26/15
to racket...@googlegroups.com, rgom...@essentiel.net, jens...@soegaard.net
> Jens Axel Søgaard

Hi Jens Axel,

I just used "time ./my-script.rkt" with the given code, then after swapping the comment near the bottom :

;(bind-prepared-statement Q (take (regexp-split rx line 1) nb))))))
(query-exec sdb (bind-prepared-statement Q (take (regexp-split rx line 1) nb)))))))

vs.

(bind-prepared-statement Q (take (regexp-split rx line 1) nb))))))
;(query-exec sdb (bind-prepared-statement Q (take (regexp-split rx line 1) nb)))))))

Crude... i know...

Jens Axel Søgaard

unread,
Mar 26, 2015, 2:37:19 PM3/26/15
to Renaud, racket...@googlegroups.com, Renaud
Did you subtract the startup cost of Racket to get the time of the queries?

/Jens Axel

Renaud

unread,
Mar 26, 2015, 2:45:24 PM3/26/15
to racket...@googlegroups.com, renaud....@gmail.com, rgom...@essentiel.net, jens...@soegaard.net
Le jeudi 26 mars 2015 19:37:19 UTC+1, Jens Axel Søgaard a écrit :
> Did you subtract the startup cost of Racket to get the time of the queries?
>
> /Jens Axel


I subtracted the two timings, which should get rid of everything but the query time, and gives about 30s for raket vs. about 5s for perl.
Reply all
Reply to author
Forward
0 new messages