[sqlite-dev] sqlite3_prepare_v3()

497 views
Skip to first unread message

Richard Hipp

unread,
May 31, 2017, 9:31:04 PM5/31/17
to sqlite-dev
Most prepared statements are created, used one or two times, then
quickly destroyed. But sometimes a prepared statement is cached and
kept around for a long time. For example, the RTREE and FTS virtual
tables create multiple prepared statements used for managing their
shadow tables and cache those prepared statements for the duration of
the database connection.

It turns out that the code generator really needs to know in advance
if a prepared statement will be persistent or transient. I'll explain
why below. But right now, there is no way for it to know this. And
so I'm proposing an enhancement to the sqlite3_prepare_v2() interface,
the _v3() version of the interface, that adds one additional parameter
which is a bit-vector of flags. Currently only a single bit is
defined: SQLITE_PREPARE_PERSISTENT. If that bit is set, then the code
generator assumes that the prepared statement will be around for a
long time. In the absence of that bit, the code generator assumes
that the prepared statement will be passed to sqlite3_finalize()
relatively quickly.

I'm not real happy with the "sqlite3_prepare_v3()" name, but I don't
know of a better approach. Suggestions are welcomed. Prototype code
is on a branch.

The reason the code generator wants to know if a prepared statement
will be persistent or transient has to do with memory allocation.
SQLite has a "lookaside" memory allocation pool that it uses for fast,
transient memory needs. For long-term needs or when lookaside is
unavailable, SQLite goes to system malloc for the space it needs.
Lookaside is much faster, but it is a finite resource. Lookaside is
intended to be used quickly and then freed. Applications should not
hang on to lookaside memory long-term as that ties up the limited
lookaside memory space and forces SQLite to go to the slower system
malloc for various transient memory needs.

Short-lived prepared statements should use lookaside memory. They
prepare about 25% faster when they do. But for long-lived prepared
statements, we do not want to tie up the limited number of lookaside
memory slots. It is better to be a little slower when preparing those
long-term prepared statements in order to avoid monopolizing the
limited amount of lookaside memory.

Furthermore, lookaside memory has only a single allocation size
(defaulting to 1200 bytes). And since most allocations are much
smaller than this, there is an excessive amount of internal
fragmentation with lookaside memory. That isn't a problem for
short-lived uses, but for long-term storage it is suboptimal to store
a 12-character identifier or label in a 1200 byte buffer.

Hence, for long-term prepared statements, the
SQLITE_PREPARE_PERSISTENT flag causes the code generator to bypass
lookaside memory and go directly to system malloc.

The new interface is designed to support additional flags in the
future. For example, we might add flags that cause the query planner
to spend extra time trying to optimize the code for a query that the
application knows will run for a long time - kind of like the -O3 flag
on your C compiler. Stuff like that.

The proposed change is on a branch
(https://www.sqlite.org/src/timeline?r=prepare_v3). I'm not overly
happy with it, but I have not yet come up with anything better. Your
suggestions and comments are welcomed.

--
D. Richard Hipp
d...@sqlite.org
_______________________________________________
sqlite-dev mailing list
sqlit...@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-dev

Darko Volaric

unread,
Jun 1, 2017, 1:52:02 AM6/1/17
to sqlit...@mailinglists.sqlite.org
As an alternative to this new call, would it be possible to have a "migrate" call that copies the prepared statement from lookaside to malloc'd memory? This would have a couple of advantages: it would preserve the 25% performance advantage, and it would allow clients to defer deciding if the statement was going to hang around and time the migration when that work wouldn't matter, eg when you decide you're going to cache a prepared statement handle for reuse instead of releasing it you also migrate it.

Richard Hipp

unread,
Jun 1, 2017, 6:51:24 AM6/1/17
to sqlit...@mailinglists.sqlite.org
On 6/1/17, Darko Volaric <li...@darko.org> wrote:
> As an alternative to this new call, would it be possible to have a "migrate"
> call that copies the prepared statement from lookaside to malloc'd memory?

Not easily. A typical prepared statement will be about a dozen
separate memory allocations. Most of those allocations are for things
like a string or a label and would be easy to migrate. But the two
main allocations both contain multiple objects. There are many
pointers between objects, both betwen objects within the same
allocation and between objects in other allocations. Untangling that
hive of pointers would be a real challenge.

Prepared statements must be "reprepared" (essentially reconstructed
from scratch) after every schema change. This happens automatically.
I thought about automatically migrating prepared statements from
lookaside to malloc memory when they are reprepared, based on their
past usage patterns. And we still might do that in the future. But I
think it is also important that applications have the ability to
specify that prepared statements are "long term" in advance.

> This would have a couple of advantages: it would preserve the 25%
> performance advantage, and it would allow clients to defer deciding if the
> statement was going to hang around and time the migration when that work
> wouldn't matter, eg when you decide you're going to cache a prepared
> statement handle for reuse instead of releasing it you also migrate it.

I'm thinking the cost of migrating would be far greater than the 25%
performance savings from initially using lookaside.

Jay Kreibich

unread,
Jun 1, 2017, 7:36:47 AM6/1/17
to sqlit...@mailinglists.sqlite.org


Or you could even try to automate such a process, by migrating it after the Nth execution or an execution some number of seconds after the prepare.

 -j
--  
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson



Richard Hipp

unread,
Jun 1, 2017, 8:31:02 AM6/1/17
to sqlit...@mailinglists.sqlite.org
On 6/1/17, Jay Kreibich <j...@kreibi.ch> wrote:
>
>
> Or you could even try to automate such a process, by migrating it after the
> Nth execution or an execution some number of seconds after the prepare.
>

The problem there is that the prepared statement might never be run.
Some virtual tables prepare their statements in the xConnect method so
that they are away available. But if the virtual table is never used,
those statements are never run at all. It seems wrong to use up a
bunch of fast lookaside memory on a prepared statement that is never
actually used.

The automatic reprepare occurs when the a prepared statement is used.
So even if the prepared statements are marked as "migrate" before use,
they won't actually migrate until they are in fact used, which might
never happen.

Stuart Mitchell

unread,
Jun 1, 2017, 10:05:30 AM6/1/17
to sqlit...@mailinglists.sqlite.org
Starting from a completely different angle, you could modify the existing ...prepare_v2 to use malloc and add a function to specifically use the faster memory mechanism.
I suspect that most apps won't notice the difference and that the developers of those that would take the time to read the release notes.
Doing it this way means the limited lookaside resources are under the developers control and not subject to arbitrary decisions or obscure #defines.

Stuart

Richard Hipp

unread,
Jun 1, 2017, 10:35:11 AM6/1/17
to sqlit...@mailinglists.sqlite.org
On 6/1/17, Stuart Mitchell <stu...@aquarius-itech.com> wrote:
> Starting from a completely different angle, you could modify the existing
> ...prepare_v2 to use malloc and add a function to specifically use the
> faster memory mechanism.
> I suspect that most apps won't notice the difference and that the developers
> of those that would take the time to read the release notes.
> Doing it this way means the limited lookaside resources are under the
> developers control and not subject to arbitrary decisions or obscure
> #defines.
>

99.9% of developers are using SQLite through an ORM intermediary of
some kind and do not have direct access to the APIs. They are just
submitting queries and expecting them to run. And run quickly.

Long-lived prepared statements are the exceptional case, I think.
Almost all prepared statements are transient and benefit (greatly)
from using lookaside.

I think the default configuration, therefore, should be to use
lookaside. (That has been the case since version 3.6.1 in 2008.) The
rare cases when a prepared statement is held for a long time and where
lookaside memory is in short supply are the only cases that would
benefit from the new API.

We want SQLite to "just work" for most developers, with no tuning or
fussing. Alternative APIs and #defines should be restricted to a very
small minority of cases that want to do something special or unusual.
The SQLITE_PREPARE_PERSISTENT flag falls into that "special and
unusual" category, I think.

R Smith

unread,
Jun 1, 2017, 11:33:35 AM6/1/17
to sqlit...@mailinglists.sqlite.org
I think it's a great idea to use a new _prepare_V3() naming, much better
than another SQL "hint", and I understand the hesitation, but there
isn't a better way and perhaps it is time to look forward and exploit
the opportunity more fully.

Isn't there anything else that can benefit from an additional parameter
in the prepare statement? We often read here requests for tweaking the
QP or avoiding certain types of execution or hooking some special case.

I'm not saying to implement any of these right now, and I'm not privy to
the future upgrade path, but if you are expanding the interface anyway,
perhaps add a reasoned parameter or two, at least in prototype, so as to
allow foreseeable additions and defer _v4 and _v5 for longer.

Also - It's new, we don't need it to be backward compatible at all.
Perhaps this one can enforce a more STRICT syntax or drop a historic
kink that prevents a faster optimisation, or get rid of a few special
allowances that break expectation - I know that's been a constant
request theme here. (I realise this may require an internal fork in the
prepare code of sorts, but it could be minimal and doesn't need to be
implemented right away if at least now we have the future interface for it).

Lastly, if I may, a personal curiosity - How is it that this "long-term
prepare keeping a lot of look-aside to itself" optimisation opportunity
came to light? I did not see anyone here suggest it or bemoan it, and I
find it hard to see a situation where that would be the obvious next
thing to streamline. Did some testing mechanism highlight it? Some
research into another optimisation perhaps? Be interesting to know.


Thanks,
Ryan

_______________________________________________

Richard Hipp

unread,
Jun 1, 2017, 12:01:36 PM6/1/17
to sqlit...@mailinglists.sqlite.org
On 6/1/17, R Smith <rsm...@rsweb.co.za> wrote:
> How is it that this "long-term
> prepare keeping a lot of look-aside to itself" optimisation opportunity
> came to light?

I had the idea of adding the "stmts" virtual table as an extension.
(https://www.sqlite.org/src/file/ext/misc/stmts.c) That virtual table
does introspection - It returns one row for each prepared statement in
the database connection.

This suggested the idea of the SQLITE_STMTSTATUS_MEMUSED enhancement
(https://www.sqlite.org/src/info/c26cf978eead1c9d) for obtaining the
total memory usage of a single prepared statements. Previously we had
the SQLITE_DBSTATUS_STMT_USED for obtaining the total memory used by
all prepared statements, but not a way to get the memory used by a
single prepared statement.

Combining these two features allowed me to run scripts like this:

CREATE VIRTUAL TABLE t1 USING rtree(id,x,y);
INSERT INTO t1 VALUES(1,2,3),(2,4,6);
SELECT * FROM t1 WHERE x>10 AND x<20;
-- The above just creates some prepared statements inside the RTree vtab
.load ./stmts
SELECT mem, sql FROM stmts;

And that output shows each prepared statement using way more memory
than we though reasonable. Further investigation showed that the
prepared statements were using 1200-byte lookaside allocations to
store things like a 6-character column name. Wasteful. Running the
script above with lookaside disabled:

./sqlite3 --lookaside 0 0 <x1.txt

Showed much more reasonable prepared statement sizes.

The lesson: You want as much visibility into your product as
possible. Inefficiencies can be lurking in unexpected places.
Measure things. Print out the measurements. See if they make sense
or if there is a possibility to make things more efficient.

--
D. Richard Hipp
d...@sqlite.org

Roger Binns

unread,
Jun 1, 2017, 1:44:06 PM6/1/17
to sqlit...@mailinglists.sqlite.org
On 31/05/17 18:30, Richard Hipp wrote:
> Most prepared statements are created, used one or two times, then
> quickly destroyed. But sometimes a prepared statement is cached and
> kept around for a long time.

I'd like to suggest an alternate approach. I think SQLite should
implement a statement cache. Then there should be a way of getting
statistics from the caching. Various developers can then post those
statistics here which will better inform what performance improvement
opportunities would be worthwhile.

As an example the Python wrappers for SQLite all implement a statement
cache (and it does make a difference). Many wrappers for other
languages do the same. Each of those cache implementations is
duplicated code, opportunities for bugs, and will have arbitrary
limitations.

The default cache size can be zero (no caching).

Roger




signature.asc

Scott Hess

unread,
Jun 1, 2017, 4:45:21 PM6/1/17
to sqlit...@mailinglists.sqlite.org
While I can see the argument against (is that "lite"?), I like it, because I sometimes feel like developers implement their own caching in interesting (and sometimes misguided) ways.  For instance, caching things which are never executed again, or not resetting correctly when finished with the statement.  I think it would be nice to have a cache where you could say something like "Spend up to N bytes on cache", and then have it do the obvious thing.  Also, it might be nice to have any sort of metrics to help inform the cache/no-cache decision for a particular statement.  For instance, some statements are not executed frequently enough to be worth caching, other statements may not be complicated enough to warrant caching.

WRT the earlier concern about re-preparing statements which will be cached being a loss, I do wonder if there are other allocation optimizations that will occur to you (drh) if/when you have a clear signal for expected usage.  Or even performance optimizations, some analysis may make more sense for statements which have a high likelihood to be executed multiple times.

-scott

Александр Орефков

unread,
Jun 1, 2017, 5:07:01 PM6/1/17
to sqlit...@mailinglists.sqlite.org
Hi.
In my software, I widely use cached prepared statements and I would be
very glad to have such an opportunity.
But I would also like to have such an opportunity as "cloning"
prepared statements.
Now the stmt encapsulates in itself two entities - the execution
program and the context of its execution.
When several clients at the same time (for example, in different
threads or in the same thread in different fibers, corutins) perform
the same queries - for each one it is necessary to prepare a query
with the same text,
again and again performing the work on creating the execution program.
When I place the stmt in the cache - at the time of its execution, I
must remove it from the cache, and for other clients create new
instances with the same text.
I would like to "clone" the prepared stmt so as not to parse them
again and again.
Then the stmt inside itself would have its own execution context and a
reference to the shared execution program, if necessary, to execute
almost simultaneously several identical requests.
And if you change the scheme, you would have to rebuild fewer stmts.
Is it possible?
С уважением, Александр Орефков

Stuart Mitchell

unread,
Jun 1, 2017, 6:33:02 PM6/1/17
to sqlit...@mailinglists.sqlite.org
[Sorry about the nesting level of this reply. I won't be so quick to delete in the future.]

How about a compromise. A scoped (compile, start, per statement) define/variable/flag that sets lookaside/heap as default for existing v2 function and implement v3 function as first suggested for fine control. Set the default (no developer intervention) for v2 to transient so that unaware users are not affected. That would meet the needs of all responses so far.

Stuart

Scott Doctor

unread,
Jun 1, 2017, 6:48:40 PM6/1/17
to sqlit...@mailinglists.sqlite.org

What about using some new command like
begin_prepare_cache/end_prepare_cache , which tells sqlite that
the following prepared statements are to be cached for long term
storage. Doing such would allow existing code to remain as-is
with the addition of a start/end wrapped around the desired
prepare commands. Would not need any modifications to the
prepare command but would allow the programmer to select which
statements get special treatment.

-------------------------
Scott Doctor
-------------------------

Philip Warner

unread,
Jun 1, 2017, 8:20:47 PM6/1/17
to sqlit...@mailinglists.sqlite.org, Richard Hipp
One of the things I would like to see in a more broad sense is if as many as
possible options and settings were available via SQL; eg. in this case an SQL
function call on a connection that set all future statements to use
lookaside/malloc etc.

As you note, most devs have a layer between them and the C API. I work mostly in
Android, and the Android layer caches prepared statements and my apps prepare
very long lives statements. I imagine there will be other intermediaries and
apps that do the exact opposite.

Bottom line: whatever changes are made should be available to as many devs as
possible, and the 99.9% assertion means it needs to be via SQL as well. The
#defines should really only be about setting the default.


On 2/06/2017 12:35 AM, Richard Hipp wrote:
> 99.9% of developers are using SQLite through an ORM intermediary of
> some kind and do not have direct access to the APIs. They are just
> submitting queries and expecting them to run. And run quickly.

_______________________________________________

Howard Kapustein

unread,
Jun 1, 2017, 8:55:44 PM6/1/17
to sqlit...@mailinglists.sqlite.org
>Most prepared statements are created, used one or two times, then quickly destroyed

Then I'm the exception proving the rule. I routinely have 'persistent' prepared statements - we prepare everything and as the cost is significant we wind up hanging onto connections each with its list of prepared statements for extended periods of time (potentially life of process, which can span boot-to-shutdown)



>Alternative APIs and #defines should be restricted to a very
>small minority of cases that want to do something special or unusual.
>The SQLITE_PREPARE_PERSISTENT flag falls into that "special and
>unusual" category, I think.
+1
Common cases should be easy to get right and hard to get wrong. It's hard to be "it just works" :-) One of SQLite's strength's is it works well out of the box w/little fiddling. I'll happily take extra steps to maximize my gains


>great idea to use a new _prepare_V3() naming
+1


>I'd like to suggest an alternate approach. I think SQLite should implement a statement cache
+1
I've got my own statement cache because I had to, but I'd have loved not having needed to go there in the first place. I won't object to scrapping my cache-of-necessity if SQLite handles it for me

And if not, I can PRAGMA statement_cache_size=0; and do my own thing. If the statement cache can be #ifdef compile'd out then win-win :-)


>What about using some new command like begin_prepare_cache/end_prepare_cache
-1
Contextual operations are a mixed blessing. Handy when you need to span a set of work, but they tend to ratchet up complexity and (unintended) coupling. It's easier to add atomic functionality and then weave higher level conveniences than to start with higher level compound behavior and try to decouple it later. Better to do sqlite3_prepare_v3 first, then see if/how some sort of contextual or 'mode' support has value

- Howard

Howard Kapustein

unread,
Jun 1, 2017, 9:03:51 PM6/1/17
to sqlit...@mailinglists.sqlite.org
Typo: It's hard to *BEAT* "it just works" :-)

Richard Hipp

unread,
Jun 1, 2017, 10:04:49 PM6/1/17
to sqlit...@mailinglists.sqlite.org
On 6/1/17, Scott Doctor <sc...@scottdoctor.com> wrote:
>
> What about using some new command like
> begin_prepare_cache/end_prepare_cache , which tells sqlite that
> the following prepared statements are to be cached for long term
> storage. Doing such would allow existing code to remain as-is
> with the addition of a start/end wrapped around the desired
> prepare commands. Would not need any modifications to the
> prepare command but would allow the programmer to select which
> statements get special treatment.

(1) There is a race condition when you have two or more threads using
the same database connection.

(2) How is adding calls to begin_prepare_cache() and
end_prepare_cache() any easier or less invasive than changing a "v2"
into "v3" and then adding an extra SQLITE_PREPARE_PRESISTENT
parameter?

Remember, nobody is required to change anything. After the addition
of sqlite3_prepare_v3() (if it goes through) everything will continue
to work exactly as it did before, and be just as performant as it was
before. But afterwards, programmers have the option to improve the
performance in some cases, should the choose to.


--
D. Richard Hipp
d...@sqlite.org

Roger Binns

unread,
Jun 2, 2017, 12:56:34 PM6/2/17
to sqlit...@mailinglists.sqlite.org
On 01/06/17 13:45, Scott Hess wrote:
> because I sometimes feel like developers implement their own caching in
> interesting (and sometimes misguided) ways

There are also arbitrary limitations. For example in the cache I
implemented in APSW, there is a one to one mapping of sql text (key) to
a prepared statement (value). If a second copy of the same sql text is
executed then the cache is bypassed. A more complex cache data
structure could address that, but is significantly more code (and
testing etc).

Separately I am also *very* skeptical developers will be able to
correctly tag queries in advance. The most likely scenarios are not
tagging at all, or a cargo cult tag everything because some random web
page says that is how to improve performance.

I think SQLite can do a lot better job internally and automatically.
For example a simple heuristic is to checksum the sql text, and if the
same checksum occurs in the future then automatically consider that sql
worth caching.

Roger

signature.asc

Sylvain Pointeau

unread,
Jun 19, 2017, 3:53:38 AM6/19/17
to sqlit...@mailinglists.sqlite.org
On Fri, Jun 2, 2017 at 6:56 PM, Roger Binns <rog...@rogerbinns.com> wrote:

I think SQLite can do a lot better job internally and automatically.
For example a simple heuristic is to checksum the sql text, and if the
same checksum occurs in the future then automatically consider that sql
worth caching.


I am using SQLite through JDBC using the Christian Werner's driver.
I don't think we can use the prepare_v3 (vs prepare_v2) since we cannot differentiate (in the JDBC API) if this is only one time or reused.

So I agree with Roger, I also think SQLite can do a lot better job doing it internally and automatically.



Reply all
Reply to author
Forward
0 new messages