This link:
http://www.postgresql.org/docs/8.3/static/sql-prepare.html
has a note on performance:
In some situations, the query plan produced for a prepared statement
will be inferior to the query plan that would have been chosen if the
statement had been submitted and executed normally. This is because when
the statement is planned and the planner attempts to determine the
optimal query plan, the actual values of any parameters specified in the
statement are unavailable. PostgreSQL collects statistics on the
distribution of data in the table, and can use constant values in a
statement to make guesses about the likely result of executing the
statement. Since this data is unavailable when planning prepared
statements with parameters, the chosen plan might be suboptimal.
I don't understand why postgres couldn't plan this:
SELECT foo.e, foo.f
FROM foo
WHERE pk=$1 AND b=$2 AND status='1' AND c <= $3;
to be later executed any slower than
SELECT foo.e, foo.f
FROM foo
WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';
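(For concreteness, the parameterized form above would be prepared and run like this; the statement name and parameter types are my guesses:)

```sql
-- Planned once, at PREPARE time, before any parameter values are known:
PREPARE foo_q (text, text, timestamp) AS
  SELECT foo.e, foo.f
  FROM foo
  WHERE pk = $1 AND b = $2 AND status = '1' AND c <= $3;

-- Executed many times, reusing the plan chosen above:
EXECUTE foo_q ('abcabc', '13', '2007-11-20 13:14:15');
```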
Can I help it make more educated guesses? In what scenarios could
prepared statements turn around and bite me, being slower than simple
queries? Is this a real problem in practice? Should I "refresh" prepared
statements from time to time? If so, how? Only by deallocating them and
preparing anew? Any knob to tweak for that?
Okay, enough questions :)
Thank you for any insights.
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
One issue is that the planner does its work when it first sees the query,
and at that time the comparisons are all against unknown values.
Try the same with dynamic SQL, that is, build the query string dynamically,
placing the current values where the $1, $2 and $3 placeholders were.
In this case the planner will see the query with the real current values.
Then execute it and compare the timings. You may get better numbers, but
there's no guarantee of better performance, because you will be sending
the whole query to the planner again and again.
Of course you need a plpgsql function for this.
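A minimal sketch of such a function, in 8.3-era plpgsql (the function name is illustrative, and I'm assuming the foo table from earlier in the thread; quote_literal() guards against quoting problems when the values are inlined):

```sql
CREATE OR REPLACE FUNCTION foo_lookup(p_pk text, p_b text, p_c timestamp)
RETURNS SETOF foo AS $$
DECLARE
  r foo%ROWTYPE;
BEGIN
  -- EXECUTE re-plans the query on every call, with the real values
  -- embedded as constants, so the planner can use its statistics
  FOR r IN EXECUTE 'SELECT * FROM foo WHERE pk = ' || quote_literal(p_pk)
        || ' AND b = ' || quote_literal(p_b)
        || ' AND status = ''1'' AND c <= ' || quote_literal(p_c)
  LOOP
    RETURN NEXT r;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```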
--
Reg me Please
From http://www.postgresql.org/docs/8.3/static/protocol-flow.html I
just read that "This possible penalty is avoided when using the unnamed
statement, since it is not planned until actual parameter values are
available."
Since I'm using Perl's DBI/pg, in postmaster's logs I can see that DBI's
prepare() seems to be using named prepared statements:
Nov 7 15:57:46 sol postgres[1685]: [2-1] LOG: execute dbdpg_1:
Nov 7 15:57:46 sol postgres[1685]: [2-2] SELECT
...
Is there any way to tell it to use unnamed prepared statements? I
understand this is not strictly a PostgreSQL question, so sorry if I'm
off topic.
For example, if the table contains almost no rows in the
beginning, the planner will choose a full table scan
even if - say - 'pk' is the primary key.
If you use the same execution plan later when the table is big,
the full table scan will hurt considerably, and you would
be much better off with an index lookup.
Other scenarios are certainly conceivable, but this one is
easy to understand.
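This scenario can be tried directly in psql (table and statement names are illustrative; whether the cached plan gets refreshed after ANALYZE depends on the server version, so compare the plans yourself):

```sql
-- Prepare against an empty table: the planner sees (almost) zero rows
CREATE TABLE t (pk text PRIMARY KEY, val text);
PREPARE t_q (text) AS SELECT val FROM t WHERE pk = $1;
EXPLAIN EXECUTE t_q ('500');   -- plan chosen while t is empty

-- Now grow the table and update the statistics
INSERT INTO t SELECT i::text, 'x' FROM generate_series(1, 100000) i;
ANALYZE t;

-- Compare the prepared plan against a freshly planned query
EXPLAIN EXECUTE t_q ('500');
EXPLAIN SELECT val FROM t WHERE pk = '500';
```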
> Can I help it make more educated guesses? In what scenarios could
> prepared statements turn around and bite me, being slower than simple
> queries? Is this a real problem in practice? Should I
> "refresh" prepared statements from time to time? If so, how? Only by
> deallocating them and preparing anew? Any knob to tweak for that?
You'll probably have to deallocate them and allocate them anew.
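At the SQL level, refreshing is just a deallocate/prepare cycle (the statement name is illustrative, using the query from earlier in the thread):

```sql
-- Drop the cached plan...
DEALLOCATE foo_q;

-- ...and re-plan against the current statistics
PREPARE foo_q (text, text, timestamp) AS
  SELECT e, f FROM foo
  WHERE pk = $1 AND b = $2 AND status = '1' AND c <= $3;
```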
Yours,
Laurenz Albe
> to be later executed any slower than
> SELECT foo.e, foo.f
> FROM foo
> WHERE pk='abcabc' AND b='13' AND status='1' AND c <= '2007-11-20 13:14:15';
The reason is that without knowing the parameter values, the planner has
to pick a "generic" plan that will hopefully not be too awful regardless
of what the actual values end up being. When it has the actual values
it can make much tighter estimates of the number of matching rows, and
possibly choose a much better but special-purpose plan. As an example,
if the available indexes are on b and c then the best query plan for the
first case is probably bitmap indexscan on b. But in the second case,
the planner might be able to determine (by consulting the ANALYZE stats)
that there are many rows matching b='13' but very few rows with c <=
'2007-11-20 13:14:15', so for those specific parameter values an
indexscan on c would be better. It would be folly to choose that as the
generic plan, though, since on the average a one-sided inequality on c
could be expected to not be very selective at all.
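A sketch of how one might observe the difference described above, assuming the hypothetical indexes on b and c exist:

```sql
-- Generic plan, chosen without parameter values
-- (likely the bitmap indexscan on b):
PREPARE foo_q (text, timestamp) AS
  SELECT e, f FROM foo WHERE b = $1 AND c <= $2;
EXPLAIN EXECUTE foo_q ('13', '2007-11-20 13:14:15');

-- Value-specific plan, chosen with the constants visible to the planner
-- (possibly an indexscan on c for these particular values):
EXPLAIN SELECT e, f FROM foo
WHERE b = '13' AND c <= '2007-11-20 13:14:15';
```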
regards, tom lane
Same question for any number of joins, where bar.id or baz.id is always
a PK:
select ... from foo JOIN bar ON(foo.bar_id=bar.id) JOIN baz
ON(foo.baz_id=baz.id) WHERE asd=? AND dsa=?;
Yeah, if there's always a PK equality constraint then the dependence on
specific parameter values is much weaker, so you could probably use a
prepared statement without worrying. The cases where prepared
statements tend to suck usually involve either inequalities, or
equalities on non-unique columns where the number of matches varies
wildly for different data values. In cases like that, knowing the exact
value being compared to makes a very large difference in the rowcount
estimate.
regards, tom lane